Utilizing the Genetic Algorithm to Pruning the C4.5 Decision Tree Algorithm

Authors

  • Maad M. Mijwil Baghdad College of Economic Sciences University, Baghdad, Iraq http://orcid.org/0000-0002-2884-2504
  • Rana A. Abttan Baghdad College of Economic Sciences University, Baghdad, Iraq

DOI:

https://doi.org/10.24203/ajas.v9i1.6503

Keywords:

Genetic algorithm, C4.5 Decision tree, Optimizing, Pruning, Machine learning

Abstract

Decision trees (DTs) are among the most popular machine learning algorithms; they split data repeatedly to form groups or classes. A decision tree is a supervised learning algorithm that can be applied to discrete or continuous data for classification or regression. The most traditional classifier of this kind is the C4.5 decision tree, which is the focus of this research. This classifier has the advantage of handling large data sets, and it keeps growing the tree until it reaches the desired goal. Its drawback is that the resulting tree contains unnecessary nodes and branches, which leads to overfitting and can degrade classification. To address this, the authors propose using a genetic algorithm to prune the tree and reduce overfitting. The study uses four datasets, IRIS, Car Evaluation, GLASS, and WINE, collected from the UC Irvine (UCI) machine learning repository. The experimental results confirm the effectiveness of the genetic algorithm in reducing overfitting on the four datasets and in optimizing the confidence factor (CF) of the C4.5 decision tree. The proposed method reached about 92% accuracy in this work.
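As a rough illustration of the idea in the abstract, the sketch below shows how a genetic algorithm might search for a confidence factor (CF). It is not the authors' implementation. The fitness function `toy_fitness`, the search range 0.01 to 0.5, and all GA settings (population size, tournament size, mutation width) are assumptions made only so the example runs on its own. In a real experiment the fitness would be the validation accuracy of a C4.5 tree pruned with the candidate CF.

```python
import random


def toy_fitness(cf: float) -> float:
    """Hypothetical stand-in for the accuracy of a tree pruned with `cf`.

    A real fitness function would train a C4.5 tree with this confidence
    factor and return its validation accuracy. This smooth curve, which
    peaks at cf = 0.2, is an assumption used purely for demonstration.
    """
    return 1.0 - (cf - 0.2) ** 2


def clip(cf: float, low: float = 0.01, high: float = 0.5) -> float:
    """Keep a candidate inside the assumed CF search range."""
    return max(low, min(high, cf))


def evolve_cf(fitness, pop_size=20, generations=30, mutation_sigma=0.05, seed=0):
    """Search for a CF value that maximizes `fitness` with a simple GA."""
    rng = random.Random(seed)
    # Each individual is a single real-valued gene: the candidate CF.
    population = [rng.uniform(0.01, 0.5) for _ in range(pop_size)]

    for _ in range(generations):
        # Elitism: carry the two best candidates over unchanged.
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]

        while len(next_gen) < pop_size:
            # Tournament selection: best of three random candidates.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # Blend crossover: a random weighted average of the parents.
            weight = rng.random()
            child = weight * parent_a + (1.0 - weight) * parent_b
            # Gaussian mutation, then clip back into the search range.
            child = clip(child + rng.gauss(0.0, mutation_sigma))
            next_gen.append(child)

        population = next_gen

    return max(population, key=fitness)


best_cf = evolve_cf(toy_fitness)
print(round(best_cf, 3))
```

Replacing `toy_fitness` with a function that trains and evaluates a pruned tree on held-out data would turn this sketch into a CF search over a real dataset; the rest of the loop would stay the same.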

Author Biography

Maad M. Mijwil, Baghdad College of Economic Sciences University, Baghdad, Iraq

Maad M. Mijwil received a B.Sc. degree in Software Engineering from the Software Engineering Department at Baghdad College of Economic Sciences University, Iraq, in 2008/2009 and an M.Sc. degree in Computer Science (wireless sensor networks) from the University of Baghdad, Iraq, in 2015. He is currently working as an Assistant Lecturer at Baghdad College of Economic Sciences University.

Published

2021-02-26

How to Cite

Mijwil, M. M., & Abttan, R. A. (2021). Utilizing the Genetic Algorithm to Pruning the C4.5 Decision Tree Algorithm. Asian Journal of Applied Sciences, 9(1). https://doi.org/10.24203/ajas.v9i1.6503