Predict Diabetes Using Voting Classifier and Hyper Tuning Technique
https://doi.org/10.24017/Science.2022.2.10
Abstract
Today, diabetes is one of the most common chronic diseases in the world, owing to people's sedentary lifestyles, and it leads to many health problems such as heart attack, kidney failure, and blindness. Additionally, most people are unaware of the early-stage symptoms of diabetes that would allow them to prevent it. These reasons motivated the development of a diabetes prediction system using machine learning techniques. The Pima Indian Diabetes Dataset (PIDD) was used for this framework because it is a common and appropriate dataset, available in CSV format. Although the dataset contained no duplicate or null values, some zero values were replaced, four outlier records were removed, and the data were standardized. The project methodology is divided into two phases of model selection. In the first phase, two different hyperparameter tuning techniques (Randomized Search and TPOT (AutoML)) were used to increase the accuracy of each algorithm, and six different algorithms (Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor, Support Vector Machine, and Naïve Bayes) were applied. In the second phase, the four best-performing algorithms, each with its best estimated parameters, were chosen as inputs to a voting classifier, which is applied to find the best algorithm among a group of multiple options. The results were satisfactory: Random Forest achieved an accuracy of 98.69% in the second phase, compared with 81.04% in the first, and it was used to predict diabetes through a simple graphical user interface.
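The two-phase workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. Because the PIDD CSV is not bundled here, a synthetic 768 × 8 stand-in dataset is generated. Several choices are assumptions introduced only for illustration: which columns treat a zero as missing (`ZERO_AS_MISSING`), median imputation as the zero-replacement rule, the search space, the four models placed in the ensemble, and hard voting. The helper names (`fit_nonzero_medians`, `replace_zeros`, `make_stand_in_data`, `run_demo`) are hypothetical. Outlier removal and TPOT are omitted. Standardization sits inside each pipeline so that it is fitted on training data only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed PIDD column order: Pregnancies, Glucose, BloodPressure, SkinThickness,
# Insulin, BMI, DiabetesPedigreeFunction, Age. A zero in columns 1-5 is treated
# as a missing measurement (assumption: the abstract does not name the columns).
ZERO_AS_MISSING = [1, 2, 3, 4, 5]


def fit_nonzero_medians(X, cols):
    """Median of the non-zero values in each column (learned from training data only)."""
    return {c: float(np.median(X[X[:, c] != 0, c])) for c in cols}


def replace_zeros(X, medians):
    """Return a copy of X with zeros in the given columns replaced by the stored medians."""
    X = X.copy()
    for c, m in medians.items():
        X[X[:, c] == 0, c] = m
    return X


def make_stand_in_data(seed=0):
    """Hypothetical synthetic 768 x 8 stand-in for PIDD (the real CSV is not used here)."""
    X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                               random_state=seed)
    X = X - X.min(axis=0) + 1.0           # make every feature strictly positive
    rng = np.random.default_rng(seed)
    for c in ZERO_AS_MISSING:              # inject roughly 5% "missing" zeros per column
        X[rng.random(len(X)) < 0.05, c] = 0.0
    return X, y


def run_demo(seed=0):
    X, y = make_stand_in_data(seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    medians = fit_nonzero_medians(X_train, ZERO_AS_MISSING)
    X_train, X_test = replace_zeros(X_train, medians), replace_zeros(X_test, medians)

    # Phase 1 (sketch): randomized hyperparameter search for one base model.
    search = RandomizedSearchCV(
        make_pipeline(StandardScaler(), RandomForestClassifier(random_state=seed)),
        param_distributions={
            "randomforestclassifier__n_estimators": [50, 100],
            "randomforestclassifier__max_depth": [None, 4, 8],
            "randomforestclassifier__min_samples_leaf": [1, 2, 4],
        },
        n_iter=4, cv=3, scoring="accuracy", random_state=seed)
    search.fit(X_train, y_train)

    # Phase 2 (sketch): hard-voting ensemble of four base models.
    # The choice of these four models and their settings is an assumption.
    voter = VotingClassifier(
        estimators=[
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            ("rf", search.best_estimator_),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=11))),
            ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ],
        voting="hard")
    voter.fit(X_train, y_train)
    return {"best_params": search.best_params_,
            "voting_accuracy": voter.score(X_test, y_test)}


if __name__ == "__main__":
    result = run_demo()
    print("Best RF parameters:", result["best_params"])
    print(f"Voting classifier test accuracy: {result['voting_accuracy']:.4f}")
```

With `voting="hard"`, scikit-learn's `VotingClassifier` predicts the majority label across the base models. With an even number of models, a tie goes to the lowest class label. Setting `voting="soft"` averages predicted probabilities instead, which requires every base model to expose `predict_proba` (for `SVC`, this means `probability=True`).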
License
Copyright (c) 2023 Chra Ali Kamal, Manal Ali Atiyah
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.