A System Identification of Diabetes Based on Ensemble Method: Bagging, Random Forest, and Extreme Gradient Boosting

Authors

  • Jonas de Deus Guterres, Universitas Negeri Yogyakarta, Indonesia
  • Fatchul Arifin, Universitas Negeri Yogyakarta, Indonesia

DOI:

https://doi.org/10.21831/elinvo.v10i2.89649

Keywords:

Diabetes Classification, Ensemble Learning, Random Forest, XGBoost, Bagging

Abstract

Diabetes is a prevalent chronic illness recognized worldwide, with an estimated prevalence in adults ranging from 42% to 170% globally. To reduce the likelihood of developing diabetes, individuals at increased risk need to understand the importance of adopting a healthy lifestyle and managing their consumption of foods that can raise insulin levels in the body. Early detection of pre-symptoms is therefore crucial to reduce the number of people who develop the condition without being aware of it. Machine learning has emerged as a tool that aids in predicting various diseases, including diabetes, by analyzing patient data. Despite numerous studies using a variety of machine learning techniques, achieving high accuracy in diabetes prediction remains challenging. This study therefore implemented an ensemble approach using the bagging, random forest, and Extreme Gradient Boosting (XGBoost) algorithms to improve predictive performance for diabetes. The models were evaluated both on features selected for their highest correlation and on all available features. Bagging achieved the highest accuracy, 0.83, on model 6, followed closely by random forest with an accuracy of 0.82 and XGBoost with an accuracy of 0.81.
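The correlation-based feature selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation measure, dataset, or cutoff the study used, so the Pearson coefficient, the feature names, the toy values, and the helper names `pearson`, `rank_features`, and `select_top_k` are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Return feature names sorted by |correlation with the label|, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def select_top_k(columns, labels, k):
    """Keep the k features most strongly correlated with the label."""
    return rank_features(columns, labels)[:k]


# Toy data invented for illustration; these are NOT values from the study.
toy_columns = {
    "glucose": [85, 90, 150, 160, 95, 170],
    "bmi": [22.0, 30.0, 24.0, 31.0, 27.0, 29.0],
    "constant": [1, 1, 1, 1, 1, 1],
}
toy_labels = [0, 0, 1, 1, 0, 1]  # hypothetical outcome: 1 = diabetic, 0 = not

selected = select_top_k(toy_columns, toy_labels, k=2)
print(selected)
```

A reduced feature set chosen this way could then be compared against the full feature set by training the same classifier on each, which is one plausible reading of the evaluation the abstract describes.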

Published

2025-12-04

How to Cite

de Deus Guterres, J., & Arifin, F. (2025). A System Identification of Diabetes Based on Ensemble Method: Bagging, Random Forest, and Extreme Gradient Boosting. Elinvo (Electronics, Informatics, and Vocational Education), 10(2), 205–215. https://doi.org/10.21831/elinvo.v10i2.89649
