A Two-Stage Hybrid Oversampling and Ensemble Learning Framework for Improved Type 2 Diabetes Mellitus Classification
Type 2 Diabetes Mellitus (T2DM) screening on clinical tabular data commonly suffers from class imbalance: non-diabetic records far outnumber diabetic cases, so models bias toward the majority class and detect the positive (diabetic) class poorly. This study aims to improve T2DM classification on an imbalanced dataset by increasing minority-class detection while maintaining acceptable overall performance. The main contribution is a leakage-safe framework that integrates two-stage hybrid oversampling (RandomOverSampler followed by Borderline-SMOTE) with soft-voting ensemble learning to obtain more balanced predictions. Experiments were conducted on the Diabetes Bangladesh (DiaBD) dataset, containing 5,288 clinical records with a binary target, diabetic (Yes/No, mapped to 1/0). The data were stratified into train_full/test splits (80/20) and the train_full portion further into train/validation splits (80/20). Features were normalized with a MinMaxScaler fitted only on the training set and then applied to the validation and test sets, preventing data leakage. Class-imbalance handling was applied only to the training set using the proposed two-stage oversampling (ROS followed by Borderline-SMOTE; borderline-1, k_neighbors=3). Classification models included SVM (RBF kernel), Random Forest, and Gradient Boosting, as well as soft-voting ensembles of two and three models. Results show that the baseline setting (no oversampling) can achieve high accuracy but low minority detection; for instance, SVM without oversampling reached an accuracy of 0.9374 but a Recall_pos of only 0.0909 and an F1_pos of 0.1587. After oversampling, SVM improved minority recall to 0.7273 with an F1_pos of 0.4188, although accuracy decreased to 0.8688 due to increased false positives. The best-balanced performance was achieved by the SVM + Random Forest soft-voting ensemble with oversampling: accuracy 0.9125, Recall_pos 0.6545, and the highest F1_pos at 0.4932.
Overall, the proposed two-stage hybrid oversampling combined with soft-voting ensembles improves T2DM detection on imbalanced tabular data, and the findings indicate that model selection should prioritize Recall_pos and F1_pos rather than accuracy alone.
Copyright (c) 2026 Siti Fatimah Nurdiah Permatasari, Ermatita Ermatita (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0), which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).