The Effect of SMOTE-Tomek on the Classification of Chronic Diseases Based on Health and Lifestyle Data
Machine learning models for chronic disease prediction are often trained on imbalanced healthcare datasets in which non-disease cases dominate. Under this condition a model can report misleadingly high accuracy while failing to identify patients with chronic diseases, which limits its clinical usefulness. This study analyzes the impact of class imbalance on model performance and evaluates the effectiveness of the SMOTE–Tomek resampling technique in improving chronic disease prediction. It provides empirical evidence that accuracy alone is insufficient for evaluating healthcare models and demonstrates that imbalance-aware preprocessing is essential for valid and reliable chronic disease detection. Five classification models (Support Vector Machine, Random Forest, K-Nearest Neighbors, Gradient Boosting, and XGBoost) were evaluated on a lifestyle-based chronic disease dataset under two conditions: without resampling and with SMOTE–Tomek. Model performance was assessed using accuracy, precision, recall, F1-score, and AUC. Without SMOTE–Tomek, all models failed to detect chronic disease cases, producing near-zero recall and F1-scores despite accuracy exceeding 80%. After applying SMOTE–Tomek, substantial improvements were observed across all models, particularly in recall and AUC. Support Vector Machine achieved the best overall performance, with an accuracy of 92.9%, a precision of 92%, a recall of 93.9%, an F1-score of 0.93, and an AUC of 0.98. The findings confirm that handling class imbalance is a prerequisite for meaningful chronic disease prediction. The consistent increase in recall and AUC across all evaluated models indicates that the improvement stems from enhanced class separability rather than metric inflation. The proposed approach supports more reliable early screening and decision-support systems in preventive healthcare.
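The abstract's point that accuracy can exceed 80% while recall stays near zero can be illustrated with a small worked example. The sketch below is not the study's code or data: the 85/15 class split, the two sets of predictions, and the helper name `binary_metrics` are all hypothetical, chosen only to show how the metrics diverge on an imbalanced test set.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 85 non-disease (0) and 15 disease (1) cases.
y_true = [0] * 85 + [1] * 15

# A degenerate model that always predicts the majority class.
y_pred_majority = [0] * 100
baseline = binary_metrics(y_true, y_pred_majority)

# A hypothetical imbalance-aware model: 5 false alarms, 12 of 15 cases detected.
y_pred_aware = [1] * 5 + [0] * 80 + [1] * 12 + [0] * 3
aware = binary_metrics(y_true, y_pred_aware)

print("majority-class model:", baseline)
print("imbalance-aware model:", aware)
```

In this toy setting the majority-class model reaches 85% accuracy while detecting no disease cases at all, which is why recall, F1-score, and AUC are reported alongside accuracy.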
Copyright (c) 2025 Muhammad Adika Riswanda, Friska Abadi, Muhammad Itqan Mazdadi, Mohammad Reza Faisal, Rudy Herteno (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).





