Optimizing Categorical Boosting Model with Optuna for Anti-Tuberculosis Drugs Classification
Tuberculosis is one of the leading causes of death globally, with the death toll reaching 1.30 million in 2022, an increase of 3.2% compared to the previous year. Indonesia is one of the countries with the highest number of tuberculosis cases in the world. The Directly Observed Treatment Short-course (DOTS) strategy improves the effectiveness of tuberculosis therapy by ensuring the availability of appropriate anti-tuberculosis drugs. However, errors in drug selection can lead to therapy failure, relapse, and Multi-Drug Resistant (MDR) cases. To address this, classification models based on patient medical record data can be used to improve the accuracy of drug selection. This research develops a classification model that determines the type of drug using the Categorical Boosting (CatBoost) algorithm, with hyperparameters optimized by Optuna using the Tree-structured Parzen Estimator. The data consisted of numerical variables, such as age and treatment duration, and categorical variables, such as history of diabetes mellitus, HIV status, and drug combination. CatBoost was chosen for its ability to handle categorical data. The preprocessing stage involved memory reduction, feature normalization, and encoding on 620 data samples, which were then divided into 90% training and 10% test data. Experimental results show that the CatBoost model achieved an initial accuracy of 90%. After hyperparameter optimization with Optuna, accuracy increased to 96%, an improvement of 6 percentage points. The model is able to accurately classify drug combinations, which can support the selection of more effective therapies for tuberculosis patients. Thus, the use of SMOTE to address class imbalance, combined with Optuna for hyperparameter optimization, was shown to improve the accuracy of CatBoost-based classification models. This finding confirms the effectiveness of the SMOTE and Optuna methods in improving the accuracy of prediction models for drug type classification, contributing to the improvement of tuberculosis treatment strategies.
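The abstract names SMOTE as the class-balancing step without describing it. As a rough illustration only, the core idea of SMOTE (creating synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function name `smote_oversample`, its parameters, and the toy rows below are assumptions made for this sketch; they are not the authors' implementation or data, and a real pipeline would more likely rely on an established library.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class neighbours.

    Hypothetical helper for illustration; numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new row at a random point on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up rows: [age in years, treatment duration in months].
majority = [[25, 6], [31, 6], [40, 6], [52, 6], [47, 6], [36, 6]]
minority = [[58, 9], [63, 12], [61, 9]]

new_rows = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

Because each synthetic row lies on a line segment between two real minority rows, it stays within the range of values already observed for that class. This interpolation only makes sense for numerical features; categorical variables such as HIV status would need a variant designed for mixed data, for example SMOTE-NC. Oversampling is normally applied to the training split only, so that the test data contains no synthetic rows.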
Copyright (c) 2025 Yosua Satria Bara Harmoni, Kartika Maulida Hindrayani, Dwi Arman Prasetya (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).





