Dimensionality Reduction Using Principal Component Analysis and Feature Selection Using Genetic Algorithm with Support Vector Machine for Microarray Data Classification
DNA microarrays are used to analyze the expression of many genes simultaneously and play a critical role in cancer detection. Creating a DNA microarray starts with isolating RNA from the sample, which is then converted into cDNA, hybridized to the array, and scanned to generate gene expression data. However, the data generated through this process is high-dimensional, which can degrade the performance of predictive models for cancer detection. Dimensionality reduction is therefore required to reduce data complexity. This study analyzes the impact of applying Principal Component Analysis (PCA) for dimensionality reduction, Genetic Algorithm (GA) for feature selection, and their combination on microarray data classification using Support Vector Machine (SVM). The datasets used are microarray datasets for breast cancer, ovarian cancer, and leukemia. The research methodology consists of preprocessing, PCA for dimensionality reduction, GA for feature selection, data splitting, SVM classification, and evaluation. Based on the results, PCA dimensionality reduction combined with GA feature selection and SVM classification achieved the best performance among the configurations compared. For the breast cancer dataset, the highest accuracy was 73.33%, with recall 0.74, precision 0.75, and F1 score 0.73. For the ovarian cancer dataset, the highest accuracy was 98.68%, with recall 0.98, precision 0.99, and F1 score 0.99. For the leukemia dataset, the highest accuracy was 95.45%, with recall 0.94, precision 0.97, and F1 score 0.95. It can be concluded that combining PCA for dimensionality reduction with GA for feature selection in microarray classification can simplify the data and improve the accuracy of the SVM classification model. These findings underline the effectiveness of PCA and GA in enhancing the classification performance of microarray data.
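The abstract does not say how the GA encodes candidate feature subsets or which operators and parameters it uses, so the following is only a minimal sketch of GA-based feature selection under stated assumptions: a binary mask per candidate (1 keeps a feature, 0 drops it), tournament selection, single-point crossover, bit-flip mutation, and elitism. The names `ga_select` and `toy_fitness` and all default parameter values are hypothetical. In a setup like the one described, the fitness function would presumably score an SVM trained on the selected PCA components; here a toy function stands in so the sketch runs with the standard library alone.

```python
import random
from typing import Callable, List, Sequence, Tuple


def ga_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,          # assumed value, not from the study
    generations: int = 30,       # assumed value, not from the study
    crossover_rate: float = 0.8,  # assumed value, not from the study
    mutation_rate: float = 0.05,  # assumed value, not from the study
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Return the best binary feature mask found and its fitness."""
    rng = random.Random(seed)
    # Initial population: random binary masks (1 = keep feature, 0 = drop).
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    best_mask, best_score = None, float("-inf")

    for _ in range(generations):
        scores = [fitness(mask) for mask in population]
        # Remember the best mask seen so far.
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score

        def tournament() -> List[int]:
            # Pick two random individuals and return the fitter one.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best_mask[:]]  # elitism: carry the best mask over
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population

    # Score the final generation as well.
    for mask in population:
        score = fitness(mask)
        if score > best_score:
            best_mask, best_score = mask[:], score
    return best_mask, best_score


def toy_fitness(mask: Sequence[int]) -> float:
    """Toy stand-in for classifier accuracy: rewards three 'informative'
    features (indices 0, 3, 7) and penalizes every selected feature."""
    informative = (0, 3, 7)
    return sum(mask[i] for i in informative) - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = ga_select(n_features=10, fitness=toy_fitness)
    print("selected feature indices:", [i for i, g in enumerate(mask) if g])
    print("fitness:", score)
```

To move closer to the pipeline in the abstract, `toy_fitness` could be replaced by a function that trains an SVM on the columns selected by the mask and returns a validation score. The per-feature penalty is one possible way to favor smaller subsets and is likewise an assumption.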
Copyright (c) 2025 dwikartini (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).