An Empirical Study of Cross-Project and Within-Project Performance in Software Defect Prediction Models Using Tree-Based and Boosting Classifiers

Keywords: Software Defect Prediction, WP-SDP, CP-SDP, Classification

April 24, 2025
May 14, 2025
August 20, 2025

Software Defect Prediction (SDP) aims to identify faulty components in the early stages of development. In this study, we evaluated two widely used SDP approaches, Within-Project Software Defect Prediction (WP-SDP) and Cross-Project Software Defect Prediction (CP-SDP), under identical preprocessing steps to ensure an objective comparison. We used the NASA MDP dataset, splitting each project into 70% training and 30% testing data, and applied three resampling strategies (no sampling, oversampling, and undersampling) to address class imbalance. Five classification algorithms were examined: Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), and LightGBM (LGBM). Performance was measured primarily with Accuracy and Area Under the Curve (AUC), yielding 360 experimental outcomes. WP-SDP combined with oversampling and Random Forest gave the best predictive performance on most projects, reaching an Accuracy of 89.92% and an AUC of 0.931 on PC4. Nonetheless, CP-SDP performed better on certain small-scale projects (e.g., MW1), which indicates its potential when local historical data is scarce but inter-project characteristics remain sufficiently similar. These results underscore the importance of selecting a prediction scheme suited to a project's attributes, class imbalance level, and available historical data. By establishing a standardized methodological framework, this work clarifies the strengths and limitations of WP-SDP and CP-SDP, paving the way for more effective defect detection strategies and improved software quality.
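The data-handling steps named in the abstract (a 70/30 split, the three resampling strategies, and the Accuracy and AUC metrics) can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: all function names are hypothetical, the labels are synthetic, the project name CM1 is added purely for illustration, and several choices are assumptions the abstract does not state, namely that the split is stratified, that resampling is applied to the training part only, that resampling balances classes to a 1:1 ratio, and that CP-SDP trains on all projects other than the target. The classifier itself is left out; any model that returns a defect score per module could be plugged in.

```python
import random
from collections import Counter


def training_projects(all_projects, target, scheme):
    """Pick the projects whose training data feeds the model for `target`."""
    if scheme == "WP":  # within-project: learn from the target's own history
        return [target]
    if scheme == "CP":  # cross-project (assumed): learn only from other projects
        return [p for p in all_projects if p != target]
    raise ValueError(f"unknown scheme: {scheme}")


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def resample(indices, labels, strategy="none", seed=0):
    """Balance classes by index: 'none', 'over' (duplicate minority rows)
    or 'under' (drop majority rows). Assumes a 1:1 target ratio."""
    if strategy not in ("none", "over", "under"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "none":
        return list(indices)
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    sizes = [len(v) for v in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)
    out = []
    for idx in by_class.values():
        if len(idx) < target:  # oversample: add random duplicates
            out += idx + rng.choices(idx, k=target - len(idx))
        else:  # undersample: keep a random subset of size `target`
            out += rng.sample(idx, target)
    return out


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of modules whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(y_true, scores))
    return hits / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a defective module outscores a clean one
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90  # synthetic, imbalanced module labels
    train, test = stratified_split(labels)
    for strategy in ("none", "over", "under"):
        picked = resample(train, labels, strategy)
        print(strategy, dict(Counter(labels[i] for i in picked)))
    print(training_projects(["CM1", "MW1", "PC4"], "MW1", "CP"))
```

Resampling only the training indices keeps the test set at its original class ratio, so Accuracy and AUC are measured on data that still reflects the real imbalance. Whether the study followed this convention is not stated in the abstract.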

How to Cite

Raidra Zeniananto, Herteno, R., Radityo Adi Nugroho, Andi Farmadi, & Setyo Wahyu Saputro. (2025). An Empirical Study of Cross-Project and Within-Project Performance in Software Defect Prediction Models Using Tree-Based and Boosting Classifiers. Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics, 7(3), 514-525. https://doi.org/10.35882/ijeeemi.v7i3.95
