Feature Sets in Just-in-Time Defect Prediction: An Empirical Evaluation

Peter Bludau and Alexander Pretschner

Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 22-31

2022 · doi: 10.1145/3558489.3559068

abstract

Just-in-time defect prediction assigns a defect risk to each new change to a software repository in order to prioritize review and testing efforts. Over the last decades, different approaches have been proposed in the literature to craft more accurate prediction models. However, defect prediction is still not widely used in industry, partly due to its inconsistent prediction performance. In this study, we evaluate existing features on six open-source projects and propose two new feature sets not yet discussed in the literature. By combining all feature sets, we improve the Matthews correlation coefficient (MCC) by 21% on average, leading to the best-performing models when compared to state-of-the-art approaches. We also evaluate effort-awareness and find that, on average, 14% more defects can be identified when inspecting 20% of the changed lines.

subject terms: machine learning, JIT defect prediction, empirical evaluation
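
The abstract refers to two evaluation measures: MCC for classification quality and an effort-aware measure counting the defects found when inspecting the riskiest changes up to 20% of all changed lines. The following is a minimal, illustrative sketch (not taken from the paper) of how such measures could be computed; the helper `recall_at_effort` and the column names `risk_score` and `churned_lines` are assumptions for illustration.

```python
# Illustrative sketch, not the authors' implementation: MCC and an
# effort-aware recall when inspecting 20% of changed lines, computed
# on hypothetical predictions for eight changes.

import numpy as np
from sklearn.metrics import matthews_corrcoef


def recall_at_effort(y_true, risk_score, churned_lines, effort=0.20):
    """Fraction of defect-inducing changes found when inspecting changes in
    descending risk order until `effort` of all changed lines is covered."""
    order = np.argsort(-np.asarray(risk_score, dtype=float))
    lines = np.asarray(churned_lines)[order]
    labels = np.asarray(y_true)[order]
    budget = effort * lines.sum()
    within_budget = np.cumsum(lines) <= budget
    found = labels[within_budget].sum()
    total = labels.sum()
    return found / total if total else 0.0


# Hypothetical data: 1 = defect-inducing change, 0 = clean change.
y_true        = [1, 0, 1, 0, 0, 1, 0, 0]
y_pred        = [1, 0, 1, 1, 0, 0, 0, 0]          # binary model output
risk_score    = [0.9, 0.2, 0.8, 0.6, 0.1, 0.4, 0.3, 0.05]
churned_lines = [20, 30, 45, 200, 10, 60, 25, 15]  # lines changed per commit

print("MCC:", matthews_corrcoef(y_true, y_pred))
print("Recall@20% effort:", recall_at_effort(y_true, risk_score, churned_lines))
```

On this toy data the model catches two of three defect-inducing changes within the 20% line budget; the paper's reported 14% improvement refers to this kind of effort-aware comparison across its six study projects.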