The resources allocated for software quality assurance and improvement have not increased with the ever-increasing need for
better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults
occurring during operations. We present a software fault prediction modeling approach with case-based reasoning (CBR), a part
of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault
prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that
were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for
fault prediction, and a solution algorithm. The selection of a particular similarity function and solution algorithm may affect
the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating
the effects of using three different similarity functions and two different solution algorithms on the prediction accuracy
of our CBR system. The influence of varying the number of nearest neighbor cases on prediction accuracy is also explored.
Moreover, the benefits of using metric-selection procedures for our CBR system are evaluated. Case studies of a large
legacy telecommunications system are used for our analysis. It is observed that the CBR system using the Mahalanobis distance
similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the
CBR models have better performance than models based on multiple linear regression.
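
As an illustration of the three components named above (similarity function, number of nearest neighbor cases, and solution algorithm), the following is a minimal sketch and not the system evaluated in this paper. It assumes each past module is a row of numeric software metrics with a known fault count, that the covariance matrix of the case library is invertible, and that the inverse distance weighted solution is a weighted average with weights proportional to 1/distance. All function names, the `eps` guard, and the sample data are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance of the metric columns across the case library."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(m)] for a in range(m)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; raises on a singular matrix."""
    m = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(m)]
           for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between two metric vectors."""
    diff = [a - b for a, b in zip(x, y)]
    m = len(diff)
    sq = sum(diff[i] * cov_inv[i][j] * diff[j]
             for i in range(m) for j in range(m))
    return math.sqrt(max(sq, 0.0))  # guard against tiny negative round-off

def predict_faults(case_metrics, case_faults, target, n_neighbors=3, eps=1e-12):
    """Inverse-distance-weighted average of the nearest cases' fault counts."""
    cov_inv = invert(covariance_matrix(case_metrics))
    ranked = sorted((mahalanobis(c, target, cov_inv), f)
                    for c, f in zip(case_metrics, case_faults))
    nearest = ranked[:n_neighbors]
    # eps (an assumption of this sketch) avoids division by zero on exact matches.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    return sum(w * f for w, (_, f) in zip(weights, nearest)) / total

# Hypothetical case library: two metrics per module, with known fault counts.
cases = [[10.0, 2.0], [20.0, 5.0], [30.0, 3.0], [40.0, 8.0]]
faults = [1.0, 2.0, 3.0, 7.0]
estimate = predict_faults(cases, faults, target=[25.0, 4.0], n_neighbors=3)
```

Because the weights are normalized, the estimate always lies between the smallest and largest fault counts of the selected neighbors, and a target that exactly matches a stored case is predicted to have essentially that case's fault count.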
Keywords Software quality - Case-based reasoning - Software fault prediction - Similarity functions - Solution algorithm - Software metrics
Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the
Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software
reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical
modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project
leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association
for Computing Machinery, the IEEE Computer Society, and the IEEE Reliability Society. He served as the general chair of the 1999
International Symposium on Software Reliability Engineering (ISSRE’99), and the general chair of the 2001 International Conference
on Engineering of Computer Based Systems. Also, he has served on technical program committees of various international conferences,
symposia, and workshops. He has served as North American editor of the Software Quality Journal, and is on the editorial boards
of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems.
Naeem Seliya received the M.S. degree in Computer Science from Florida Atlantic University, Boca Raton, FL, USA, in 2001. He is currently
a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests
include software engineering, computational intelligence, data mining, software measurement, software reliability and quality
engineering, software architecture, computer data security, and network intrusion detection. He is a student member of the
IEEE Computer Society and the Association for Computing Machinery.