Automated Bug Severity Prediction using Source Code Metrics, Static Analysis, and Code Representation

dc.contributor.advisor: Hemmati, Hadi
dc.contributor.author: Mashhadi, Ehsan
dc.contributor.committeemember: Barcomb, Ann
dc.contributor.committeemember: Tan, Benjamin
dc.date.accessioned: 2022-09-14T00:14:18Z
dc.date.available: 2022-09-14T00:14:18Z
dc.date.issued: 2022-09-12
dc.description.abstract: In the past couple of decades, significant research effort has been devoted to predicting software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones get immediate attention. In this thesis, we present a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics and two popular static analysis tools (SpotBugs and Infer), to analyze their capability in predicting defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 open-source Java projects. The results show that although code metrics are powerful predictors of buggy code, they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools perform weakly at both predicting bugs (F1 score range of 3.1%-7.1%) and predicting their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools. Our categorization shows that Security bugs have high severity in most cases, while Edge/Boundary faults have low severity. Furthermore, we show that code metrics and static analysis methods can be complementary in estimating bug severity. To assess the effectiveness of machine learning models in predicting bug severity, we trained eight different models on code metrics only as a baseline and evaluated them with several evaluation metrics. The overall results were not promising, although the Decision Tree and Random Forest models performed best. We then leveraged the pre-trained CodeBERT model to learn code representations from the source code alone, which improved the results significantly, by 29%-140% across the different evaluation metrics. Finally, we integrated code metrics into the CodeBERT model through two architectures, named ConcatInline and ConcatCLS, which further enhance the model's efficacy. en_US
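The abstract describes two modelling steps that can be sketched concretely. First, the baseline: classical classifiers such as Decision Tree and Random Forest trained on per-method source code metrics to predict a severity label. The sketch below is illustrative only; the file name, the metric column names, and the train/test split are assumptions, not the pipeline used in the thesis.

```python
# Hypothetical baseline: classical models trained on source code metrics only.
# File path and column names are assumptions for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("method_metrics.csv")                 # assumed: one row per buggy method
metric_cols = ["loc", "cyclomatic_complexity",         # assumed subset of the 10 metrics
               "nested_block_depth", "num_parameters"]
X, y = df[metric_cols], df["severity"]                 # severity label per method

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          f1_score(y_test, model.predict(X_test), average="macro"))
```

Second, the idea of combining code representations with code metrics can be approximated by concatenating the metric vector with CodeBERT's [CLS] embedding before a classification head. This is a minimal sketch of that general technique, assuming the Hugging Face transformers API; the layer sizes, the number of severity classes, and the classifier head are illustrative and not the thesis's exact ConcatCLS architecture.

```python
# Sketch of a ConcatCLS-style classifier: CodeBERT's [CLS] embedding is
# concatenated with a numeric code-metric vector before a small head.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ConcatCLSClassifier(nn.Module):
    def __init__(self, num_metrics: int, num_classes: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("microsoft/codebert-base")
        hidden = self.encoder.config.hidden_size       # 768 for codebert-base
        self.head = nn.Sequential(                     # assumed head size
            nn.Linear(hidden + num_metrics, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask, metrics):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0, :]           # [CLS] token representation
        return self.head(torch.cat([cls, metrics], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = ConcatCLSClassifier(num_metrics=10, num_classes=4)    # 4 severity levels assumed
batch = tokenizer(["public int add(int a, int b) { return a + b; }"],
                  truncation=True, padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"],
               metrics=torch.zeros(1, 10))             # placeholder metric vector
```

Presumably ConcatInline instead combines the metrics with the tokenized source code input itself; the CLS-level concatenation shown here is only one plausible reading, offered as a sketch rather than the author's implementation.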
dc.identifier.citation: Mashhadi, E. (2022). Automated Bug Severity Prediction using Source Code Metrics, Static Analysis, and Code Representation (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. en_US
dc.identifier.uri: http://hdl.handle.net/1880/115221
dc.identifier.uri: https://dx.doi.org/10.11575/PRISM/40240
dc.language.iso: eng en_US
dc.publisher.faculty: Schulich School of Engineering en_US
dc.publisher.institution: University of Calgary en
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. en_US
dc.subject.classification: Artificial Intelligence en_US
dc.subject.classification: Computer Science en_US
dc.title: Automated Bug Severity Prediction using Source Code Metrics, Static Analysis, and Code Representation en_US
dc.type: master thesis en_US
thesis.degree.discipline: Engineering – Electrical & Computer en_US
thesis.degree.grantor: University of Calgary en_US
thesis.degree.name: Master of Science (MSc) en_US
ucalgary.item.requestcopy: true en_US
Files
Original bundle
Name: ucalgary_2022_mashhadi_ehsan.pdf
Size: 2.86 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission