Explainable AI for Software Engineering: A Systematic Review and an Empirical Study

Date
2023-01-23
Abstract

In recent years, leveraging machine learning (ML) techniques has become one of the main approaches to tackling many software engineering (SE) tasks in research studies (ML4SE). This has been achieved by utilizing state-of-the-art models that tend to be more complex and black-box, such as deep learning, which has led to less explainable solutions. This lack of explainability reduces trust in and uptake of ML4SE solutions by professionals in industry. One potential remedy is to offer explainable AI (XAI) methods that provide the missing explainability. In this thesis, we aim to explore to what extent XAI has been studied in the SE domain (XAI4SE) and to provide a comprehensive view of the current state of the art, along with open challenges and a roadmap for future work. To do so, we conduct a systematic literature review of the 24 most relevant published studies in XAI4SE, selected from 869 papers retrieved by keyword search. Our analysis reveals that among the identified studies, software maintenance (68%), and defect prediction in particular, accounts for the largest share of the SE stages and tasks that leverage XAI approaches. We also find that the published XAI methods are mainly applied to classic ML models (e.g., random forests, decision trees, and regression models) rather than to more complex deep learning and generative models (e.g., Transformer code models). Our study also shows that XAI4SE is mainly used to improve the accuracy or interpretability of the underlying ML models. Furthermore, we observe a clear lack of standard evaluation metrics for XAI methods in the literature, which has caused confusion among researchers and a lack of benchmarks for comparison. To fill one of these gaps, we conduct an empirical study of state-of-the-art Transformer-based models (CodeBERT and GraphCodeBERT) on a set of software engineering downstream tasks: code document generation (CDG), code refinement (CR), and code translation (CT). We first evaluate the validity of the attention mechanism as an explainability method for each task. Next, through quantitative and qualitative studies, we show that CodeBERT and GraphCodeBERT learn to attend to certain token types depending on the downstream task. We further show that there are common patterns that cause the models to behave unexpectedly (performing poorly even when the problem at hand is easy), such as when the input is long or when the model fails to pay proper attention to token types that are important for the task. Finally, we offer recommendations that may alleviate the observed challenges.
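For illustration, the sketch below shows one way per-token attention can be extracted from CodeBERT as an explainability signal, using the HuggingFace transformers library and the public microsoft/codebert-base checkpoint. This is a minimal example under those assumptions, not the exact analysis pipeline used in the thesis.

```python
# Minimal sketch: extracting per-token attention from CodeBERT as an
# explainability signal. Assumes the public "microsoft/codebert-base"
# checkpoint and the HuggingFace transformers library; illustrative only,
# not the thesis's actual pipeline.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)
model.eval()

code = "def add(a, b): return a + b"
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq, seq).
# Average over layers and heads, then sum over query positions to obtain
# the total attention each token receives.
attn = torch.stack(outputs.attentions).mean(dim=(0, 2))  # (batch, seq, seq)
received = attn.sum(dim=1).squeeze(0)                     # (seq,)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, received.tolist()):
    print(f"{token:>12s}  {score:.3f}")
```

Grouping the resulting scores by token type (e.g., identifiers, keywords, operators) is one way to inspect whether a model attends to the token categories that matter for a given downstream task.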

Citation
Haji Mohammadkhani, A. (2023). Explainable AI for software engineering: a systematic review and an empirical study (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.