Contributor: Hemmati, Hadi
Author: Vahdat Pour, Maryam
Date issued: 2020-09-23
Date deposited: 2020-09-28
Citation: Vahdat Pour, M. (2020). Automated Software Testing of Deep Neural Network Programs (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca
Handle: http://hdl.handle.net/1880/112601

Abstract: Machine learning (ML) models play an essential role in many applications. In recent years in particular, deep neural networks (DNNs) have been leveraged in a wide range of application domains. Given such growing adoption, faults in DNN models can undermine their trustworthiness and cause substantial losses. Detecting erroneous behaviour in any machine learning system, and especially in DNNs, is therefore critical. Software testing is a widely used mechanism for detecting faults. However, since the exact output of most DNN models is not known for a given input, traditional software testing techniques cannot be applied directly. In the last few years, several papers have proposed testing techniques and adequacy criteria for DNNs. This thesis studies three types of DNN testing techniques, using text and image input data. In the first technique, I use Multi-Implementation Testing (MIT) to generate a test oracle for finding faulty DNN models. In the second experiment, I compare the best adequacy metric among the coverage-based criteria (Surprise Adequacy) with the best example of the mutation-based criteria (DeepMutation) in terms of their effectiveness at detecting adversarial examples. Finally, in the last experiment, I apply three different test generation techniques (including a novel technique) to the DNN models and compare their performance when the generated test data are used to re-train the models. The results of the first experiment indicate that using MIT as a test oracle can successfully detect faulty programs. In the second study, the results indicate that although the mutation-based metric performs better in some experiments, it is sensitive to its parameters and requires hyper-parameter tuning. Finally, the last experiment shows a 17% improvement in F1-score when using the approach proposed in this thesis compared to the original models from the literature.

Language: English
Rights: University of Calgary graduate students retain copyright ownership and moral rights for their theses. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Keywords: Deep Neural Network; Testing DNN models; Multi-implementation testing; Guided Mutation; Test case generation; Engineering
Title: Automated Software Testing of Deep Neural Network Programs
Type: master thesis
DOI: 10.11575/PRISM/38260