Black-box Behavioral Model Inference for Autopilot Software Systems

Hemmati, Hadi; Mashhadi Ebrahim, Mohammad Jafar
2020-09-29; 2020-09-29; 2020-09-22
Mashhadi Ebrahim, M. J. (2020). Black-box Behavioral Model Inference for Autopilot Software Systems (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
http://hdl.handle.net/1880/112645

Inferring a behavioral model of a running software system is useful for several automated software engineering tasks, such as program comprehension, anomaly detection, and testing. Most existing dynamic model inference techniques are white-box, i.e., they require the source code to be instrumented to obtain run-time traces. However, in many systems, instrumenting the entire source code is not possible (e.g., when using black-box third-party libraries) or might be very costly. Unfortunately, most black-box techniques that detect states over time are univariate, make assumptions about the data distribution, or have limited power for learning over a long period of past behavior. To overcome these issues, in this thesis I proposed a hybrid deep neural network that accepts as input a set of time series, one per input/output signal of the system, and applies a set of convolutional and recurrent layers to learn the non-linear correlations between signals and their patterns over time. I applied this approach to two real UAV autopilot case studies: one from our industry partner, MicroPilot (MP for short), with half a million lines of C code, and one widely used open-source solution, Paparazzi. I ran more than 1200 system-level tests in total (to generate the input data) and inferred the system’s internal state over time. Because Paparazzi, unlike MP, did not include system tests, I created a tool that generates and executes meaningful test scenarios.
Comparison with several traditional time series change point detection techniques showed that this approach improves on their performance by up to 102% in MP’s case and 94% in Paparazzi’s, in terms of finding state change points, measured by F1 score. I also showed that this state classification algorithm achieves on average a 90.45% F1 score for MP and 82.23% for Paparazzi, which improves on traditional classification algorithms by up to 17% in MP’s case and 20% in Paparazzi’s. In addition, by building a hyper-parameter tuning pipeline based on grid search, I obtained up to 48% better performance from the neural network model in the second case study, as measured by 8 metrics, despite its much smaller training set (7 times smaller than in the first case study). The baseline for this comparison is the Paparazzi model trained with the same hyper-parameters that worked for MP.

eng
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Software Engineering; Deep Learning; Specification Mining; Automated Testing; Computer Science
master thesis
10.11575/PRISM/38304
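
The abstract reports change point detection quality as an F1 score and improvements as percentages. The record does not say how predicted and true change points were matched, so the following is only a minimal sketch under stated assumptions: change points are integer time indices, a prediction counts as correct if it lies within a hypothetical `tolerance` of a not-yet-matched true change point, and the reported percentages are read as relative improvements over a baseline score. The function names are illustrative and are not taken from the thesis.

```python
def change_point_f1(predicted, actual, tolerance=0):
    """F1 score for change point detection.

    Assumption (not from the thesis): each predicted index may match at most
    one true index, and a match requires |predicted - actual| <= tolerance.
    """
    unmatched = sorted(actual)
    true_positives = 0
    for p in sorted(predicted):
        # Pick the closest still-unmatched true change point within tolerance.
        candidates = [a for a in unmatched if abs(a - p) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda a: abs(a - p))
            unmatched.remove(best)
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, baseline_score):
    """Percentage improvement of new_score over baseline_score (assumed reading)."""
    return 100.0 * (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Made-up indices for illustration only; not data from the thesis.
    predicted = [10, 52, 90]
    actual = [12, 50, 70]
    print(change_point_f1(predicted, actual, tolerance=3))
```

Under this reading, an improvement of more than 100% means the new F1 score is more than double the baseline's, which is only possible when the baseline score is below 0.5.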