Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis

Date
2022-02-02
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Dependencies among requirements significantly impact the design, development, and testing of evolving software products. Requirements Dependencies Extraction (RDE) is a cognitively complex task due to rich semantics in natural language-based requirements, which impose challenges in automating the extraction and analysis of dependencies. The challenges intensify further when dependency types are considered. RDE is a part of the extensive decision support system to make effective software release planning, development, and testing decisions. Recently, Machine Learning and Natural Language Processing techniques have successfully automated tasks in Requirements Engineering to a large extent. Despite this success, there are some challenges to the automation of RDE - 1) Due to the nature of the problem, it is cognitively difficult to identify all the dependencies among requirements; hence generating or procuring high-quality annotations for automation through Machine Learning is an arduous task. 2) In the real-world, unlabelled data is abundant and supervised ML techniques need a training set. Lack of data for training is one of the challenges when using ML for RDE. 3) Textual requirements lack structure due to natural language, and feature extraction (transformation of the raw text into suitable internal numerical representations i.e.feature vector) techniques of NLP lead to ML techniques’ success. However, feature extraction method identification and application are cost and effort-intensive. 4) While there is a broad spectrum of Machine Learning techniques to choose from for RDE automation, not all techniques are economically viable in all the scenarios considering data size and effort investment. Hence, there is a need to evaluate the ML techniques beyond just performance measures for effective decision making. This thesis addresses these challenges and provides solutions. The results described in this thesis are derived from a series of empirical studies on industry and open-source software (OSS) datasets. The main contributions are as follows: • Performed a comprehensive assessment of Weakly Supervised Learning and Active Learning (AL) to address the data acquisition challenges using public and OSS datasets. Additionally, we compared Active Learning with Ontology-based retrieval (OBR) and further developed a hybrid solution that showed a 50% reduction in the labeling (human) effort for the two industry dataset evaluations from: Siemens Austria and Blackline safety. • Evaluated and compared a conventional ML-based Transfer Learning and state-of-the-art Deep Learning (DL) method (Fine-tuned Bidirectional Encoder Representations from Transformers (BERT)) for 6 Mozilla products (OSS) to address lack of training data challenge. We showed that the DL method outperformed the within project’s conventional ML models by 27% to 50% (on F1-score measure). ii • Demonstrated that the state-of-the-art DL method (fine-tuned BERT) could successfully overcome the feature extraction challenge of RDE as fine-tuned BERT outperformed conventional ML methods by 13% to 27% on the F1-score for the Firefox, Redmine and Typo3 product’s datasets. Also, we showed that fine-tuned BERT successfully predicted the direction of dependency. • Utilized a nine-stage ML process model and proposed a novel ROI of ML classification modeling approach. ROI of ML classification showed scenarios when it is viable to utilize complex methods over conventional methods considering the cost and benefits of data accumulation. Utilizing OSS datasets for evaluations and practitioner inputs for cost factors, we showed accuracy and ROI trade-offs in ML approach selection for RDE. Thus, we have demonstrated empirical evidence of ROI as an additional criterion for ML performance evaluation
Description
Keywords
Requirements Engineering, Software Engineering, BERT, Return of Investment, ROI, Cost and Benefit, NLP, Cost Benefit
Citation
Deshpande, G. (2022). Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.