Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis

dc.contributor.advisor	Ruhe, Guenther
dc.contributor.author	Deshpande, Gouri
dc.contributor.committeemember	Rokne, Jon
dc.contributor.committeemember	Nayebi, Maleknaz
dc.contributor.committeemember	Ferrari, Alessio
dc.contributor.committeemember	Bento, Mariana
dc.date	2022-06
dc.date.accessioned	2022-02-04T14:45:48Z
dc.date.available	2022-02-04T14:45:48Z
dc.date.issued	2022-02-02
dc.description.abstract	Dependencies among requirements significantly impact the design, development, and testing of evolving software products. Requirements Dependency Extraction (RDE) is a cognitively complex task due to the rich semantics of natural language-based requirements, which pose challenges for automating the extraction and analysis of dependencies. The challenges intensify further when dependency types are considered. RDE is part of a broader decision support system for making effective software release planning, development, and testing decisions. Recently, Machine Learning (ML) and Natural Language Processing (NLP) techniques have successfully automated tasks in Requirements Engineering to a large extent. Despite this success, there are some challenges to the automation of RDE: 1) Due to the nature of the problem, it is cognitively difficult to identify all the dependencies among requirements; hence, generating or procuring high-quality annotations for automation through Machine Learning is an arduous task. 2) In the real world, unlabelled data is abundant, but supervised ML techniques need a labelled training set. Lack of data for training is one of the challenges when using ML for RDE. 3) Textual requirements lack structure because they are written in natural language, and the success of ML techniques depends on NLP feature extraction techniques (transformation of the raw text into suitable internal numerical representations, i.e., feature vectors). However, identifying and applying a feature extraction method is cost- and effort-intensive. 4) While there is a broad spectrum of Machine Learning techniques to choose from for RDE automation, not all techniques are economically viable in all scenarios considering data size and effort investment. Hence, there is a need to evaluate ML techniques beyond just performance measures for effective decision making. This thesis addresses these challenges and provides solutions.
The results described in this thesis are derived from a series of empirical studies on industry and open-source software (OSS) datasets. The main contributions are as follows: • Performed a comprehensive assessment of Weakly Supervised Learning and Active Learning (AL) to address the data acquisition challenges using public and OSS datasets. Additionally, we compared Active Learning with Ontology-based retrieval (OBR) and further developed a hybrid solution that showed a 50% reduction in the labeling (human) effort in evaluations on two industry datasets, from Siemens Austria and Blackline Safety. • Evaluated and compared conventional ML-based Transfer Learning and a state-of-the-art Deep Learning (DL) method (fine-tuned Bidirectional Encoder Representations from Transformers (BERT)) on 6 Mozilla products (OSS) to address the lack of training data. We showed that the DL method outperformed the within-project conventional ML models by 27% to 50% (on the F1-score measure). • Demonstrated that the state-of-the-art DL method (fine-tuned BERT) could successfully overcome the feature extraction challenge of RDE, as fine-tuned BERT outperformed conventional ML methods by 13% to 27% on the F1-score for the Firefox, Redmine, and Typo3 product datasets. Also, we showed that fine-tuned BERT successfully predicted the direction of dependency. • Utilized a nine-stage ML process model and proposed a novel approach to modeling the ROI of ML classification. The ROI of ML classification identified scenarios in which it is viable to use complex methods over conventional methods, considering the costs and benefits of data accumulation. Utilizing OSS datasets for evaluations and practitioner inputs for cost factors, we showed accuracy and ROI trade-offs in ML approach selection for RDE. Thus, we have demonstrated empirical evidence for ROI as an additional criterion for ML performance evaluation.	en_US
dc.identifier.citation	Deshpande, G. (2022). Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.	en_US
dc.identifier.doi	http://dx.doi.org/10.11575/PRISM/39595
dc.identifier.uri	http://hdl.handle.net/1880/114394
dc.language.iso	eng	en_US
dc.publisher.faculty	Science	en_US
dc.publisher.institution	University of Calgary	en
dc.rights	University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.	en_US
dc.subject	Requirements Engineering	en_US
dc.subject	Software Engineering	en_US
dc.subject	BERT	en_US
dc.subject	Return on Investment	en_US
dc.subject	ROI	en_US
dc.subject	Cost and Benefit	en_US
dc.subject	NLP	en_US
dc.subject	Cost Benefit	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis	en_US
dc.type	doctoral thesis	en_US
thesis.degree.discipline	Computer Science	en_US
thesis.degree.grantor	University of Calgary	en_US
thesis.degree.name	Doctor of Philosophy (PhD)	en_US
ucalgary.item.requestcopy	true	en_US