Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis

dc.contributor.advisor: Ruhe, Guenther
dc.contributor.author: Deshpande, Gouri
dc.contributor.committeemember: Rokne, Jon
dc.contributor.committeemember: Nayebi, Maleknaz
dc.contributor.committeemember: Ferrari, Alessio
dc.contributor.committeemember: Bento, Mariana
dc.date: 2022-06
dc.date.accessioned: 2022-02-04T14:45:48Z
dc.date.available: 2022-02-04T14:45:48Z
dc.date.issued: 2022-02-02
dc.description.abstract: Dependencies among requirements significantly affect the design, development, and testing of evolving software products. Requirements Dependency Extraction (RDE) is a cognitively complex task because of the rich semantics of natural-language requirements, which makes automating the extraction and analysis of dependencies challenging. The challenges intensify further when dependency types are considered. RDE is one part of the broader decision support needed for effective software release planning, development, and testing decisions. Recently, Machine Learning (ML) and Natural Language Processing (NLP) techniques have automated many tasks in Requirements Engineering to a large extent. Despite this success, automating RDE still faces several challenges:
1) Because of the nature of the problem, it is cognitively difficult to identify all the dependencies among requirements; hence, generating or procuring high-quality annotations for ML-based automation is an arduous task.
2) In the real world, unlabeled data is abundant, but supervised ML techniques need a labeled training set. The lack of training data is one of the challenges of using ML for RDE.
3) Natural-language requirements lack structure, and the success of ML techniques depends on NLP feature extraction (transforming raw text into suitable internal numerical representations, i.e., feature vectors). However, identifying and applying a feature extraction method is cost- and effort-intensive.
4) Although there is a broad spectrum of ML techniques to choose from for automating RDE, not all of them are economically viable in every scenario once data size and effort investment are considered. Hence, ML techniques need to be evaluated beyond performance measures alone for effective decision making.
This thesis addresses these challenges and provides solutions.
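To make challenge 3 (feature extraction) concrete, the following is a minimal sketch of one conventional technique, TF-IDF vectors compared by cosine similarity, written with only the Python standard library. It is not the thesis's pipeline: the whitespace tokenizer, the smoothed IDF variant, and the three example requirements are illustrative assumptions. A high similarity score only marks a pair as a candidate for closer review; it does not by itself establish a dependency or its type.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, split on whitespace, strip simple punctuation.
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF, one common variant: log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty text
        vectors.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical requirements, invented for illustration only.
requirements = [
    "The system shall encrypt user passwords.",
    "User passwords shall be stored encrypted.",
    "The report page shall load within two seconds.",
]
vocab, vectors = tfidf_vectors(requirements)
print(round(cosine(vectors[0], vectors[1]), 2))
print(round(cosine(vectors[0], vectors[2]), 2))
```

The two password-related requirements score higher (roughly 0.35) than the unrelated pair (roughly 0.17). Note that "encrypt" and "encrypted" remain different tokens here; choices such as stemming are exactly the kind of feature-engineering effort the challenge refers to.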
The results described in this thesis are derived from a series of empirical studies on industry and open-source software (OSS) datasets. The main contributions are as follows:
• Performed a comprehensive assessment of Weakly Supervised Learning and Active Learning (AL) to address the data acquisition challenges, using public and OSS datasets. Additionally, we compared Active Learning with Ontology-based Retrieval (OBR) and developed a hybrid solution that reduced the labeling (human) effort by 50% in the evaluations on two industry datasets, from Siemens Austria and Blackline Safety.
• Evaluated and compared conventional ML-based Transfer Learning with a state-of-the-art Deep Learning (DL) method, fine-tuned Bidirectional Encoder Representations from Transformers (BERT), on 6 Mozilla products (OSS) to address the lack of training data. We showed that the DL method outperformed the within-project conventional ML models by 27% to 50% on the F1-score.
• Demonstrated that the state-of-the-art DL method (fine-tuned BERT) can overcome the feature extraction challenge of RDE: fine-tuned BERT outperformed conventional ML methods by 13% to 27% on the F1-score for the Firefox, Redmine, and Typo3 datasets. We also showed that fine-tuned BERT successfully predicted the direction of a dependency.
• Utilized a nine-stage ML process model and proposed a novel approach for modeling the ROI of ML classification. The ROI of ML classification showed the scenarios in which it is viable to use complex methods over conventional ones, considering the costs and benefits of data accumulation. Using OSS datasets for evaluation and practitioner inputs for cost factors, we showed the accuracy and ROI trade-offs in ML approach selection for RDE. Thus, we have provided empirical evidence for ROI as an additional criterion for evaluating ML performance. [en_US]
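The last contribution treats ROI as an additional criterion for evaluating ML approaches. The thesis's actual ROI model builds on a nine-stage ML process model with practitioner-provided cost factors that are not given in this record, so the sketch below only illustrates the general idea with the textbook definition ROI = (benefit - cost) / cost. The Scenario fields, the cost and benefit formulas, and every number in the usage example are hypothetical assumptions, not data or results from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical cost/benefit factors for one ML approach (not the thesis's model)."""
    name: str
    fixed_cost: float             # e.g., setup, feature engineering, compute
    n_labeled: int                # training samples that must be annotated
    label_cost_per_sample: float  # human annotation cost per sample
    n_predictions: int            # requirement pairs classified afterwards
    value_per_correct: float      # assumed benefit of one correct prediction
    accuracy: float               # expected fraction of correct predictions, in [0, 1]


def cost(s: Scenario) -> float:
    # Total cost = fixed investment + data accumulation (labeling) cost.
    return s.fixed_cost + s.n_labeled * s.label_cost_per_sample


def benefit(s: Scenario) -> float:
    # Benefit = expected number of correct predictions times their assumed value.
    return s.n_predictions * s.accuracy * s.value_per_correct


def roi(s: Scenario) -> float:
    # Classic ROI definition: (benefit - cost) / cost.
    c = cost(s)
    return (benefit(s) - c) / c


# Illustrative numbers only: the more accurate approach is also more expensive.
conventional = Scenario(name="conventional ML", fixed_cost=1000.0, n_labeled=500,
                        label_cost_per_sample=2.0, n_predictions=10_000,
                        value_per_correct=1.0, accuracy=0.70)
complex_dl = Scenario(name="fine-tuned DL", fixed_cost=4000.0, n_labeled=500,
                      label_cost_per_sample=2.0, n_predictions=10_000,
                      value_per_correct=1.0, accuracy=0.85)
for s in (conventional, complex_dl):
    print(f"{s.name}: ROI = {roi(s):.2f}")
```

With these made-up factors the more accurate approach has the lower ROI (2.50 versus 0.70), which shows how a ranking by ROI can differ from a ranking by accuracy alone.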
dc.identifier.citation: Deshpande, G. (2022). Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/39595
dc.identifier.uri: http://hdl.handle.net/1880/114394
dc.language.iso: eng [en_US]
dc.publisher.faculty: Science [en_US]
dc.publisher.institution: University of Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. [en_US]
dc.subject: Requirements Engineering [en_US]
dc.subject: Software Engineering [en_US]
dc.subject: BERT [en_US]
dc.subject: Return on Investment [en_US]
dc.subject: ROI [en_US]
dc.subject: Cost and Benefit [en_US]
dc.subject: NLP [en_US]
dc.subject: Cost Benefit [en_US]
dc.subject.classification: Computer Science [en_US]
dc.title: Requirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis [en_US]
dc.type: doctoral thesis [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.grantor: University of Calgary [en_US]
thesis.degree.name: Doctor of Philosophy (PhD) [en_US]
ucalgary.item.requestcopy: true [en_US]
Files
Original bundle
Name: ucalgary_2022_deshpande_gouri.pdf
Size: 14.09 MB
Format: Adobe Portable Document Format
Description: Dissertation Thesis
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission
Description: