Detection and Mitigation of Bias in Machine Learning Software and Datasets

dc.contributor.advisor: Uddin, Gias
dc.contributor.author: Das, Ajoy
dc.contributor.committeemember: Krishnamurthy, Diwakar
dc.contributor.committeemember: Leung, Henry
dc.date: 2023-06
dc.date.accessioned: 2023-01-25T23:41:52Z
dc.date.available: 2023-01-25T23:41:52Z
dc.date.issued: 2023-01-23
dc.description.abstract: Fairness, i.e., the absence of bias in a decision-making process, is a desirable property in any software system used to make critical decisions (e.g., mortgage approval). However, with the rise of Machine Learning (ML) systems, concern about unfair systems is also growing rapidly, as ML systems are inherently difficult to understand and debug. Moreover, datasets that contain various types of bias can have drastic consequences for the users and systems that rely on them. We have already seen evidence of the drastic influence of bias in various cases, ranging from job recruitment to parole approval. As a result, fairness metrics and mitigation approaches are becoming increasingly necessary to deal with this issue. Given the growing importance of bias detection and mitigation approaches for ML software systems, it is important to learn how bias is detected and mitigated in ML software systems and datasets, and how we could assist in the detection and mitigation of such biases using novel toolkits. In this thesis, we explore this topic along two dimensions: (1) First, we qualitatively study how fairness APIs (i.e., software libraries) are used in the wild (i.e., in open-source ML software systems) to detect and mitigate bias across diverse use cases. (2) Second, we develop a suite of toolkits to support the detection and mitigation of labeling inconsistency bias in sentiment analysis datasets for software engineering (SE). A labeling inconsistency arises when two similar sentences in a dataset have different labels, whereas they should ideally have the same label. Our major observations in this thesis are: (1) Fairness APIs are increasingly being used in diverse real-world use cases, but developers find it challenging to use the APIs properly. (2) Despite the availability of several fairness APIs, new toolkit support is still needed beyond those APIs to address a bias like labeling inconsistency in sentiment analysis for software engineering (SA4SE) datasets; our developed toolkits can aid in this task. (3) Our toolkits can be adapted to address labeling inconsistency bias in any textual dataset used to build classification-based ML models.
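The abstract defines a labeling inconsistency as two similar sentences carrying different labels. The following is an illustrative sketch only, not the thesis's actual toolkit: it flags candidate inconsistencies with a simple text-similarity pass over labeled pairs. The function name, similarity threshold, and sample data are all hypothetical.

```python
# Hypothetical sketch: flag potential labeling inconsistencies, i.e. pairs of
# near-identical sentences that carry different sentiment labels.
from difflib import SequenceMatcher

def find_inconsistencies(dataset, threshold=0.9):
    """Return pairs of sentences whose text similarity exceeds `threshold`
    but whose labels disagree. `dataset` is a list of (sentence, label)
    tuples; the name and threshold are illustrative, not from the thesis."""
    flagged = []
    for i, (s1, l1) in enumerate(dataset):
        for s2, l2 in dataset[i + 1:]:
            if l1 != l2 and SequenceMatcher(None, s1.lower(), s2.lower()).ratio() >= threshold:
                flagged.append((s1, s2))
    return flagged

data = [
    ("this api is terrible", "negative"),
    ("this api is terrible!", "positive"),  # near-duplicate, conflicting label
    ("works as documented", "neutral"),
]
# Flags the near-duplicate pair with conflicting labels.
print(find_inconsistencies(data))
```

A real toolkit would likely use semantic similarity (e.g., sentence embeddings) rather than character-level matching, but the detection principle is the same: surface similar texts whose labels disagree for human review.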
dc.identifier.citation: Das, A. (2023). Detection and mitigation of bias in Machine Learning software and datasets (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca
dc.identifier.uri: http://hdl.handle.net/1880/115772
dc.identifier.uri: https://dx.doi.org/10.11575/PRISM/40685
dc.language.iso: eng
dc.publisher.faculty: Schulich School of Engineering
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: bias
dc.subject: fairness
dc.subject: machine learning
dc.subject: software fairness
dc.subject: labeling inconsistency
dc.subject.classification: Artificial Intelligence
dc.subject.classification: Computer Science
dc.title: Detection and Mitigation of Bias in Machine Learning Software and Datasets
dc.type: master thesis
thesis.degree.discipline: Engineering – Electrical & Computer
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.item.requestcopy: true
Files

Original bundle (1 of 1)
Name: ucalgary_2023_das_ajoy.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
Description: Main article

License bundle (1 of 1)
Name: license.txt
Size: 2.62 KB
Description: Item-specific license agreed upon to submission