Detection and Mitigation of Bias in Machine Learning Software and Datasets

dc.contributor.advisor: Uddin, Gias
dc.contributor.author: Das, Ajoy
dc.contributor.committeemember: Krishnamurthy, Diwakar
dc.contributor.committeemember: Leung, Henry
dc.date: 2023-06
dc.date.accessioned: 2023-01-25T23:41:52Z
dc.date.available: 2023-01-25T23:41:52Z
dc.date.issued: 2023-01-23
dc.description.abstract: Fairness, i.e., the absence of bias in a decision-making process, is a desirable property in any software system used to make critical decisions (e.g., mortgage approval). However, with the rise of Machine Learning (ML) systems, concern about unfair systems is also growing rapidly, as ML systems are inherently difficult to understand and debug. Moreover, datasets that contain various types of bias can have drastic consequences for the users and systems that rely on them. We have already seen evidence of the drastic influence of bias in various cases, ranging from job recruitment to parole approval. As a result, fairness metrics and mitigation approaches are becoming increasingly necessary to deal with this issue. Given the growing importance of bias detection and mitigation approaches for ML software systems, it is important to learn how bias is detected and mitigated in ML software systems and datasets, and how we could assist in the detection and mitigation of such biases using novel toolkits. In this thesis, we explore this topic along two dimensions: (1) First, we qualitatively study how fairness APIs (i.e., software libraries) are used in the wild (i.e., in open-source ML software systems) to detect and mitigate bias across diverse use cases. (2) Second, we develop a suite of toolkits to support the detection and mitigation of labeling inconsistency bias in sentiment analysis datasets for software engineering (SE). A labeling inconsistency arises when two similar sentences in a dataset have different labels, whereas they should ideally have the same label. Our major observations in this thesis are: (1) Fairness APIs are increasingly being used in diverse real-world use cases, but developers find it challenging to use the APIs properly. (2) Despite the availability of several fairness APIs, new toolkit support is still needed beyond those APIs to address a bias like labeling inconsistency in sentiment analysis for software engineering (SA4SE) datasets; our developed toolkits can aid in this task. (3) Our toolkits can be adapted to address labeling inconsistency bias in any textual dataset used to build classification-based ML models.
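The abstract defines a labeling inconsistency as two similar sentences carrying different labels. The following is an illustrative sketch only, not the thesis's actual toolkit: it flags candidate inconsistencies with a simple text-similarity pass over labeled pairs. The function name, similarity threshold, and sample data are all hypothetical.

```python
# Hypothetical sketch: flag potential labeling inconsistencies, i.e. pairs of
# near-identical sentences that carry different sentiment labels.
from difflib import SequenceMatcher

def find_inconsistencies(dataset, threshold=0.9):
    """Return pairs of sentences whose text similarity exceeds `threshold`
    but whose labels disagree. `dataset` is a list of (sentence, label)
    tuples; the name and threshold are illustrative, not from the thesis."""
    flagged = []
    for i, (s1, l1) in enumerate(dataset):
        for s2, l2 in dataset[i + 1:]:
            if l1 != l2 and SequenceMatcher(None, s1.lower(), s2.lower()).ratio() >= threshold:
                flagged.append((s1, s2))
    return flagged

data = [
    ("this api is terrible", "negative"),
    ("this api is terrible!", "positive"),  # near-duplicate, conflicting label
    ("works as documented", "neutral"),
]
# Flags the near-duplicate pair with conflicting labels.
print(find_inconsistencies(data))
```

A real toolkit would likely use semantic similarity (e.g., sentence embeddings) rather than character-level matching, but the detection principle is the same: surface similar texts whose labels disagree for human review.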
dc.identifier.citation: Das, A. (2023). Detection and mitigation of bias in Machine Learning software and datasets (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca
dc.identifier.uri: http://hdl.handle.net/1880/115772
dc.identifier.uri: https://dx.doi.org/10.11575/PRISM/40685
dc.language.iso: eng
dc.publisher.faculty: Schulich School of Engineering
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: bias
dc.subject: fairness
dc.subject: machine learning
dc.subject: software fairness
dc.subject: labeling inconsistency
dc.subject.classification: Artificial Intelligence
dc.subject.classification: Computer Science
dc.title: Detection and Mitigation of Bias in Machine Learning Software and Datasets
dc.type: master thesis
thesis.degree.discipline: Engineering – Electrical & Computer
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.item.requestcopy: true
Files

Original bundle (1 of 1)
Name: ucalgary_2023_das_ajoy.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
Description: Main article

License bundle (1 of 1)
Name: license.txt
Size: 2.62 KB
Description: Item-specific license agreed upon to submission