Iterative Open Area Detection in Feature Spaces Using World Constraints

dc.contributor.advisor: Denzinger, Joerg
dc.contributor.author: Douglas, Nathan Josiah
dc.contributor.committeemember: Jacob, Christian
dc.contributor.committeemember: Wu, Leanne
dc.date: 2024-07
dc.date.accessioned: 2024-07-05T21:08:22Z
dc.date.available: 2024-07-05T21:08:22Z
dc.date.issued: 2024-07-03
dc.description.abstract: Improving data quality is a valuable step in any data-driven process. One understudied method of evaluating data quality is the detection of Open Areas. An Open Area indicates a discrepancy between the information that exists in a dataset and the information that should exist in that dataset as determined by world constraints (i.e., constraints informed by the dataset’s domain). We propose the Iterative Open Area Detection (IOAD) approach as a method to detect these problematic areas. IOAD is an iterative process which successively finds Open Areas in a worst-first fashion. Each iterative step first searches the feature space for the least represented point, and then finds the largest surrounding spherical area such that the entire sphere does not meet some criterion for reasonable representation. We evaluate this approach on a variation of an existing dataset, which we extend to ensure compatibility with the relevant world constraints, and on a series of synthetic datasets of varying dimensionality. First, we show that the areas we find are both interesting and appropriate by analyzing the candidate points and the surrounding areas in the context of the world constraints. Then, to further validate the areas reported by IOAD, we artificially introduce Open Areas by removing a cluster of points and measure the number of iterations needed to detect the removed area. Our results show that our approach is always successful, but the efficiency may be bound by the Curse of Dimensionality.
dc.identifier.citation: Douglas, N. J. (2024). Iterative Open Area Detection in feature spaces using world constraints (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/119131
dc.language.iso: en
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject.classification: Computer Science
dc.subject.classification: Artificial Intelligence
dc.title: Iterative Open Area Detection in Feature Spaces Using World Constraints
dc.type: master thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.thesis.accesssetbystudent: I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.
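The iterative step described in the abstract (find the least represented point, then grow the largest surrounding sphere that is still under-represented) can be illustrated with a rough sketch. This is not the thesis's implementation: it swaps in a simple "largest empty sphere" heuristic, in which representation is measured by distance to the nearest data point and candidate points are drawn at random. The function names, the sampling scheme, and the stopping rule are all assumptions made for illustration, and world constraints are not modelled.

```python
import math
import random


def nearest_distance(point, data):
    """Distance from `point` to its closest data row (a crude representation score)."""
    return min(math.dist(point, row) for row in data)


def find_open_areas(data, bounds, iterations=3, samples=2000, seed=0):
    """Hypothetical worst-first search for empty spheres in a feature space.

    data:   list of equal-length numeric tuples
    bounds: list of (low, high) pairs, one per feature
    Returns a list of (centre, radius) pairs, largest sphere first.
    """
    rng = random.Random(seed)
    # Assumption: candidate centres are sampled uniformly inside the bounds.
    candidates = [
        tuple(rng.uniform(low, high) for low, high in bounds)
        for _ in range(samples)
    ]
    found = []
    for _ in range(iterations):
        # Skip candidates that already lie inside a reported sphere.
        remaining = [
            c for c in candidates
            if all(math.dist(c, centre) > radius for centre, radius in found)
        ]
        if not remaining:
            break
        # Least represented point: the candidate farthest from all data.
        centre = max(remaining, key=lambda c: nearest_distance(c, data))
        # Largest sphere around it that contains no data point.
        radius = nearest_distance(centre, data)
        found.append((centre, radius))
    return found
```

Mirroring the validation idea in the abstract, one could remove a cluster of points from a regular grid and check that the first reported sphere falls inside the removed region. In high-dimensional spaces, uniform sampling of candidates becomes sparse, which is one simple way to see why efficiency may degrade with dimensionality.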
Files
Original bundle
Name: ucalgary_2024_douglas_nathan.pdf
Size: 25.46 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission