Iterative Open Area Detection in Feature Spaces Using World Constraints

Date
2024-07-03
Abstract
Improving data quality is a valuable step in any data-driven process. One understudied method of evaluating data quality is the detection of Open Areas. An Open Area indicates a discrepancy between the information that exists in a dataset and the information that should exist in that dataset, as determined by world constraints (i.e., constraints informed by the dataset’s domain). We propose the Iterative Open Area Detection (IOAD) approach as a method to detect these problematic areas. IOAD is an iterative process that successively finds Open Areas in a worst-first fashion. Each iteration first searches the feature space for the least represented point, and then finds the largest surrounding sphere such that the entire sphere fails to meet some criterion for reasonable representation. We evaluate this approach on a variation of an existing dataset, which we extend to ensure compatibility with the relevant world constraints, and on a series of synthetic datasets of varying dimensionality. First, we show that the areas we find are both interesting and appropriate by analyzing the candidate points and the surrounding areas in the context of the world constraints. Then, to further validate the areas reported by IOAD, we artificially introduce Open Areas by removing a cluster of points and measure the number of iterations needed to detect the removed area. Our results show that the approach always detects the removed area, but that its efficiency may be limited by the Curse of Dimensionality.
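The iterative step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: the "least represented point" is taken to be the candidate farthest from its nearest data point, the criterion for reasonable representation is taken to be a hypothetical `min_count` of points inside the sphere, the candidate set is a fixed grid, and world constraints are not modelled at all. All function names are hypothetical.

```python
import numpy as np

def least_represented_point(data, candidates):
    # Assumption: "least represented" = candidate whose nearest data point
    # is farthest away.
    dists = np.linalg.norm(candidates[:, None, :] - data[None, :, :], axis=2)
    return candidates[int(np.argmax(dists.min(axis=1)))]

def largest_open_radius(data, center, min_count):
    # Assumption: a sphere is "open" while it holds fewer than min_count
    # points, so the largest open radius is the distance to the
    # min_count-th nearest data point.
    dists = np.sort(np.linalg.norm(data - center, axis=1))
    return float(dists[min_count - 1])

def iterative_open_areas(data, candidates, min_count=3, n_iter=3):
    # Worst-first loop: report a sphere, then drop the candidates it covers
    # so that the next iteration looks elsewhere.
    areas = []
    remaining = candidates.copy()
    for _ in range(n_iter):
        if len(remaining) == 0:
            break
        center = least_represented_point(data, remaining)
        radius = largest_open_radius(data, center, min_count)
        areas.append((center, radius))
        remaining = remaining[np.linalg.norm(remaining - center, axis=1) > radius]
    return areas

# Toy validation in the spirit of the abstract: remove a cluster of points
# from a regular grid and check how quickly the removed area is reported.
grid = np.array([(i, j) for i in range(11) for j in range(11)], dtype=float)
in_hole = (np.abs(grid[:, 0] - 5) <= 2) & (np.abs(grid[:, 1] - 5) <= 2)
data = grid[~in_hole]

areas = iterative_open_areas(data, candidates=grid, min_count=3, n_iter=3)
center, radius = areas[0]
print(center, radius)
```

In this toy setting the removed 5 x 5 block is reported in the first iteration, centred on the middle of the hole. The pairwise distance matrix used here grows with the number of candidates and data points, and a fixed candidate grid grows exponentially with the number of features, which gives one intuition for why efficiency may suffer in higher dimensions.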
Citation
Douglas, N. J. (2024). Iterative Open Area Detection in feature spaces using world constraints (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.