Exploration of Techniques for Working with Sparse Data when Applying Natural Language Processing to Assist a Qualitative Data Analysis of a COVID-19 Open Innovation Community

Barcomb, Ann
Yamani, Shirin
2024-04-18
2024-04-18
2024-04-17
Yamani, Shirin (2024). Exploration of techniques for working with sparse data when applying natural language processing to assist a qualitative data analysis of a COVID-19 open innovation community (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
https://hdl.handle.net/1880/118434
10.11575/PRISM/43276
This thesis integrates Natural Language Processing (NLP) with Qualitative Data Analysis (QDA) to investigate the dynamics of volunteer involvement within the TeamOSV community, a collective formed in response to the COVID-19 pandemic. Central to this study is the exploration of roles and interaction patterns among episodic and habitual volunteers, alongside an analysis of the factors influencing their engagement and disengagement within the community. A significant methodological contribution of this work lies in addressing the sparse data challenge, a common constraint in qualitative research, particularly within multi-class classification contexts. The study employs and critically evaluates a range of NLP techniques, with a focus on data augmentation strategies, to enhance the efficacy of various models, including Logistic Regression, Naive Bayes, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and particularly the Self-Attention model. The proposed framework, identified for its superior performance, demonstrates a noteworthy ability to process and interpret sparse qualitative data, surpassing traditional approaches in its effectiveness. Furthermore, the thesis presents an in-depth analysis of model variations, assessing the impact of differing configurations of Self-Attention blocks and layers of feed-forward neural networks. It also explores the implications of pre-training on model performance, offering insights into the architectural complexities and training dynamics of NLP models.
A crucial aspect of this exploration is the consideration of the trade-offs between model complexity and computational efficiency, highlighting the practical challenges and considerations in deploying these models in qualitative research contexts. Qualitatively, the study offers a detailed examination of volunteer roles within the TeamOSV community. It identifies the distinct contributions and challenges associated with episodic volunteers, characterized by their sporadic engagement patterns, and habitual volunteers, who provide stability and long-term vision. The research also sheds light on the reasons behind volunteer disengagement, such as lifestyle changes and diminishing interest, providing a holistic understanding of volunteer participation in open-source, community-driven projects. The thesis concludes by emphasizing the collaborative strengths of merging NLP with QDA, a union that significantly augments the depth of qualitative research. It proposes a roadmap for future investigations, concentrating on enhancing insights into volunteer coordination within open innovation settings and broadening the application range of NLP in qualitative data examination.
en
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Natural Language Processing
Artificial Intelligence
master thesis