Advisor: Ghaderi, Majid
Author: Abhari, Bardia
Date accessioned: 2024-01-19
Date available: 2024-01-19
Date issued: 2024-01-18
Citation: Abhari, B. (2024). Intrusion detection using heterogeneous data sources (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
URI: https://hdl.handle.net/1880/118009
DOI: https://doi.org/10.11575/PRISM/42853

Abstract:
Amid the growing sophistication of cyber-attacks and malware, conventional Intrusion Detection Systems (IDS) often fall short, primarily because they rely on a single data source, as Network-based (NIDS) and Host-based Intrusion Detection Systems (HIDS) do. As the existing literature highlights, such systems lack a comprehensive view of network activity. Recent research has attempted to integrate multiple heterogeneous data sources, yet it often treats each source in isolation, overlooking the complex interrelations among data sources within the same network. This thesis introduces IMD-IDS, which is distinguished by its ability to fuse multiple heterogeneous data sources effectively for anomaly detection. The centrepiece of IMD-IDS is a machine learning (ML) based detection engine trained concurrently on all available data sources, whether heterogeneous or not. This approach enables IMD-IDS to uncover the intricate relationships between different data sources. To achieve this, a novel fusion algorithm is presented that uses BERT encoders to convert textual host data into numerical vectors. These vectors are then integrated with feature vectors derived from network data to form a combined dataset. The XGBoost model employed within IMD-IDS uses this unified dataset to improve anomaly detection accuracy, benefiting from simultaneous access to diverse data sources. Through experimental validation, this thesis demonstrates that IMD-IDS outperforms previous multi-data-source IDS approaches, particularly in detecting both known and zero-day attacks, with average performance improvements of 12% and 10%, respectively.

Language: en
Rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Subjects: Network Security; Intrusion Detection; Word Embedding; Applied Machine Learning; Multi Data Source; Education--Sciences
Title: Intrusion Detection Using Heterogeneous Data Sources
Type: master thesis
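The abstract describes a feature-level fusion step. Textual host data is encoded into numerical vectors, and those vectors are joined with network feature vectors into one dataset for a single classifier. The Python sketch below illustrates that idea under stated assumptions. It is not the thesis's algorithm. A hashed bag-of-words encoder (`encode_text`) stands in for the BERT encoder, and host and network records are assumed to be aligned on a hypothetical `(host id, time window)` key. The names `encode_text`, `fuse`, and the toy records are all illustrative.

```python
"""Hypothetical sketch of fusing host text with network features.

Assumptions (not from the thesis): a hashed bag-of-words encoder stands in
for BERT, and records are aligned on a made-up (host id, time window) key.
"""
import hashlib
import math
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # hypothetical alignment key: (host id, time window)


def encode_text(text: str, dim: int = 8) -> List[float]:
    """Stand-in for a BERT encoder: hash tokens into an L2-normalised vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def fuse(host_logs: Dict[Key, str],
         net_features: Dict[Key, List[float]],
         dim: int = 8) -> Dict[Key, List[float]]:
    """Concatenate each network feature vector with its encoded host text.

    Keys with no host text get a zero text vector, so all rows share one width.
    """
    fused: Dict[Key, List[float]] = {}
    for key, features in net_features.items():
        text = host_logs.get(key)
        text_vec = encode_text(text, dim) if text else [0.0] * dim
        fused[key] = list(features) + text_vec
    return fused


# Toy usage with made-up records: 2 network features + 8 text dims per row.
host = {("h1", 0): "sshd failed password for root", ("h1", 1): "cron job started"}
net = {("h1", 0): [120.0, 3.0], ("h1", 1): [15.0, 1.0], ("h2", 0): [7.0, 0.0]}
rows = fuse(host, net)
```

Each fused row could then be passed, together with a label, to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`. Concatenation keeps every source in one feature space, so a single model can learn interactions between host and network signals. That is the property the abstract attributes to training on all sources concurrently.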