Browsing by Author "Moshirpour, Mohammad"
Now showing 1 - 20 of 30
- Item (Open Access): A Comprehensive Capacity Expansion Planning Model for Highly Renewable Integrated Power Systems (2023-05-12). Authors: Parvini, Zohreh; Behjat, Laleh; Fapojuwo, Abraham; Moshirpour, Mohammad.
Due to the depletion of conventional energy sources and growing environmental concerns, the trend toward increasing integration of renewable energy resources brings new challenges to power system planning and operation. The fluctuation of renewable energy resources is the main concern of system planners seeking their efficient deployment, and incorporating a more precise and detailed model of system constraints is essential to deal with the intermittent and volatile nature of these resources. However, because of the many aspects and the computational complexity of the capacity expansion problem, it is vital to have a thorough understanding of the constraints that most affect system planning. The unique characteristics of power systems, along with the integration of renewable energy resources and modern technologies such as energy storage, require a rigorous model for planning future infrastructure based on the available data. The primary objective of this research is to investigate and evaluate various aspects of power systems and to develop a comprehensive capacity expansion model using linear optimization techniques. The thesis includes the development of a data set for long-term planning purposes, a co-optimization expansion planning (CEP) model for identifying optimal transmission and generation expansion, modeling of storage technology and reserves, and reduction of the network size to ensure model tractability. The framework was designed to facilitate the seamless integration of renewable energy sources and improve the performance of the whole power system, ensuring a smooth transition towards a high-renewable energy future.
This tool is intended to provide system planners and stakeholders in the generation and transmission sectors with insights into future realizations of high-renewable power systems. The model can also be used as a benchmark for future planning studies and adjusted for any possible future assumptions.
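The co-optimization idea described above can be pictured as choosing capacity additions that minimize investment plus operating cost while meeting demand in every hour. The sketch below is a toy stand-in for the thesis's linear-programming formulation: the technologies, cost figures, hourly demand, and brute-force grid search are all illustrative assumptions, not the model or data of the study.

```python
from itertools import product

# Hypothetical candidate technologies: (name, capex $/MW, opex $/MWh, capacity factor).
TECHS = [("wind", 1.2, 0.00, 0.35), ("gas", 0.8, 0.05, 0.90)]
DEMAND = [50.0, 70.0, 90.0, 60.0]  # illustrative hourly demand in MWh

def plan_cost(builds):
    """Capex of the build plan plus merit-order dispatch opex over all hours;
    returns None if some hour cannot be served."""
    capex = sum(mw * tech[1] for mw, tech in zip(builds, TECHS))
    opex = 0.0
    for demand in DEMAND:
        remaining = demand
        # dispatch cheapest-opex technology first (merit order)
        for mw, tech in sorted(zip(builds, TECHS), key=lambda bt: bt[1][2]):
            used = min(mw * tech[3], remaining)
            opex += used * tech[2]
            remaining -= used
        if remaining > 1e-9:
            return None  # infeasible hour: not enough capacity built
    return capex + opex

def best_plan(step=10, max_mw=300):
    """Brute-force search over a coarse build grid, standing in for the LP solver."""
    options = range(0, max_mw + 1, step)
    candidates = ((b, plan_cost(b)) for b in product(options, repeat=len(TECHS)))
    return min((bc for bc in candidates if bc[1] is not None), key=lambda bc: bc[1])
```

A real CEP model would add transmission variables, storage state-of-charge constraints, and reserves, and would be solved with an LP solver rather than enumeration.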
- Item (Open Access): A Visual Tool for Comparing the Life Cycles of Major Energy Sources in Alberta (2017). Authors: Karbalaei, Amir Hassan; Behjat, Laleh; Gates, Ian; Nowicki, Edwin; Moshirpour, Mohammad; Bergerson, Joule.
Air pollution has become one of the major challenges with which humanity continues to struggle in modern times. The never-ending demand for energy has increased the production and consumption of different types of energy resources. A major part of this demand is met through fossil fuels, which are one of the main sources of air pollution. In this thesis, a simple Life Cycle Assessment (LCA) model was created to represent the Greenhouse Gas (GHG) emissions of coal, natural gas, and oil sands in Alberta. The model was complemented by a user-friendly, interactive visualization tool, The Greener Alberta, developed using modern software development techniques. The visualization tool enables the general public, especially younger generations, to learn about industry processes and to compare the GHG emission rates of each energy resource. Finally, research surveys were conducted to verify the effectiveness of The Greener Alberta.
- Item (Open Access): An AI-Based Human-Centered Approach to Support Multidisciplinary Requirements Engineering (2023-01-30). Authors: Salmani, Ali; Moshirpour, Mohammad; Duffett-Leger, Linda; Far, Behrouz; Deshpande, Gouri.
Multidisciplinary teams are often a necessity for software projects, as they provide the expertise required to effectively solve complex problems. However, efficient collaboration between teams from different disciplines is challenging due to several factors, including gaps between knowledge areas, the need to establish a process, and differing requirements from various groups of stakeholders. Agile methodologies such as Scrum provide a powerful approach to managing software projects, with tools and practices for properly addressing change, which is often more common in multidisciplinary teams. In this study, we leverage process evaluation tools and techniques to analyze the efficiency of our software development process. We evaluated this approach based on project data recorded in Jira and GitHub and applied it to a case study of a virtual healthcare intervention system to measure the team's productivity. Several deficiencies were identified and discussed based on the results, and we conclude that these deficiencies relate to the requirements engineering (RE) process. To improve the RE process, a set of solutions was analyzed to determine their feasibility. Automating the requirements engineering process can be an efficient way to address these issues. The main objectives of this thesis are to devise an automated approach to 1) identify system requirements, including new features and bugs, from users' speech and break them down into tasks, 2) find similar Jira tickets that have already been implemented, and 3) estimate the amount of effort needed for the new task.
By providing smart, automated support for requirements analysis and elicitation, this solution integrates seamlessly with Scrum and is expected to considerably improve the efficiency of the software development process for the virtual intervention system used as the case study of this thesis. As part of this thesis, we implement a model to determine whether tasks are similar and a model to estimate the effort required to complete each new task, addressing the second and third objectives. To find similarities between tasks (objective 2), S-BERT, one of the most powerful transformer-based machine learning techniques, was trained on a dataset that was collected, pre-processed, and normalized. To estimate the required effort of each task (objective 3), we used an approach that maps original commit instances into a high-dimensional feature space using Kernel-based Principal Component Analysis (KPCA) combined with Adversarial Learning (AL). Based on the results, the trained model shows improved ability in topic segmentation and in finding similarities between requirements. In addition, our model achieves 86% accuracy in estimating the required effort.
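At its core, the similar-ticket search is a nearest-neighbour lookup over sentence embeddings. The sketch below assumes the embeddings are already computed (the 3-d vectors and Jira ticket IDs are made up for illustration; the thesis uses S-BERT to produce the real vectors) and ranks tickets by cosine similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query_vec, ticket_vecs):
    """Return (ticket_id, similarity) of the closest existing ticket."""
    return max(((tid, cosine(query_vec, vec)) for tid, vec in ticket_vecs.items()),
               key=lambda pair: pair[1])

# Toy embeddings standing in for S-BERT sentence vectors of ticket descriptions.
tickets = {"JIRA-101": [0.9, 0.1, 0.0], "JIRA-202": [0.1, 0.8, 0.3]}
tid, score = most_similar([0.85, 0.15, 0.05], tickets)
```

In practice the vectors are hundreds of dimensions and the lookup is batched, but the ranking principle is the same.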
- Item (Open Access): An AI-based Framework For Parent-child Interaction Analysis (2023-07). Authors: Nikbakhtbideh, Behnam; Moshirpour, Mohammad; Duffett-Leger, Linda; Far, Behrouz; Drew, Steve.
The quality of parent-child interactions is foundational to children's social-emotional and cognitive development, as well as their lifelong mental health. The Parent-Child Interaction Teaching Scale (PCITS) is a well-established and effective tool for measuring parent-child interaction quality. It is used in both public health settings and basic and applied research studies to identify problem areas within parent-child interactions. However, like other observational measures of parent-child interaction quality, the PCITS can be time-consuming to administer and score, which limits its wider implementation. Therefore, the main objective of this research is to develop a framework for recognizing the behavioural symptoms of the child and parent during interventions. Based on the literature on interactive parent-child behaviour analysis, we categorized PCITS labels into three modalities: language, audio, and video. Some labels have dyadic actors, while others have a single actor (either the parent or the child). In addition, each modality carries its own technical issues, considerations, and limitations with respect to artificial intelligence. Hence, we divided the problem into three modalities, proposed models for each, and devised a solution to combine them. First, we proposed a model for recognizing action-related (video) labels. These labels are interactive and involve two actors: the parent and the child. We developed a feature extraction algorithm that produces semantic features, which are passed through a feature selection algorithm to retain the most meaningful semantic features from the video. We chose this method because it requires less data than other modalities.
Also, because 2D video files are used, the proposed feature extraction and selection algorithms are designed to handle occlusion and natural conditions such as camera movement. Second, we proposed a model for recognizing language- and audio-related labels. These labels represent a single-actor role for the parent, as children are not yet capable of producing meaningful text in the intervention videos. To develop this model, we conducted research on a similar dataset in order to apply transfer learning between the two problems; the second part of this research therefore involves working with this text dataset. Third, we focused on the multi-modal aspects of the work. We conducted experiments to determine how to integrate the prior work into our model, and we built an ensemble model that combines the language and audio modalities based on the semantic and syntactic characteristics of the text. This ensemble model provides a baseline for developing further models with different aspects and modalities. Finally, we provided a roadmap to support additional labels that could not be covered in this research because not enough samples were available. Our proposed framework includes a labelling system, developed in the early stages of the research, to gather labelled data. This system is also designed to integrate with the AI modules, providing nurses with automatic recognition of behavioural labels in parent-child interaction videos.
- Item (Open Access): Detection of Emergent Behaviour in Distributed Software Systems using Data Analysis Techniques (2021-07-21). Authors: Slama, Anja; Far, Behrouz; Uddin, Gias; Moshirpour, Mohammad.
Distributed Software Systems (DSS) and their sub-category, Multi-Agent Systems (MAS), are composed of several collaborative components working towards a common goal. Requirements engineering involves weighing competing needs and concerns to properly specify software systems. Analysis of Scenario-Based Specifications (SBS) offers important advantages for minimizing the generation of unexpected behaviours in DSS. These behaviours, known as Emergent Behaviours (EB), can potentially lead to irreversible, costly damage. In this thesis, we focus on analyzing software behaviours from the SBS to detect EBs; verifying software behaviours in the early stages of software development can detect and prevent unwanted behaviours. Different methodologies that aim to detect EBs are discussed, and this thesis provides a new automated, homogeneous methodology to detect EBs based on their common cause of occurrence. Subsequently, different algorithms were proposed to detect the types of EBs, with examples presented to explain the algorithms. By adopting data analysis techniques, we contribute to preventing these behaviours and ensuring system quality. To evaluate the proposed methodology based on the analysis of the SBS, we used two different approaches: (1) the conventional approach of comparing the proposed methodology against those of related works, and (2) dynamic analysis of system traces, which requires simulating the SBS to aggregate the system behaviour at runtime. Results show that the proposed methodology detects EBs more effectively than similar work, which was also confirmed by statistical modelling of the case studies' traces.
Additionally, we verified the efficiency of the proposed methodology using sequential pattern mining techniques. This thesis contributes to requirements engineering research specifically, and to software engineering generally, by providing an automated methodology for analyzing the SBS as a black box. It supports component reusability and design sustainability. Moreover, early identification of the cause of EBs empowers the designer and the software development team to handle these EBs and helps decrease system cost.
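One classic cause of emergent behaviour is that every component's local view of a trace is consistent with some specified scenario while the trace as a whole matches none of them. The check below is a deliberately minimal illustration of that idea: the event names are invented, and reducing "locally valid" to allowed consecutive-event pairs and "specified" to whole-trace equality are simplifications of the thesis's SBS analysis.

```python
def transitions(seq):
    """Consecutive event pairs occurring in a sequence."""
    return {(a, b) for a, b in zip(seq, seq[1:])}

def is_emergent(trace, scenarios):
    """Flag a trace whose every local transition is allowed by some specified
    scenario even though no single scenario specifies the whole trace."""
    allowed = set().union(*(transitions(s) for s in scenarios))
    locally_valid = transitions(trace) <= allowed
    globally_specified = any(trace == s for s in scenarios)
    return locally_valid and not globally_specified

# Two specified scenarios that overlap on event "b"; the overlap is what
# lets an unspecified interleaving slip through local checks.
scenarios = [["a", "b", "c"], ["x", "b", "y"]]
```

Here the trace `["a", "b", "y"]` stitches two scenarios together at `"b"` and would be flagged, while `["a", "b", "c"]` is fully specified and would not.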
- Item (Open Access): Development and Evaluation of a Framework for Mentor-Based Engineering Outreach (2022-07-25). Authors: Dornian, Katherine; Behjat, Laleh; Moshirpour, Mohammad; Sengupta, Pratim; Jazayeri, Pouyan.
Diversity in engineering teams and organizations is needed to solve the complex challenges the world is facing. However, the number of people from historically underrepresented groups—such as women, Indigenous, Black, and Hispanic engineers—falls short of parity with the population [1-4]. Mentoring programs successfully improve interest in engineering and perceptions of the field for people in the groups mentioned above [5-8]. Mentor-based outreach is a growing practice that attracts these historically excluded people to study engineering. While frameworks exist for implementing outreach and mentoring programs, there is not yet a framework that informs mentor-based engineering outreach, and a solid framework is needed to improve practices and outcomes. In this thesis, I use an in-depth analysis of a local mentor-based engineering outreach initiative and a review of twenty-four external programs to develop a framework for mentor-based engineering outreach. The final framework includes six critical dimensions and eight components to inform the design, implementation, and evaluation of programs. This research also shows how structuring mentor-based outreach around technical skill development and relationships encourages positive social and personal development outcomes, such as increased student interest in engineering. Ultimately, this work provides practitioners and organizations with direction for improving diversity within outreach programs.
- Item (Open Access): Development of a CDIO Framework for Teacher Training in Computational Thinking (2017). Authors: Hladik, Stephanie; Behjat, Laleh; Nygren, Anders; Moshirpour, Mohammad; Hugo, Ron; Haque, Anis.
Studies have shown that although there is a recent push to include computational thinking and coding in elementary schools, many elementary school teachers have no background in the subject and would require training to teach computational thinking effectively to their students. In this thesis, the development of a framework to train elementary school teachers and students in computational thinking is presented. It is based on a framework for engineering education, which it modifies to design and implement creative, cross-curricular activities that teach computational thinking and engineering concepts to students in grades K-6. The activities are also used in a professional development workshop to train teachers in these skills. These activities have positive impacts on perceptions of computational thinking for both elementary school teachers and their students, as evaluated by surveys and interview responses. In addition, teachers felt more confident in their ability to implement similar activities in their classrooms.
- Item (Open Access): Distributed Denial of Service Attack Detection Using a Machine Learning Approach (2018-07-30). Authors: Gupta, Animesh; Alhajj, Reda; Rokne, Jon; Moshirpour, Mohammad.
A distributed denial of service (DDoS) attack is a type of cyber-attack in which the perpetrator aims to deny services on a network or server by inundating it with superfluous requests, rendering it incapable of serving requests from legitimate users. According to Corero Network Security (a DDoS protection and mitigation provider), in Q3 2017 organizations around the world experienced an average of 237 DDoS attack attempts per month, or roughly 8 DDoS attacks every day. This was a 35% increase over Q2 of that year and a staggering 91% increase over Q1. According to research by Incapsula, a DDoS attack costs businesses an average of $40,000 per hour. Commercial software is available to detect and mitigate DDoS attacks, but its high cost puts it out of reach for small and mid-scale businesses. The proposed work aims to fill this gap by providing a robust, real-time, open-source web application for DDoS attack prediction that small and mid-scale businesses can use to keep their networks and servers secure from malicious DDoS attacks. A machine learning approach employing a window-based technique predicts a DDoS attack in a network with a maximum accuracy of 99.83% when the recommended combination of feature selection and classification algorithms is chosen. The choice of both the feature selection and the classification algorithm is left to the user. One of the feature selection algorithms is the novel Weighted Ranked Feature Selection (WRFS) algorithm, which outperforms other baseline approaches in terms of detection accuracy and model-building overhead. Once the selection is made, the web application connects to the socket and starts capturing and classifying real-time network traffic.
After the capture is stopped, information about attack instances (if any), the number of attack packets, and the confusion matrix is rendered to the client using dynamic charts. The trained model used to classify real-time packets is optimized and uses only those attributes of an incoming packet that are necessary to predict its class with high accuracy.
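The window-based technique described above can be pictured as bucketing captured packets into fixed time windows and computing per-window features for the classifier to label. In the sketch below, the trained classifier is replaced by a bare packet-rate threshold, and the packet layout (timestamp, source IP) is an illustrative assumption, not the thesis's feature set:

```python
from collections import Counter

def window_features(packets, window=1.0):
    """Bucket (timestamp, src_ip) packets into fixed windows, counting per source."""
    feats = {}
    for ts, src in packets:
        feats.setdefault(int(ts // window), Counter())[src] += 1
    return feats

def flag_windows(packets, rate_threshold=100, window=1.0):
    """Return windows whose total packet count exceeds the threshold
    (a stand-in for the trained classifier's per-window decision)."""
    feats = window_features(packets, window)
    return sorted(w for w, counts in feats.items()
                  if sum(counts.values()) > rate_threshold)

# A burst of 150 packets in the first second from one source, then quiet.
flood = [(i * 0.005, "10.0.0.1") for i in range(150)]
normal = [(1.5, "10.0.0.2"), (1.7, "10.0.0.3")]
flagged = flag_windows(flood + normal)
```

A real detector would feed richer per-window features (entropy of sources, protocol mix, flag counts) into the chosen classifier instead of a single threshold.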
- Item (Open Access): Domain Adaptation for Automated Program Repair (2022-09). Authors: Zirak, Armin; Hemmati, Hadi; Pinheiro Bento, Mariana; Moshirpour, Mohammad.
Automated Program Repair (APR) is the process of fixing a bug or defect in source code with an automated tool. APR methods have recently achieved promising results by leveraging state-of-the-art Natural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE, which combine text-to-text transformers with software-specific techniques, currently outperform alternatives. However, all of these methods are limited with respect to domain shift. In other words, APR models are trained and tested on the same set of projects (i.e., when fixing a bug from project A, the model has already seen example fixed bugs from project A in the training set). In the real world, however, APR models are meant to generalize to new and different projects. There is therefore a potential threat that APR models with high reported effectiveness perform poorly when the characteristics of a new project or its bugs are different (domain shift). In this study, we first define the problem of domain shift in software engineering and automated program repair. Next, we measure the potential damage of domain shift on two state-of-the-art APR methods (TFix and CodeXGLUE). Based on this observation, we then propose a domain adaptation framework that can adapt an APR method to a given target project. We conduct an empirical study with three domain adaptation methods and two APR models on 611 bugs from 19 projects. The results show that our proposed framework can improve the effectiveness of TFix by 13.05% and CodeXGLUE by 23.4% in terms of “Exact Match”. Through experiments, we also show that the framework provides high efficiency and reliability (in terms of exposure bias).
Another contribution of this study is the proposal of a data synthesis method to address the lack of labeled data in APR (bugs and their fixes). We leverage transformers to create a bug generator model. We use the generated synthetic data to domain adapt TFix and CodeXGLUE on the projects with no data (Zero-shot learning), which results in an average improvement of 5.76% and 24.42% for TFix and CodeXGLUE, respectively.
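The “Exact Match” score used to report these improvements is simply the fraction of generated patches identical to the developer's reference fix. A minimal version is below; whether surrounding whitespace is normalized before comparison is an assumption of this sketch, not something stated in the abstract:

```python
def exact_match(predicted, reference):
    """Fraction of predicted patches that exactly equal the reference fix,
    after stripping surrounding whitespace (normalization is an assumption)."""
    assert len(predicted) == len(reference)
    hits = sum(p.strip() == r.strip() for p, r in zip(predicted, reference))
    return hits / len(reference)
```

With this metric, a 13.05% improvement means 13.05% more of the model's patches become character-for-character correct after adaptation.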
- Item (Open Access): Effective Control System Framework Selection through Checklist-based Software Quality Evaluation (2023-09-13). Authors: Imani, Alireza; Moshirpour, Mohammad; Belostotski, Leonid; Far, Behrouz H; Drew, Steve H.
The Herzberg Astronomy and Astrophysics Research Centre (HAA) of the National Research Council (NRC) is Canada's premier center for astronomy and astrophysics. It maintains the largest and most powerful observatories in Canada and represents Canada at many of the world's leading astronomical events. As part of my master's degree, I collaborated with HAA on the multifaceted ARTTA4 project. In control systems, evaluating the quality of open-source software is essential: a thorough appraisal of these components is central to safeguarding the security, stability, and efficiency of such systems. Neglecting this assessment exposes control systems to a range of vulnerabilities, including bugs and compatibility concerns that can cause operational disruptions, security breaches, and risks across diverse industries. Ensuring the integrity and performance of control systems therefore demands a rigorous approach to software quality assessment, pre-empting unforeseen complications and bolstering the overall reliability and functionality of these systems. Through my engagement with HAA, I recognized the pivotal role of an open-source control system toolkit named Tango Controls in shaping their antenna control system. Consequently, a comprehensive evaluation of the software quality of Tango Controls became a vital undertaking for guaranteeing the dependability and maintainability of the resulting control system. Accordingly, we applied a generalizable checklist-driven software quality assessment approach to examine Tango Controls.
This evaluation brought to light three specific limitations of the open-source toolkit, which prompted me to investigate a substitute control system toolkit to replace Tango Controls. We therefore adopted a Component-based Software Development (CBSD) methodology to propose two potential substitute solutions. These alternatives were put into practice through the implementation of a control system module at HAA, in parallel with the use of Tango Controls. To quantify their efficacy, we used SonarQube to generate a static source code analysis report, and we conducted an empirical comparison of the development process across all three approaches. Drawing on the empirical and quantitative analyses, one of the proposed solutions clearly outperformed Tango Controls in efficacy and performance. In conclusion, this thesis is a stepping stone toward better open-source software selection for the development of industrial control systems. Fully realizing the potential of open-source technologies will require sustained research and collaboration: examining the criteria commonly referenced by industry practitioners can refine the selection process, and a natural language processing-based tool that autonomously aggregates pertinent information from diverse online sources holds promise for streamlining open-source software comparison and adoption. Through this holistic approach, we aspire to foster an environment in which open-source technologies are harnessed to their fullest, driving the evolution of industrial control systems.
- Item (Open Access): Effective Data Analysis Framework for Financial Variable Selection and Missing Data Discovery (2017). Authors: Aghakhani, Sara; Alhajj, Reda; Rokne, Jon; Chang, Philip; Khoshgoftaar, Taghi; Moshirpour, Mohammad.
Quantitative evaluation of financial variables plays a foundational role in financial price modeling, economic prediction, risk evaluation, portfolio management, and related areas. However, the problem suffers from high dimensionality; financial variables should therefore be selected in a way that reduces the dimensionality of the financial model and makes it more efficient. In addition, it is quite common for financial datasets to contain missing data due to a variety of limitations. Consequently, in practical situations, it is difficult to choose the best subset of financial variables because of missing values. The two problems are interrelated, so the central idea in this research is to develop and examine new techniques for financial variable selection based on estimating the missing values while accounting for all the longitudinal and latitudinal information. This research proposes a novel methodology to minimize the problems associated with missing data and to find the best subset of financial variables for effective analysis. There are two major steps: the first concentrates on estimating missing data using Bayesian updating and Kriging algorithms; the second finds the best subset of financial variables. In this step, a novel feature subset selection method (LmRMR) is proposed, which ranks the financial variables; the best subset of variables is then chosen using statistical techniques through Super Secondary Target Correlation (SSTC) measurement. Tests were carried out to demonstrate the applicability and effectiveness of the ideas presented in this research, in particular the potential application of the proposed methods to stock market trading models and stock price forecasting.
The experimental studies are conducted on Dow Jones Industrial Average financial variables.
- Item (Open Access): An Experimental and Numerical Study on Metakaolin-based Geopolymers (2021-01-26). Authors: Ershad, Mohamadmahdi (Armin); Khoshnazar, Rahil; Moshirpour, Mohammad; Shrive, Nigel Graham.
Portland cement concrete, in its most basic form, is produced by mixing aggregate, Portland cement (PC), and water. The high energy consumption and CO2 emissions associated with the production of PC have caused serious environmental issues, and PC production is now responsible for about 5-8% of global CO2-equivalent greenhouse gas emissions. Extensive research has been conducted to develop new and more sustainable binders that can replace PC paste in concrete without compromising its performance. Geopolymers (GPs) are a class of these alternative binders that have attracted significant attention in the past few decades. They are produced by mixing an aluminosiliceous powder such as metakaolin (MK) with an alkaline solution. Several parameters, such as composition and curing conditions, influence the characteristics of GPs, and appropriate selection of material proportions is necessary to achieve the desired performance. This research aims to (i) model the compressive strength of MK-based GPs based on their composition and (ii) suggest new methods for enhancing the compressive strength of MK-based GPs. In the first part of this research, different GPs were prepared with two different grades of MK and various compositions. Machine learning models were then used to classify and predict the compressive strength of GPs using a dataset derived from the experimental plan of this research and from the literature. Among the models tested, the extreme gradient boosting algorithm was able to classify the GPs with 80% accuracy into three levels of ‘low’, ‘medium’, and ‘high’ strength, and to predict the strength with R2 = 0.80 given the composition and test age. In the second part, a seeding method was used to improve the compressive strength of GPs.
Three different types of zeolite seeds (hydrogen faujasite, sodium faujasite, mordenite) were used. The results showed that all the zeolites could improve the strength of GPs although hydrogen faujasite and mordenite seemed to be more effective than sodium faujasite.
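The classification setup above can be sketched as two steps: bin measured strengths into the three labels, then predict a label from composition features. Everything concrete below is an illustrative assumption: the MPa cut-offs, the feature tuple (Si/Al ratio, water/solid ratio, test age in days), and the nearest-neighbour stand-in for the extreme gradient boosting model are not the thesis's values.

```python
def strength_class(mpa, low_cut=30.0, high_cut=60.0):
    """Bin compressive strength (MPa) into 'low'/'medium'/'high';
    the cut-offs here are illustrative, not taken from the study."""
    if mpa < low_cut:
        return "low"
    return "medium" if mpa < high_cut else "high"

def predict_1nn(sample, training):
    """Toy 1-nearest-neighbour classifier over composition tuples, standing in
    for the gradient-boosting model; `training` maps composition -> MPa."""
    nearest = min(training, key=lambda comp: sum((a - b) ** 2
                                                 for a, b in zip(comp, sample)))
    return strength_class(training[nearest])
```

The real model learns nonlinear interactions between composition and curing variables rather than relying on raw Euclidean distance.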
- Item (Open Access): Explainable AI for Software Engineering: A Systematic Review and an Empirical Study (2023-01-23). Authors: Haji Mohammadkhani, Ahmad; Hemmati, Hadi; Uddin, Gias; Moshirpour, Mohammad.
In recent years, leveraging machine learning (ML) techniques has become one of the main ways to tackle software engineering (SE) tasks in research studies (ML4SE). This has been achieved by utilizing state-of-the-art models that tend to be more complex and black-box, such as deep learning, which has led to less explainable solutions. This lack of explainability reduces trust in, and uptake of, ML4SE solutions by professionals in industry. One potential remedy is to apply explainable AI (XAI) methods to provide the missing explainability. In this thesis, we explore the extent to which XAI has been studied in the SE domain (XAI4SE) and provide a comprehensive view of the current state of the art, as well as its challenges and a roadmap for future work. To do so, we conduct a systematic literature review of the 24 most relevant published studies in XAI4SE (out of 869 papers selected by keyword search). Our analysis reveals that among the identified studies, software maintenance (68%), and particularly defect prediction, has the highest share of the SE stages and tasks that leverage XAI approaches. We also found that published XAI methods are mainly applied to more classic ML models (e.g., random forest, decision trees, and regression models) rather than to more complex deep learning and generative models (e.g., Transformer-based code models). Our study also shows that XAI4SE is mainly used to improve the accuracy or interpretability of the underlying ML models. Furthermore, we noticed a clear lack of standard evaluation metrics for XAI methods in the literature, which has caused confusion among researchers and a lack of benchmarks for comparison.
To fill one of these gaps, we conduct an empirical study of the state-of-the-art Transformer-based models CodeBERT and GraphCodeBERT on a set of software engineering downstream tasks: code document generation (CDG), code refinement (CR), and code translation (CT). We first evaluate the validity of the attention mechanism as an explainability method for each particular task. Next, through quantitative and qualitative studies, we show that CodeBERT and GraphCodeBERT learn to attend to certain token types, depending on the downstream task. Furthermore, we show that there are common patterns that cause a model to not work as expected (performing poorly even when the problem at hand is easy), such as long inputs or failure to pay proper attention to token types that are important for the task. Finally, we suggest recommendations that may alleviate the observed challenges.
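Treating attention as an explanation ultimately means inspecting where a model's attention weights concentrate over the input tokens. A minimal sketch of that inspection follows; the tokens and weights are made up, and real models have many heads and layers whose aggregation is part of what the thesis studies:

```python
def top_attended_tokens(tokens, attention, k=2):
    """Normalize one attention row over the input tokens and return the k
    tokens receiving the most weight."""
    total = sum(attention)
    ranked = sorted(zip(tokens, (a / total for a in attention)),
                    key=lambda tw: tw[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]

# Toy attention row over a tokenized function signature.
toks = ["def", "add", "(", "a", ",", "b", ")"]
attn = [0.10, 0.50, 0.05, 0.15, 0.02, 0.13, 0.05]
```

Comparing which token types (identifiers, keywords, punctuation) dominate such rankings across tasks is the kind of quantitative analysis the study performs.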
- Item (Open Access): Exploration of the Effectiveness of Online Learning for Engineering Professional Skills Development (2017). Authors: Lumgair, Brendon; Cowe Falls, Lynne; Achari, Gopal; Moshirpour, Mohammad.
Online learning is revolutionizing post-secondary education, where class sizes are already in the hundreds. Of the 12 Canadian Engineering Accreditation Board (CEAB) graduate attributes and 11 Accreditation Board for Engineering and Technology (ABET) student outcomes, about half are “technical” and half are “professional / soft skills”. Can engineering professional skills learning outcomes be taught and assessed effectively in a large online class without sacrificing the quality of teaching and learning and the rigour of assessment of a traditional in-person class? At the University of Calgary, an undergraduate engineering course on professionalism, ethics, and lifelong learning was taught to 468 students purely online via synchronous webinars / web conferencing, asynchronous videos, and a textbook. On average, students preferred webinars, rating them the most engaging and effective presentation format. The majority of students reported that the online course was effective in helping them achieve the professional skills learning outcomes.
- ItemOpen AccessExtracting information from Reddit for emergency management - Wildfire case(2023-12-28) Arvandi, Alireza; Alhajj, Reda; Rokne, Jon; Kawash, Jalal; Moshirpour, MohammadThe advent of social media has revolutionized the way information is disseminated and consumed during emergency situations, such as wildfires. This study provides an in-depth analysis of public sentiment and communication patterns on Reddit during wildfire events in British Columbia (BC), Canada. Utilizing a comprehensive methodological framework, the research employs data mining techniques, sentiment analysis, and comparative methods to explore the digital discourse surrounding wildfires. The methodology integrates topic mining, keyword extraction, and sentiment analysis to evaluate the nature and scope of discussions within Reddit communities. Subreddit activity is scrutinized to understand regional and national concerns, while sentiment analysis offers insights into the emotional undertones of the discussions. A comparative analysis between Reddit posts and news articles is conducted to assess the interplay between social media narratives and traditional media reporting. The findings reveal a strong regional focus in discussions, reflecting the direct impact of wildfires on local communities. National concern is also evident, with broader societal implications being discussed in both general and niche subreddits. Temporal analysis of subreddit activity indicates that engagement is predominantly event-driven, with implications for emergency services, content creators, and community managers. This research contributes to the understanding of social media’s role in crisis communication and public sentiment analysis. It highlights the potential of platforms like Reddit to serve as real-time barometers of public concern and provides actionable insights for stakeholders involved in crisis management and communication. 
The study’s methodologies and insights have broader applications, offering a template for analyzing online discourse in response to various emergencies.
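The sentiment-analysis step described above can be illustrated with a minimal lexicon-based scorer. This is a toy sketch, not the thesis's pipeline: the lexicon, the sample posts, and the `sentiment_score` helper are invented for demonstration, and a real study would use a validated sentiment tool.

```python
# Toy lexicon-based sentiment scoring of wildfire-related posts.
POSITIVE = {"safe", "contained", "relief", "thankful"}
NEGATIVE = {"evacuate", "smoke", "destroyed", "scared", "loss"}

def sentiment_score(text):
    """Return a score in [-1, 1]: (pos - neg) / number of matched words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

posts = [
    "Told to evacuate, so much smoke, everyone is scared.",
    "Fire is contained near town, feeling thankful and safe.",
]
scores = [sentiment_score(p) for p in posts]
print(scores)  # first post scores negative, second positive
```

Aggregating such scores per subreddit and per day is one simple way to track the event-driven engagement patterns the study reports.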
- ItemOpen AccessFuzzy Logic Classification in Review Spam Detection(2019-05-21) Rachdi, Btissam; Rokne, Jon G.; Alhajj, Reda S.; Moshirpour, MohammadWith the recent popularity of e-commerce, customers publish reviews about the products or services they purchased or used, and these reviews in turn help potential customers make better choices based on the experiences of others. These pieces of opinion information are not only important to individual users but also benefit business organizations, which can monitor customers’ opinions and adjust their business strategies accordingly. However, this influence motivates some people to exploit reviewing systems by entering fake reviews that promote some products or defame others. Hence, in recent years, review analysis has gained importance, as opinion-mining techniques can locate and eliminate potential spam reviews. In this thesis, I introduce fuzzy logic into review spam detection and combine it with two other data mining techniques, frequent-pattern periodicity and outlier detection, to study the behavior of reviewers towards the reviewed products and classify users with a fuzzy logic classification model. The proposed analysis is examined over a sample dataset.
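The fuzzy-classification idea can be sketched with simple membership functions. This is a hedged illustration, not the thesis's model: the features (`rating_deviation`, `burstiness`), the membership shapes, and the thresholds are all invented for demonstration.

```python
# Toy fuzzy-logic spam scoring from two hypothetical reviewer features.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def spam_likelihood(rating_deviation, burstiness):
    """Combine fuzzy memberships of two reviewer features into a spam score.
    rating_deviation: |reviewer rating - product average|, here on [0, 4]
    burstiness: fraction of the reviewer's posts made within one day, [0, 1]"""
    dev_high = tri(rating_deviation, 1.0, 4.0, 7.0)   # "deviation is high"
    burst_high = tri(burstiness, 0.3, 1.0, 1.7)       # "posting is bursty"
    return min(dev_high, burst_high)  # fuzzy AND via the minimum t-norm

print(spam_likelihood(4.0, 1.0))  # extreme on both features -> 1.0
print(spam_likelihood(0.5, 0.1))  # unremarkable reviewer -> 0.0
```

A real system would defuzzify several such rules and feed the result into a classification step, but the min/max combination above is the core mechanism.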
- ItemOpen AccessIntelligent Data Analysis for Early Warning: From Multiple Sources to Multiple Perspectives(2019-09-12) Afra, Salim; Alhajj, Reda S.; Moussavi, Mahmood; Alhajj, Reda; Rokne, Jon G.; Moshirpour, Mohammad; Tavli, BülentMisusing and benefiting from advances in communication technology, criminal and terror groups have recently expanded into global organizations and activities. Fortunately, the same technology can be used to fight terror and criminal groups by tracing, identifying, apprehending, and preventing them from executing their bloody plans. Indeed, it is now feasible to capture various kinds of data which could be analyzed to predict potential criminals and terrorists. Data comes in various formats, from text to images, and may become available incrementally due to dynamic sources. This leads to what has recently been classified as big data, which has attracted considerable attention from industry and the research community. Researchers and developers in this domain are trying to adapt and integrate existing techniques into customized solutions that can successfully and effectively handle big data with all its distinguishing characteristics. Alternatively, tremendous effort has been invested in developing new techniques to cope with big data in situations where existing techniques, neither individually nor as an integrated group, could address the shortcomings in this domain. The need for effective solutions capable of dealing with criminal and terror groups is the main motivation for the study described in this thesis. The main contribution of this thesis is an early warning system that uses different sources of data to identify potential criminals and terrorists (hereafter both criminals and terrorists will be meant when either is mentioned in the text). The process works as follows. 
Criminal profiles are analyzed and their corresponding criminal networks are derived. This automates and facilitates the work of crime analysts in predicting events that may lead to disaster. We used face images as a data source and performed different studies to determine the accuracy and effectiveness of current face recognition and clustering algorithms in identifying people in uncontrolled environments, which are actually the environments encountered in real situations when dealing with criminals and terrorists. We trained our own face recognition algorithm using convolutional neural networks (CNN) by pre-processing the input images for better recognition rates. We showed how this is more effective than frontalized profile face images. We designed a queuing system for surveillance camera monitoring to raise an alarm when unknown people who pass through a monitored area turn into potential suspects. We also integrated different data sources such as social media, news, and official criminal documents to extract criminal names. We then generate a criminal profile which includes the activities that a given criminal is involved in. We also linked criminals together to build a criminal network by expanding the coverage and analyzing the collected data. We then proposed several unique criminal network analysis techniques to provide better understanding and knowledge for crime analysts. To achieve this, we added more functions related to criminal network analysis to NetDriller which is a powerful social network analysis tool developed by our research group. We also designed an algorithm for link prediction which better detects if a link between two nodes will exist in the future. All these functionalities have been well integrated into the monitoring system which has been developed and well tested to demonstrate its applicability and effectiveness.
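The link-prediction task mentioned above can be illustrated with a classic baseline heuristic. The thesis proposes its own algorithm; the sketch below only shows the standard Adamic-Adar score on a made-up toy network, weighting rarer shared contacts more heavily.

```python
# Adamic-Adar link-prediction baseline on a toy network (invented data).
import math

graph = {  # symmetric adjacency sets
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

def adamic_adar(u, v):
    """Score a potential future link (u, v) from common neighbours,
    down-weighting highly connected (less informative) shared contacts."""
    common = graph[u] & graph[v]
    return sum(1.0 / math.log(len(graph[w])) for w in common)

# A and D share two contacts (B and C); A and E share none.
print(adamic_adar("A", "D"))
print(adamic_adar("A", "E"))  # 0.0
```

Ranking all non-edges by such a score gives a list of the links most likely to appear next, which is the same output shape a learned link predictor would produce.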
- ItemOpen AccessIntelligent Medical Image Analysis for Quality Assurance, Teaching and Evaluation(2020-06-23) Aksac, Alper; Alhajj, Reda; Demetrick, Douglas James; Rokne, Jon G.; Moshirpour, Mohammad; Karray, Fakhreddine O.Manually spotting and annotating the affected area(s) on histopathological images with high accuracy is regarded as the gold standard in cancer diagnosis and grading. However, this is a time-consuming and tedious task that requires considerable effort, expertise and experience from a pathologist, gained over time by analyzing more cases. Although this visual interpretation follows strict guidelines, it carries a certain subjectivity and therefore leads to inter/intra-observer variability and reproducibility issues. These issues may have a direct effect on patient prognosis and treatment plans. Such problems can be alleviated by developing automated image analysis tools for digitized histopathology, building on the rapid development of image capturing and analysis technology, which can not only give more insight to pathologists but also guide them in detecting and grading diseases. These quantitative computational tools aim to improve the work of pathologists in terms of speed and accuracy. Thus, it is very important to develop an automatic assessment tool for quantitative and qualitative analysis to help remove this drawback. The main contribution of this thesis is an intelligent system for quality assurance, teaching and evaluation applications in anatomical pathology. We present a graph-based spatial clustering algorithm, named CutESC (Cut-Edge for Spatial Clustering). CutESC performs clustering automatically for complicated shapes and varying densities without requiring any prior information or parameters. 
We have developed an automatic cell nuclei detection method in which the proposed solution uses a traditional CNN learning scheme to detect nuclei and then applies single-pass voting with spatial clustering to localize them. We also propose an automated method to identify and locate mitotic cells and tubules in histopathology images using deep neural network frameworks. We present a dataset of breast cancer histopathology images, named BreCaHAD, which is publicly available to the biomedical imaging community. Moreover, we propose an efficient method for salient region detection. Finally, we introduce a new tool called CACTUS (Cancer Image Annotating, Calibrating, Testing, Understanding and Sharing), designed to help and guide pathologists in improving disease diagnosis while reducing their workload and the bias among them. CACTUS can be useful both for disseminating anatomical pathology images for teaching and for evaluating agreement amongst pathologists or against a gold standard for evaluation or quality assurance.
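The cut-edge intuition behind graph-based spatial clustering can be sketched as follows. This is a deliberately simplified illustration, not the CutESC algorithm itself (which is parameter-free and uses a more elaborate graph construction): connect nearby points, cut unusually long edges, and read clusters off the connected components.

```python
# Simplified cut-edge spatial clustering on invented 2-D points.
import math

def cluster_by_cut_edges(points, k=2):
    # Build a k-nearest-neighbour edge list.
    edges = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            edges.append((d, i, j))
    # Cut edges longer than mean + one standard deviation.
    lengths = [d for d, _, _ in edges]
    mean = sum(lengths) / len(lengths)
    std = (sum((d - mean) ** 2 for d in lengths) / len(lengths)) ** 0.5
    kept = [(i, j) for d, i, j in edges if d <= mean + std]
    # Connected components of the remaining graph are the clusters.
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = cluster_by_cut_edges(pts)
print(labels)  # two groups of three points
```

The appeal of this family of methods for nuclei detection is that cluster count and shape fall out of the edge statistics rather than being fixed in advance.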
- ItemEmbargoInterpretable Deep Learning Models for Wearable Data in Sleep and Stress Analysis: Bridging the Gap between Predictive Accuracy and Explainability in Personalized Health Monitoring(2024-01-26) Barati, Ronak; Moshirpour, Mohammad; Duffett-Leger, Linda; Barcomb, Ann; Sameet Deshpande, GouriThis study integrates wearable technology, machine learning, and personal health to analyze human sleep patterns and stress levels. It aims to understand the impact of daily activities and physiological metrics on individual well-being, utilizing a broad data set from various individuals. The research compiles three interrelated studies, offering a detailed view of personalized health monitoring and its potential for future applications. The first study utilizes LSTM and RNN networks, complemented by Explainable AI, particularly LIME. This approach provides a deep dive into the rich, extensive data gathered from smartwatches, revealing how our daily routines (steps, heart rate, stress, and physical activity) influence the duration of our different sleep stages. Through this in-depth analysis, we not only uncover the subtle but significant ways in which our lives influence our sleep; the data also allows us to develop health interventions tailored to each individual. The second study makes use of data from wearable devices to classify sleep levels using seven machine-learning models. Throughout this journey, stress plays a pivotal role in affecting sleep quality. The comparison of models with and without stress data makes a compelling case for holistic health monitoring. An important finding from the models that incorporate stress data is that psychological factors play a significant role in understanding and improving sleep health. 
This insight has significant implications for the development of wearable technologies and health monitoring systems, advancing our understanding of sleep disorders and their treatment. In our final study, smartwatch data from first responders and their families were analyzed over three years using machine learning classifiers such as SVM, Logistic Regression, KNN, Decision Trees, Random Forests, Naive Bayes, and XGBoost. The comparison between datasets with and without sleep data showed that including sleep data significantly boosts stress prediction accuracy to 98%, underlining the relationship between sleep and stress. This research offers vital stress management insights, especially for first responders.
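The with-versus-without-sleep comparison can be sketched with a tiny evaluation harness. Everything here is invented for illustration: the synthetic records, the feature names, and the 1-nearest-neighbour classifier that stands in for the SVM/XGBoost-style models the study actually used.

```python
# Toy harness for comparing feature sets on a classification task.
import math

def knn1_accuracy(train, test):
    """train/test: lists of (feature_tuple, label). Predict with 1-NN."""
    correct = 0
    for x, y in test:
        _, pred = min((math.dist(x, xt), yt) for xt, yt in train)
        correct += pred == y
    return correct / len(test)

# Synthetic records: (heart_rate, steps, sleep_hours) -> stressed (1) or not (0).
data = [
    ((88, 2000, 4.0), 1), ((92, 1500, 5.0), 1), ((85, 2500, 4.5), 1),
    ((70, 8000, 8.0), 0), ((65, 9000, 7.5), 0), ((72, 7500, 8.5), 0),
]
test = [((90, 1800, 4.2), 1), ((68, 8500, 8.2), 0)]

with_sleep = knn1_accuracy(data, test)
without_sleep = knn1_accuracy(
    [(x[:2], y) for x, y in data], [(x[:2], y) for x, y in test]
)
print(with_sleep, without_sleep)
```

On real wearable data the study reports that dropping the sleep features lowers stress-prediction accuracy; with any real model the same harness shape (train twice, once per feature set, score on a held-out split) makes that comparison.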
- ItemOpen AccessA Large Scale Agile Teaching Framework for Software Engineering(2022-12-19) Bahrehvar, Majid; Moshirpour, Mohammad; Far, Behrouz; Johnston, KimberlyThere has been a great deal of interest in software engineering as a rewarding career in recent years, as industry demand for software professionals continues to rise. As such, enrollment in tech-related majors such as software engineering and computer science continues to increase. Several sources are available for learning software engineering, including Massive Open Online Courses (MOOCs); meanwhile, universities remain the primary providers of high-quality instruction in this field. Universities have to accept many students, which creates challenges such as reduced quality of education and difficulty for instructors and teaching assistants in managing classes. Universities also need to grow their faculty and improve their educational infrastructure. The industry is changing rapidly and demands that graduates adapt to its needs as quickly as possible. In addition, graduates are expected to have soft skills, such as critical thinking and teamwork, which makes university training harder. Various methods have been developed in software engineering education to manage the challenges of large enrollments and to provide hands-on learning. These methods are based on active learning, which focuses on the learner rather than the educator, and they require more work from instructors. This thesis provides a framework for teaching software engineering (SE) that utilizes DevOps concepts to respond to the needs of universities, based on agile methodologies and project-based learning, which have matured in industry and education over many years. We used machine learning and ML4Code methods to address the challenge of providing scalable feedback, an essential need for a practical discipline such as software engineering. 
During the winter of 2021, this framework was implemented in ENSF 607 - Advanced Software Development and Architecture at the University of Calgary. It was evaluated based on the students’ perceptions of its impact on their learning journey.