Browsing by Author "Rokne, Jon"
Now showing 1 - 20 of 26
- Item (Open Access): A Graph Based Approach for Making Recommendations Based on Multiple Data Sources (2015-05-27). Dhaliwal, Sukhpreet; Alhajj, Reda; Rokne, Jon.
  A recommendation system is an information filtering system that predicts customer preferences, which are extracted by analyzing the behaviour patterns of customers across multiple data sources. Graph-based models are well suited to extracting customer preferences from multiple data sources, yet they have rarely been used in traditional recommendation systems. The main objective of this thesis is to build a graph-based recommender system that uses multiple data sources. A graph-based hybrid recommender model is developed that integrates content-based, collaborative filtering and association rule mining techniques, and the PageRank algorithm is used to produce a ranked list of recommendations. Our analysis on a retail store dataset shows the impact of using multiple data sources on the accuracy of a recommender system while handling the sparsity problem. Using the demographic information of customers remedies the cold start problem. Grouping products by product type produced better results and showed the impact of using different levels of the product taxonomy. Assembling content-based, collaborative filtering and association rule mining techniques also improved the results, and indirect connections improve the coverage of our recommender system.
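For readers unfamiliar with the ranking step mentioned above, PageRank can be sketched as a short power iteration. This is a generic illustration on a toy customer/product graph with hypothetical names, not the thesis's actual recommender model:

```python
def pagerank(graph, damping=0.85, iters=100):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbours]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            if not outs:  # dangling node: spread its rank uniformly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

# Toy bipartite customer/product graph (names are made up for illustration).
g = {"alice": ["tea", "milk"], "bob": ["tea"],
     "tea": ["alice", "bob"], "milk": ["alice"]}
ranks = pagerank(g)
```

Here "tea", bought by both customers, ends up ranked above "milk"; in a hybrid recommender the edges would instead come from purchase, content and association-rule links.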
- Item (Open Access): Automatic Inspection of Radio Astronomical Surveys (AIRAS) (2016). Said, Dina Adel; Barker, Kenneth Edwin; Stil, Jereon Maarten; Fiege, Jason; Rokne, Jon; Denzinger, Jörg; Leahy, Denis.
  This research investigates the problem of analyzing radio astronomical surveys (RAS) to automatically identify groups of objects forming patterns that astronomers are interested in finding. Visual inspection of a RAS for these patterns requires considerable time and effort to go through thousands of images, and can be infeasible in very crowded and noisy images. To tackle this problem, this research presents AIRAS, the first reported system for the automatic inspection of RAS. AIRAS consists of two main stages: (i) object finding, where all objects in the RAS are found and represented in a graph-based structure called the astronomy graph (AG); and (ii) pattern querying and retrieval, where astronomers specify the characteristics of interesting patterns in a query form, after which AIRAS finds matching patterns in the AG and presents them to astronomers for further investigation. Astronomers can use AIRAS to detect patterns known to be suspicious (i.e., consisting of false astronomical objects or artifacts). Among these are the hexagonal pattern (HP), in which objects form a hexagon with an object in the middle, similar to the shape of the front end of the Arecibo telescope horn, and the zigzag pattern (ZP), in which objects are aligned at an orientation to the horizontal axis similar to the scanning line of the radio telescope. These two patterns are used as case studies to evaluate the performance of AIRAS on images from the GALFACTS project, a project carried out at the University of Calgary in cooperation with several research institutes worldwide.
  The experimental studies show that AIRAS is a promising system that finds patterns in RAS in response to astronomers' queries with acceptable accuracy. Additionally, AIRAS can be extended to connect the patterns found with their physical signals to provide more insight into the nature of these patterns.
- Item (Metadata only): CALCULATION OF EXTREMUM PROBLEMS FOR UNIVALENT FUNCTIONS (1977-09-01). Grassmann, Eckhard; Rokne, Jon.
  Let $S$ be the usual class of univalent functions in $|z| < 1$ normalized by $f(z) = z + \sum_{i=2}^{\infty} a_i z^i$, and $V_n$ the coefficient region of $S$. It is well known that $f$ corresponds to a boundary point of $V_n$ if and only if $f$ satisfies a quadratic equation of the form $Q(w)\,dw^2 = R(z)\,dz^2$, called Schiffer's equation, and maps $|z| < 1$ onto a slit domain. We treat the following problems numerically for $V_4$: (1) given $Q$, find $R$ and $f$; (2) find the function that maximizes $\operatorname{Re}\, e^{i\phi} a_4$ under the constraint that $a_2$ and $a_3$ are given complex numbers in $V_3$. In this case Schiffer's equation is a sufficient condition for $f$ to be extremal. The critical trajectories of $Q(w)\,dw^2$ and $R(z)\,dz^2$ are in each case displayed graphically for some particular examples.
- Item (Open Access): Data Structures, Algorithms and Applications for Big Data Analytics: Single, Multiple and All Repeated Patterns Detection in Discrete Sequences (2017). Xylogiannopoulos, Konstantinos; Alhajj, Reda; Rokne, Jon; Pardalos, Panayote; Kawash, Jalal; Helaoui, Mohamed.
  This thesis focuses on the detection of single, multiple and all repeated patterns in sequences. Many algorithms exist for single pattern detection that take an input argument (the pattern to be detected) and return the position(s) where the pattern occurs. However, to the best of my knowledge, there is nothing in the literature on all repeated patterns detection, i.e., the detection of every pattern that occurs at least twice in one or more sequences. This is an important problem because the outcome can be used in many practical applications, e.g., forecasting in weather analysis or finance by detecting patterns with periodicity. The main difficulty in detecting all repeated patterns is that the standard data structures of computer science do not scale well for this purpose because of their space and time complexity: to analyze sequences of megabytes, the space required to construct the data structure and execute the algorithm can be of terabyte magnitude. To overcome these problems, my research simultaneously optimizes space and time complexity by introducing a new data structure (LERP-RSA), together with the mathematical foundation that guarantees its correctness and validity. A unique, innovative algorithm (ARPaD), which takes advantage of the exceptional characteristics of the introduced data structure and allows big data mining with space and time optimization, has also been created.
  Additionally, algorithms for single (SPaD) and multiple (MPaD) pattern detection, based on the LERP-RSA, have been created that outperform other known pattern detection algorithms in efficiency and in minimal use of resources. The combination of the data structure and algorithms permits the analysis of sequences of enormous size, greater than a trillion characters, in realistic time on conventional hardware. Moreover, several methodologies and applications have been developed to provide solutions for important problems in diverse scientific and commercial fields such as finance, event and time series analysis, bioinformatics, marketing, business, clickstream analysis, data stream analysis, image analysis, network security and mathematics.
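To make "all repeated patterns detection" concrete: a substring repeats if and only if it is a common prefix of two adjacent suffixes in sorted order. The brute-force sketch below illustrates that idea only; it has none of the space/time optimizations of LERP-RSA and ARPaD, which are what make the problem tractable at scale:

```python
def all_repeated_patterns(s):
    """Return every substring of s that occurs at least twice.

    Naive stand-in for the thesis's LERP-RSA/ARPaD machinery: sort all
    suffixes, then collect every common prefix of adjacent sorted suffixes,
    since exactly those substrings occur two or more times.
    """
    suffixes = sorted(s[i:] for i in range(len(s)))
    patterns = set()
    for a, b in zip(suffixes, suffixes[1:]):
        # length of the longest common prefix of the adjacent pair
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        for j in range(1, k + 1):
            patterns.add(a[:j])
    return patterns
```

For example, `all_repeated_patterns("banana")` yields the five repeated substrings `a`, `an`, `ana`, `n`, `na`.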
- Item (Open Access): Distributed Denial of Service Attack Detection Using a Machine Learning Approach (2018-07-30). Gupta, Animesh; Alhajj, Reda; Rokne, Jon; Moshirpour, Mohammad.
  A distributed denial of service (DDoS) attack is a cyber-attack in which the perpetrator aims to deny the services of a network or server by inundating it with superfluous requests, rendering it incapable of serving legitimate users. According to Corero Network Security (a DDoS protection and mitigation provider), in Q3 2017 organizations around the world experienced an average of 237 DDoS attack attempts per month, about 8 per day: a 35% increase over Q2 that year and a staggering 91% increase over Q1. According to research by Incapsula, a DDoS attack costs businesses an average of $40,000 per hour. Commercial software exists that detects and mitigates DDoS attacks, but its high cost puts it out of reach of small and mid-scale businesses. The proposed work aims to fill this gap with a robust, real-time, open-source web application for DDoS attack prediction that small and mid-scale businesses can use to keep their networks and servers secure from malicious DDoS attacks. A machine learning approach employing a window-based technique predicts a DDoS attack in a network with a maximum accuracy of 99.83% when the recommended combination of feature selection and classification algorithm is chosen; the choice of both is left to the user. One of the feature selection algorithms is the novel Weighted Ranked Feature Selection (WRFS) algorithm, which outperforms the baseline approaches in detection accuracy and in the overhead of building the model. Once the selection is made, the web application connects to the socket and starts capturing and classifying real-time network traffic.
  After the capture is stopped, information about attack instances (if any), the number of attack packets, and the confusion matrix is rendered to the client using dynamic charts. The trained model used for classifying real-time packets is optimized and uses only those attributes of an incoming packet that are necessary to predict its class with high accuracy.
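The window-based technique can be illustrated with two classic per-window features, packet rate and source-IP entropy. The features, thresholds and packet format below are illustrative assumptions, not the thesis's trained model or feature set:

```python
import math
from collections import Counter

def window_features(packets, window=1.0):
    """Group (timestamp, src_ip) packets into fixed-size time windows and
    compute per-window packet count and source-IP entropy."""
    buckets = {}
    for ts, src in packets:
        buckets.setdefault(int(ts // window), []).append(src)
    feats = []
    for w in sorted(buckets):
        srcs = buckets[w]
        counts = Counter(srcs)
        ent = -sum((c / len(srcs)) * math.log2(c / len(srcs))
                   for c in counts.values())
        feats.append({"window": w, "packets": len(srcs), "entropy": ent})
    return feats

def flag_attacks(feats, rate_thresh=100, entropy_thresh=3.0):
    # A flood from many spoofed sources shows a high rate AND high entropy;
    # a trained classifier would replace these hand-picked thresholds.
    return [f["window"] for f in feats
            if f["packets"] >= rate_thresh and f["entropy"] >= entropy_thresh]
```

In the full system, each window's feature vector would instead be fed to the user-selected classifier trained offline.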
- Item (Open Access): Effective Data Analysis Framework for Financial Variable Selection and Missing Data Discovery (2017). Aghakhani, Sara; Alhajj, Reda; Rokne, Jon; Chang, Philip; Khoshgoftaar, Taghi; Moshirpour, Mohammad.
  Quantitative evaluation of financial variables plays a foundational role in financial price modeling, economic prediction, risk evaluation, portfolio management, etc. However, the problem suffers from high dimensionality, so financial variables should be selected in a way that reduces the dimensionality of the financial model and makes it more efficient. In addition, financial datasets commonly contain missing data due to a variety of limitations, which makes it difficult in practice to choose the best subset of financial variables. The two problems are interrelated. The central idea of this research is therefore to develop and examine new techniques for financial variable selection based on estimating the missing values while accounting for all the longitudinal and latitudinal information. This research proposes a novel methodology to minimize the problems associated with missing data and to find the best subset of financial variables for effective analysis. There are two major steps: the first estimates missing data using Bayesian updating and Kriging algorithms; the second finds the best subset of financial variables using a novel feature subset selection method (LmRMR) that ranks the financial variables, with the best subset chosen by statistical techniques through Super Secondary Target Correlation (SSTC) measurement. Tests demonstrate the applicability and effectiveness of the ideas presented in this research; in particular, the potential application of the proposed methods to a stock market trading model and to stock price forecasting is studied.
  The experimental studies are conducted on Dow Jones Industrial Average financial variables.
- Item (Open Access): Efficient Algorithms for Realistic Lens Effect Simulation (2014-01-07). Liu, Xin; Rokne, Jon.
  Lens effects play an important role in the realism and aesthetics of graphical rendering. Although a wide range of lens effect simulation algorithms has been proposed in computer graphics, realistic lens effect simulation remains very costly in computing time. In this thesis, we develop efficient algorithms for rendering images with realistic lens effects. We first improve the efficiency of the physically correct distributed ray tracing algorithm by speeding up the fundamental ray-object intersection operations on a GPU. This work brought forth a new acceleration structure built on top of a uniform grid, called the micro 64-tree, and a new grid traversal algorithm based on it. The micro 64-tree speeds up distributed ray tracing by an order of magnitude compared to algorithms based on the uniform grid. However, it does not improve the computational complexity of distributed ray tracing, which is proportional to the number of sampling rays per pixel. A main benefit of distributed ray tracing is that it uses correct visibility and can thus render partial occlusions properly. Observing that the visibility of a lens can be mostly covered by a few representative views, we then propose a new algorithm that synthesizes lens effects from sparse views. The sparse-view algorithm produces high quality lens effects close to the result of distributed ray tracing, but at a much higher speed. Since synthesizing realistic lens effects in 2D space is also computationally expensive, we finally propose a novel lens effect simulation method based on a physical lens, which "calculates" the complicated optics through the instant physical process of lens imaging.
  Although the algorithm is not yet mature, partly because of hardware limitations, it is the first attempt in computer graphics to insert a physical lens into the graphical rendering pipeline. The physical-lens algorithm can synthesize images incorporating various lens effects with very low computational complexity.
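The distributed-ray-tracing lens model mentioned above is usually implemented as thin-lens sampling: jitter the ray origin over the lens aperture and aim it at the point where the central pinhole ray meets the plane of sharp focus. A generic sketch of that sampling step (camera-space conventions and names are assumptions, not the thesis's renderer):

```python
import math
import random

def lens_ray(pixel_dir, aperture_radius, focal_distance, rng=random):
    """Sample one thin-lens ray for depth of field, distributed-ray-tracing
    style. pixel_dir is a camera-space direction with positive z; the lens
    lies in the z=0 plane."""
    # Point where the central (pinhole) ray hits the plane of sharp focus.
    t = focal_distance / pixel_dir[2]
    focus = tuple(t * d for d in pixel_dir)
    # Uniform sample on the lens disc.
    r = aperture_radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    origin = (r * math.cos(phi), r * math.sin(phi), 0.0)
    # Every lens ray is aimed back through the in-focus point.
    direction = tuple(f - o for f, o in zip(focus, origin))
    norm = math.sqrt(sum(d * d for d in direction))
    return origin, tuple(d / norm for d in direction)
```

Averaging many such rays per pixel blurs out-of-focus geometry while leaving the focal plane sharp, which is why the cost grows with the sample count, the complexity bottleneck the thesis attacks.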
- Item (Open Access): Emotion and Sentiment Analysis from Twitter Text (2018-07-27). Sailunaz, Kashfia; Alhajj, Reda; Rokne, Jon; Krishnamurthy, Diwakar.
  Online social networks have emerged as a new platform where people share their views and perspectives on different issues and subjects with friends, family, and other users. We can share our thoughts, mental states, moments, and stances on specific social and political issues through texts, photos, audio/video messages, and posts. Despite the availability of other forms of communication, text remains one of the most common ways of communicating in a social network. Twitter was chosen in this research for data collection, experimentation, and analysis. The research described in this thesis detects and analyzes both sentiment and emotion expressed by people in the text of their Twitter posts. Tweets and replies on a few recent topics were collected into a dataset with text, user, emotion, and sentiment information. The dataset included user details such as user ID, user name, screen name, location, and numbers of tweets, followers, likes, and followees; the textual information included tweet ID, tweet time, numbers of likes, replies, and retweets, tweet text, reply text, and a few other text-based fields. The texts were then annotated with appropriate emotions and sentiments according to benchmark models, and the dataset was used to detect sentiment and emotion from tweets and their replies using machine learning. The influence scores of users were also calculated from various user-based and tweet-based parameters. Based on this information, both generalized and personalized recommendations were offered to users based on their Twitter activities.
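For orientation, the simplest form of sentiment detection is a lexicon count, which is often used as a baseline before the machine-learning classifiers the thesis applies. The mini-lexicon below is a made-up stand-in (real systems use resources such as NRC or VADER):

```python
# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"good", "great", "love", "happy", "win"}
NEGATIVE = {"bad", "sad", "hate", "angry", "lose"}

def sentiment(text):
    """Label a tweet 'positive', 'negative', or 'neutral' by counting
    lexicon hits; a baseline stand-in for trained classifiers."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

A supervised model replaces the fixed word lists with features learned from the annotated dataset, which is what lets it handle negation, slang and emotion categories beyond polarity.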
- Item (Open Access): Entity Linking with Convolutional Neural Network (2016). Xu, Shunyi; Alhajj, Reda; Rokne, Jon; Fapojuwo, Abraham.
  Entities are real-world objects such as persons, places, or events that appear in natural language text such as web pages, news, and journals. Entity linking, a nascent task in natural language processing, links entities in text to their referent entries in a knowledge base (KB), a repository of information such as Wikipedia. Entity linking has broad applications in automatic knowledge base population, the prevention of identity crimes, and more; it can also provide background information about unfamiliar concepts during document reading, giving a smooth and enjoyable reading experience without frequent "context switches". This thesis taps the power of the convolutional neural network and proposes an architecture that uses deep learning layers, convolution, max pooling, and fully-connected neurons with dropout to approach entity linking. Based on a pre-trained word2vec word embedding and an additional ad-hoc trained layer of word representation, we outperform previous state-of-the-art models, which handcrafted a large number of features, by a modest margin. Visualization of the neural network is also provided to show what happens under the hood; our experiments show that it clearly captures the desired features, indicating the efficacy of neural networks for entity linking.
- Item (Open Access): Extracting information from Reddit for emergency management - Wildfire case (2023-12-28). Arvandi, Alireza; Alhajj, Reda; Rokne, Jon; Kawash, Jalal; Moshirpour, Mohammad.
  The advent of social media has revolutionized the way information is disseminated and consumed during emergency situations, such as wildfires. This study provides an in-depth analysis of public sentiment and communication patterns on Reddit during wildfire events in British Columbia (BC), Canada. Utilizing a comprehensive methodological framework, the research employs data mining techniques, sentiment analysis, and comparative methods to explore the digital discourse surrounding wildfires. The methodology integrates topic mining, keyword extraction, and sentiment analysis to evaluate the nature and scope of discussions within Reddit communities. Subreddit activity is scrutinized to understand regional and national concerns, while sentiment analysis offers insights into the emotional undertones of the discussions. A comparative analysis between Reddit posts and news articles is conducted to assess the interplay between social media narratives and traditional media reporting. The findings reveal a strong regional focus in discussions, reflecting the direct impact of wildfires on local communities. National concern is also evident, with broader societal implications being discussed in both general and niche subreddits. Temporal analysis of subreddit activity indicates that engagement is predominantly event-driven, with implications for emergency services, content creators, and community managers. This research contributes to the understanding of social media's role in crisis communication and public sentiment analysis. It highlights the potential of platforms like Reddit to serve as real-time barometers of public concern and provides actionable insights for stakeholders involved in crisis management and communication.
The study’s methodologies and insights have broader applications, offering a template for analyzing online discourse in response to various emergencies.
- Item (Open Access): THE FDDI PILOT PROJECT IN COMPUTER SCIENCE AT THE UNIVERSITY OF CALGARY (1991-03-01). Hankinson, David; Rokne, Jon; Snowcroft, Brian; MacDonald, Bruce.
  Continued demands on the Computer Science Department's resources had forced an unsatisfactory network topology. Sun Microsystems' implementation of FDDI was selected to replace the backbone, the first such installation in Western Canada. The paper reviews the hardware, software and topology of FDDI systems, explains the configuration chosen for the department, and discusses the installation. Each ring node requires a VME bus FDDI controller board. Physically the ring is star-connected, providing convenient patch-panel reconfiguration for maintenance and testing. All departmental file/client servers were attached to the ring, while groups of client workstations are controlled over sub-Ethernets. The resulting topology is a central backbone ring with sub-Ethernets radiating from each FDDI node. FDDI's simplicity enables performance improvements in addition to the increased speed, so long as the network topology and systems configuration are carefully designed. Reliability is achieved by duplicating file systems on all ring nodes, so that any subnetwork may operate independently of the ring. We conclude by recommending that machine crashes should cause the FDDI board to pass through the data rather than wrapping the dual rings, which could cause unnecessary ring fragmentation. The paper discusses the relative merits of concentrators and optical bypass switches for fragmentation protection.
- Item (Open Access): Flexible and Scalable Routing Approach for Mobile Ad Hoc Networks by Function Approximation of Q-Learning (2016). Elzohbi, Mohamad; Alhajj, Reda; Rokne, Jon; Kawash, Jalal; Helaoui, Mohamed.
  Wireless mobile devices are spreading so rapidly that it is hard to find a person not exposed to the technology. These devices can be connected directly or indirectly by wireless channels to form a mobile ad hoc network (MANET). Finding a route for a flow from a source to a destination in a network is known as routing; dynamic topology and unstable link states are the main problems facing routing in MANETs. This thesis employs reinforcement learning, namely Q-learning, to develop a routing mechanism. Features drawn from the network are used to approximate the Q-function, forming a new intelligent routing metric. The routing process thus concentrates on specific routes instead of network-wide broadcasting, making flexible and scalable routing possible. The advantages of the proposed technique are highlighted by experiments in two MANET environments: hand-held-device-based MANETs and VANETs.
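The core Q-learning idea in routing can be shown with a single tabular update in the style of Boyan-Littman Q-routing: a node revises its estimated delivery time to a destination via a neighbour using that neighbour's own best estimate. This is a tabular sketch; the thesis's contribution is approximating this Q-function from network features instead of storing a table:

```python
def q_update(Q, node, dest, neighbor, hop_cost, alpha=0.5):
    """One Q-routing update. Q[node][dest][neighbor] estimates the delivery
    time from node to dest when forwarding via neighbor; the update blends
    in hop_cost plus the neighbor's best onward estimate."""
    onward = Q[neighbor][dest]
    best_from_neighbor = min(onward.values()) if onward else 0.0
    old = Q[node][dest][neighbor]
    Q[node][dest][neighbor] = old + alpha * (hop_cost + best_from_neighbor - old)
    return Q[node][dest][neighbor]
```

On a chain a - b - c with unit hop costs, repeated updates drive b's estimate to c toward 1 hop and a's estimate via b toward 2 hops, so greedy forwarding follows the learned metric without any flooding.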
- Item (Open Access): Integrating Deep Learning and Image Processing Techniques into a Hybrid Model for Glaucoma Detection (2021-06-23). Sarhan, Abdullah; Rokne, Jon; Alhajj, Reda; Boyd, Jeffrey Edwin; Stell, William K.; Bourlai, Thirimachos.
  Glaucoma is the world's second-leading cause of vision loss after cataracts, accounting for 12% of annual cases of blindness, and the vision loss it causes is irreversible. Glaucoma is a group of diseases that cause the degeneration of the retinal ganglion cells (RGCs). The death of RGCs leads to structural changes to the optic nerve head and the nerve fiber layer, which in turn lead to functional failure of the visual field. These effects cause peripheral vision loss and, if left untreated, eventually blindness. Apart from early detection and treatment, no cure for glaucoma exists. Early detection depends on manual inspection of a patient's clinical data, including retinal images, OCT, and visual field tests, by an ophthalmologist, which is costly and may be prone to error. As a result, most patients remain undiagnosed or improperly diagnosed while glaucoma progresses, leading to more irreversible vision loss before it is detected. There is thus a need to enhance the diagnosis of glaucoma and thereby help decrease blindness. Diagnosis can be effectively aided by investigating retinal images (also called fundus images) of the interior of the eye, and advances in deep learning and digital imaging have increased the potential for extracting information from fundus images for glaucoma assessment. In this thesis, my work focuses on approaches that may help optometrists and ophthalmologists assess the health of an eye from fundus images, improving the detection rate of glaucoma and helping, through early treatment, to keep vision loss from progressing.
  One of the main informative features of the eye is the optic disc. Images of this disc can be isolated from fundus images using computational tools, so it can be monitored and evaluated for progression when glaucoma is suspected. Changes in the optic disc region can also serve as indicators when diagnosing glaucoma; for instance, the cup-to-disc ratio can be used to detect the level of intra-ocular pressure in the eye, and the vessels of the eye can be monitored in a similar manner. In this thesis, I develop a hybrid approach that uses various retinal structures for glaucoma detection. A deep learning-based approach for segmenting the vessels, disc, and cup from retinal images is proposed, taking into consideration the limited number of retinal images available and the variability of images obtained from various sources. Features such as the cup-to-disc ratio are then used to classify whether a retinal image is glaucomatous. The main contributions of this thesis are: (1) new published datasets that can be of great help to researchers working in this field; (2) robust segmentation approaches that can also be helpful for other retinal conditions such as diabetic retinopathy; (3) a hybrid feature extraction approach over the segmented objects, used by the classifier for glaucoma detection; (4) a decision-support approach that can be the basis of a platform for ophthalmologists and optometrists diagnosing glaucoma; and (5) an approach that can be used in telemedicine, especially in developing countries where resources are very limited.
  Moreover, with advances in portable retinal cameras, it is possible to integrate the proposed approach with such devices to facilitate the diagnosis of glaucoma and improve the detection rate, especially in developing countries.
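Once the disc and cup are segmented, the cup-to-disc ratio feature mentioned above reduces to comparing the vertical extents of the two binary masks. A minimal sketch, assuming masks are given as 0/1 row lists (the mask format and function name are illustrative, not the thesis's pipeline):

```python
def vertical_cdr(disc_mask, cup_mask):
    """Vertical cup-to-disc ratio from binary segmentation masks: the ratio
    of the cup's vertical extent to the disc's vertical extent, a common
    glaucoma indicator (a larger ratio is more suspicious)."""
    def height(mask):
        rows = [i for i, row in enumerate(mask) if any(row)]
        return (rows[-1] - rows[0] + 1) if rows else 0
    disc_h = height(disc_mask)
    return height(cup_mask) / disc_h if disc_h else 0.0
```

In the hybrid model this value is just one feature handed to the classifier alongside other measurements extracted from the segmented structures.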
- Item (Open Access): Integrating Flexibility and Fuzziness into a Question Driven Query Model (2016-01-18). Sarhan, Abdullah; Rokne, Jon; Alhajj, Reda; Far, Behrouz.
  Data plays an important role in our daily life, so data collection, storage, maintenance and processing continue to attract considerable attention. Data may exist in various formats, ranging from unstructured to structured as the two extremes. Traditionally, researchers and practitioners cooperated to develop various data models, which form the foundation of existing database management systems. The relational data model still dominates despite the rapid development of techniques for data collection, storage and processing. A relational database management system supports a structured query language (SQL) for data processing, and it is not possible to access and retrieve data from a relational database without knowing SQL. The wide use of relational databases has therefore motivated researchers to develop more user-friendly interfaces, ranging from visual to natural language based, that allow a larger population of users to access relational databases. This thesis contributes a question-driven query model in the natural language based category. The target is to make databases reachable by a larger population, especially now that the Internet has increased database availability. The proposed model supports fuzziness, where every user is free to define his or her own understanding of fuzzy terms; the system absorbs each user's fuzzy understanding and uses it when deciding on the result communicated back as the answer to the raised question. Data mining techniques are employed to guide users in defining their fuzzy understanding. The developed model is intended to help users retrieve the data they want from a relational database without expecting them to know SQL.
  The current version accepts only questions written in English. The system handles different types of questions, such as (1) simple questions, (2) complex questions with inner joins and where conditions, (3) questions that involve aggregate functions (e.g., min, max, etc.), and (4) questions with fuzzy terms. The reported test results demonstrate the effectiveness of the developed system in handling various types of questions raised by a heterogeneous set of users, ranging from professionals to naive users.
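A user-defined fuzzy term is conventionally modeled as a membership function over the attribute's domain; a trapezoidal shape is the common choice. The sketch below is generic fuzzy-set machinery with illustrative breakpoints, not the thesis's actual representation:

```python
def trapezoid(a, b, c, d):
    """Build a trapezoidal membership function: 0 below a, rising on [a, b],
    1 on [b, c], falling on [c, d], 0 above d. A query model could register
    one such function per user per fuzzy term ('cheap', 'young', ...)."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# One user's (hypothetical) understanding of "around 30" for an age column.
around_30 = trapezoid(20, 28, 32, 40)
```

At query time, rows are scored by membership and the answer can rank or threshold them, so two users asking the same fuzzy question may legitimately get different results.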
- Item (Open Access): Integrating Text Mining, Data Mining, and Network Analysis to Analyze Biomarker Trends in Prostate Cancer and Breast Cancer (2016). Jurca, Gabriela; Alhajj, Reda; Rokne, Jon; Wang, Xin.
  Cancer is a serious disease with many types that affects many people. One goal of biomedical researchers is to find genetic biomarkers for the diagnosis and prognosis of cancer. Since there is already a vast amount of scientific literature on cancer, computational methods can be used to find hidden patterns in it. This thesis presents a framework that investigates existing literature data by integrating text mining, data mining, and network analysis. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how discoveries varied over time and how interest varied among research groups in different countries. Interesting trends are identified and discussed; for example, different genes are highlighted in relation to different countries, even though the various genes were found to share functionality. Some of our results have been validated against results from other tools that predict gene relations and functions.
- Item (Metadata only): INTERVAL ANALYSIS AND COMPLEX CENTERED FORMS (1984-12-01). Rokne, Jon.
  In this paper we first discuss, briefly, the problem of approximating the real numbers with floating point computer-representable numbers (a continuous space approximated by a discrete space). This approximation leads to uncertainties in numerical calculations. A tool for estimating and controlling the errors of numerical calculations in a discrete space is interval analysis, which is therefore introduced and whose basic properties are given. The merits and demerits of interval analysis are then discussed in some detail. As examples of interval analysis tools and algorithms, the natural extension idea and Newton's method in one dimension are discussed. The computation of inclusions for the range of functions is also discussed, with particular emphasis on centered forms. We then turn to the definition of a complex interval arithmetic and natural extensions in this arithmetic, presenting a number of results for polynomials and rational functions and showing in particular that centered circular complex forms have some nice properties (explicit formulas, convergence, comparisons). Some numerical results are also given. A final brief discussion addresses the problem of subdividing a circle to obtain improved inclusions.
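The natural extension idea discussed in the paper can be demonstrated with a few lines of real interval arithmetic: evaluating an expression with interval operands yields a guaranteed enclosure of the true range, usually with overestimation due to variable dependency, which is precisely what centered forms aim to reduce. A minimal real-valued sketch (the paper itself works with complex circular intervals):

```python
class Interval:
    """Minimal closed-interval arithmetic [lo, hi] for real intervals."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(ps), max(ps))

x = Interval(-1, 1)
# Natural extension of f(x) = x*x - x over [-1, 1] gives [-2, 2],
# overestimating the true range [-1/4, 2]: the two occurrences of x
# are treated as independent (the "dependency problem").
fx = x * x - x
```

A centered form rewrites f around a midpoint so the enclosure tightens quadratically as the interval width shrinks.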
- Item (Metadata only): NUMERICALLY COMPUTABLE BOUNDS FOR THE RANGE OF VALUES OF INTERVAL POLYNOMIALS (1976-01-01). Rokne, Jon.
  A central problem in interval analysis is the computation of the range of values of an interval polynomial over an interval. This problem has been treated by Dussel and Schmitt [1] and, disregarding the computational cost of their algorithm, solved in a satisfactory manner. In this paper we discuss two algorithms by Rivlin [4] (see also Cargo and Shisha [2]) in which the accuracy of the bounds depends on the amount of work one is willing to do. The first algorithm is based on expressing a polynomial in Bernstein polynomials. As given by Rivlin [4] it is valid for an estimate over the interval [0,1]; we generalize it to an arbitrary finite interval and show that it is an appropriate algorithm when the width of the interval is not too large. The second algorithm is based on the mean value theorem; as stated by Rivlin [4] it is valid for the interval [0,1], and we generalize it to any finite interval. The algorithms are then generalized to interval arithmetic versions, and finally we compare the algorithms numerically on several polynomials.
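The Bernstein idea for the point-coefficient case on [0,1] is short enough to sketch: rewrite p in the Bernstein basis and take the min and max of the Bernstein coefficients, which always enclose the range. This is the classical [0,1] bound only, without the paper's generalizations to arbitrary intervals or interval coefficients:

```python
from math import comb

def bernstein_bounds(coeffs):
    """Enclose the range of p(x) = sum a_i x^i over [0, 1] by the minimum
    and maximum Bernstein coefficients b_k = sum_{i<=k} C(k,i)/C(n,i) a_i,
    since p(x) = sum b_k B_{k,n}(x) and the basis functions are a
    nonnegative partition of unity on [0, 1]."""
    n = len(coeffs) - 1
    bs = [sum(comb(k, i) / comb(n, i) * coeffs[i] for i in range(k + 1))
          for k in range(n + 1)]
    return min(bs), max(bs)
```

For p(x) = x^2 - x the bound is [-0.5, 0], enclosing the true range [-0.25, 0]; subdividing the interval (more work, as the abstract notes) tightens the enclosure.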
- ItemOpen AccessRequirements Dependency Extraction: Advanced Machine Learning Approaches and their ROI Analysis(2022-02-02) Deshpande, Gouri; Ruhe, Guenther; Rokne, Jon; Nayebi, Maleknaz; Ferrari, Alessio; Bento, MarianaDependencies among requirements significantly impact the design, development, and testing of evolving software products. Requirements Dependencies Extraction (RDE) is a cognitively complex task due to the rich semantics of natural-language requirements, which impose challenges in automating the extraction and analysis of dependencies. The challenges intensify further when dependency types are considered. RDE is part of the broader decision support needed for effective software release planning, development, and testing decisions. Recently, Machine Learning (ML) and Natural Language Processing (NLP) techniques have successfully automated tasks in Requirements Engineering to a large extent. Despite this success, automating RDE faces several challenges: 1) Due to the nature of the problem, it is cognitively difficult to identify all the dependencies among requirements; hence generating or procuring high-quality annotations for automation through Machine Learning is an arduous task. 2) In the real world, unlabelled data is abundant, yet supervised ML techniques need a training set; the lack of labeled data for training is a central obstacle when using ML for RDE. 3) Textual requirements lack structure due to natural language, and the success of ML techniques hinges on NLP feature extraction (transforming raw text into a suitable internal numerical representation, i.e., a feature vector); however, identifying and applying a feature extraction method is cost- and effort-intensive. 4) While there is a broad spectrum of Machine Learning techniques to choose from for RDE automation, not all of them are economically viable in every scenario once data size and effort investment are considered.
Hence, there is a need to evaluate ML techniques beyond performance measures alone to support effective decision making. This thesis addresses these challenges and provides solutions. The results described in this thesis are derived from a series of empirical studies on industry and open-source software (OSS) datasets. The main contributions are as follows:
• Performed a comprehensive assessment of Weakly Supervised Learning and Active Learning (AL) to address the data acquisition challenges using public and OSS datasets. Additionally, we compared Active Learning with Ontology-based retrieval (OBR) and developed a hybrid solution that showed a 50% reduction in labeling (human) effort in evaluations on two industry datasets: Siemens Austria and Blackline Safety.
• Evaluated and compared conventional ML-based Transfer Learning against a state-of-the-art Deep Learning (DL) method (fine-tuned Bidirectional Encoder Representations from Transformers (BERT)) on 6 Mozilla products (OSS) to address the lack-of-training-data challenge. We showed that the DL method outperformed the within-project conventional ML models by 27% to 50% on the F1-score measure.
• Demonstrated that the state-of-the-art DL method (fine-tuned BERT) successfully overcomes the feature extraction challenge of RDE, outperforming conventional ML methods by 13% to 27% on the F1-score for the Firefox, Redmine, and Typo3 product datasets. We also showed that fine-tuned BERT successfully predicted the direction of dependency.
• Utilized a nine-stage ML process model and proposed a novel ROI-of-ML-classification modeling approach. The ROI analysis showed in which scenarios it is viable to adopt complex methods over conventional ones, considering the costs and benefits of data accumulation. Using OSS datasets for evaluation and practitioner input for cost factors, we showed accuracy and ROI trade-offs in ML approach selection for RDE.
Thus, we demonstrated empirical evidence for ROI as an additional criterion in ML performance evaluation.
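The ROI criterion proposed in the thesis can be illustrated with the standard definition ROI = (benefit − cost) / cost. The sketch below is a hedged illustration only: the cost and benefit figures are hypothetical and are not taken from the thesis, which derives its cost factors from practitioner input.

```python
# Hedged sketch of ROI as a selection criterion between ML approaches.
# All monetary figures are hypothetical, not from the thesis.
def roi(benefit: float, cost: float) -> float:
    """Standard return-on-investment: (benefit - cost) / cost."""
    return (benefit - cost) / cost

# Hypothetical comparison: a conventional ML pipeline vs. a fine-tuned
# DL model that is more accurate but costs more to label data for and
# to train. Higher accuracy alone does not decide the choice; ROI can.
conventional = roi(benefit=12_000, cost=8_000)
deep_learning = roi(benefit=30_000, cost=15_000)
```

Under these made-up numbers the DL model has both the higher accuracy-driven benefit and the higher ROI, but with a smaller benefit gap the conventional model could win on ROI despite losing on F1-score, which is exactly the trade-off the thesis evaluates.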