Browsing by Author "Denzinger, Jorg"
Now showing 1 - 13 of 13
Item Open Access
API Usage Templates via Structural Generalization (2023-05-03)
Mahmoud, May Abdelrheem Sayed; Walker, Robert James; Denzinger, Jorg; Maurer, Frank O.; Aycock, John Daniel; Hindle, Abram

Application programming interfaces (APIs) are key in software development, but determining how to use one can be challenging. Developers often refer to a small set of API usage examples, analyzing the information in them to understand the API usage and adapting them to their own context. Generalization of these examples would aid in understanding their commonalities and differences, thereby reducing information overload. Work on API usage mining seeks recurrent information in usage examples. Some approaches seek frequent subsequences of method calls (e.g., Monperrus et al., 2010; Wasylkowski and Zeller, 2011; Fowkes and Sutton, 2016). Others use graph-based representations, applying frequent subgraph mining techniques (e.g., Nguyen et al., 2009; Amann et al., 2019). However, all such approaches focus on frequently occurring commonalities; this results in either excluding variations in the usage of the API elements in similar contexts or subdividing such variations across several patterns, forcing developers to determine the variability in the API elements' usage manually. Approaches that aim to select the best examples (e.g., Moreno et al., 2013) ignore variation; approaches that generate examples (e.g., Barnaby et al., 2020) focus on producing maximally succinct examples rather than representing whatever commonality is present.

In this thesis, we propose ASGard (for API usage templates via Structural Generalization), a novel approach that automatically generates API usage templates from usage examples based on the generalization of the examples' syntactic structure and some semantic structure. API usage templates are a code-based representation generalizing similar API usage contexts, showing the commonality of the usage examples; the varying aspects of the input examples are replaced with structural variables intended as placeholders. ASGard takes as input a set of API usage examples and a simple indication of the API of interest, and proceeds in two phases: (1) for the sake of improved performance, we cluster the examples based on the similarity of their API usage; (2) we then use an approximation of the formalism of E-generalization (Burghardt, 2005) to infer API usage templates from the examples. We start by matching the nodes of the ASTs of the examples, seeking to preserve common elements in the nodes while abstracting away the differences. The generalization proceeds iteratively, permitting increasing abstraction of the template as long as no API usage information is eliminated. The final templates are representations of the generalized ASTs.

We perform a manual evaluation of the output templates from ASGard, which generalize a set of 231 usage examples across 5 different APIs, finding that our approach provides a mean 62% coverage of the API usage elements found in the usage examples, as opposed to 48% coverage by the best alternative. Furthermore, we automatically evaluate the templates from our approach against the code representation of the patterns generated by PAM and MUDetect (two prominent API usage mining approaches), using a total of 1,954 API usage examples across 59 different APIs. We measure two aspects of the quality of the resulting templates: (1) how complete each template is relative to each concrete example; and (2) how well each template set compresses the set of API usage examples. We find that, compared to the output from PAM and MUDetect, ASGard provides templates with superior completeness (51% vs. 12% for PAM and 25% for MUDetect) and far superior compression (81% vs. 54% for PAM and 26% for MUDetect). Finally, we perform a user study with 12 participants, comparing the use of ASGard templates against MUDetect in solving programming tasks. Participants solved the tasks in significantly less time with ASGard (48% less for a coding task and 31% less for a debugging task), expressed a preference for using ASGard templates, perceived that the approach helped them better understand the API usage, and were more willing to use the approach again than the best alternative.
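The core operation ASGard approximates, E-generalization, builds on anti-unification: keep what two structures share, and abstract what differs into variables. The sketch below shows plain first-order anti-unification over toy ASTs to make the idea concrete; the Node class, placeholder naming, and example trees are hypothetical and much simpler than ASGard's actual machinery.

```python
# Illustrative sketch only: first-order anti-unification over toy ASTs,
# a simplified core of the structural generalization E-generalization
# extends. Not ASGard's data structures or algorithm.
import itertools
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str                       # e.g. a syntax kind or method name
    children: List["Node"] = field(default_factory=list)

_fresh = itertools.count(1)          # supplies fresh placeholder names

def antiunify(a: Node, b: Node) -> Node:
    """Keep structure the two trees share; replace differing subtrees
    with structural variables. (Full anti-unification would also reuse
    one variable for repeated identical mismatches; omitted here.)"""
    if a.label == b.label and len(a.children) == len(b.children):
        kids = [antiunify(x, y) for x, y in zip(a.children, b.children)]
        return Node(a.label, kids)
    return Node(f"${next(_fresh)}")

def render(n: Node) -> str:
    if not n.children:
        return n.label
    return n.label + "(" + ", ".join(render(c) for c in n.children) + ")"

# Two usage examples that differ only in the argument expression:
e1 = Node("close", [Node("open", [Node('"a.txt"')])])
e2 = Node("close", [Node("open", [Node("path")])])
print(render(antiunify(e1, e2)))     # close(open($1))
```

The placeholder $1 is exactly the kind of structural variable the templates use: it records that some argument appears there without committing to which one.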
Item Open Access
CoLe: A Cooperative Distributed Data Mining Model (2005-03-08)
Gao, Jie; Denzinger, Jorg; James, Robert C.

We present CoLe, a cooperative, distributed model for mining knowledge from heterogeneous data. CoLe allows for the cooperation of different learning algorithms and the combination of the mined knowledge into knowledge structures that no individual learner can produce. CoLe organizes the work in rounds, so that knowledge discovered by one learner can help the others in the next round. We implemented a CoLe-based system for mining diabetes data, including a genetic algorithm for learning event sequences, improvements to the PART algorithm for our problem, and combination methods to produce hybrid rules containing both conjunctive and sequence conditions. In our experiments, the CoLe-based system outperformed the individual learners, producing better rules and more rules of a given quality. Our improvements to the learners also showed the ability to find useful rules. From the medical perspective, our system confirmed that hypertension has a tight relation to diabetes, and it also suggested connections new to medical doctors.
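The round structure is the heart of the model: each learner gets to see what the others found before searching again. Below is a minimal sketch of that loop under stated assumptions; the Learner signature, combine() step, and toy learners are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of CoLe-style round-based cooperation; all names here
# are hypothetical illustrations, not the paper's system.
from typing import Callable, List

Learner = Callable[[list, list], list]   # (data, shared_knowledge) -> rules

def combine(new_rules: list, shared: list) -> list:
    # Placeholder for CoLe's combination methods, which merge results
    # from different learners into hybrid rules (e.g. conjunctive +
    # sequence conditions); here we merely deduplicate.
    return [r for r in new_rules if r not in shared]

def cole(learners: List[Learner], data: list, rounds: int = 3) -> list:
    """Run learners in rounds; each round every learner sees everything
    discovered so far, so one learner's output can seed another's search."""
    shared: list = []
    for _ in range(rounds):
        new_rules = []
        for learn in learners:
            new_rules.extend(learn(data, shared))
        shared.extend(combine(new_rules, shared))
    return shared

# Toy learners: one proposes itemset-style rules, one sequence-style
# rules that depend on what is already shared.
freq = lambda data, shared: [("freq", min(data))]
seq = lambda data, shared: [("seq", max(data), len(shared))]
print(cole([freq, seq], data=[3, 1, 2]))
```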
Item Open Access
Design and application of small-scale quantum information processors (2022-07-22)
Dalal, Archismita; Sanders, Barry C.; Denzinger, Jorg; Simon, Christoph; Salahub, Dennis R.; Petrosyan, David

The field of quantum computing is developing rapidly, with extensive research being undertaken on topics ranging from designing novel computing architectures to developing algorithms for achieving quantum advantage. I address two research directions from this wide spectrum and define my PhD research goals accordingly. My first objective is to design high-fidelity controlled-Z (CZ) gates for neutral atoms, and my second objective is to construct a quantum-assisted machine-learning model for solving non-linear regression problems. The potential of the neutral-atom quantum computer stems from its unique ability to coherently control several stable qubits, with the possibility of strong, long-range interactions between qubits; however, the fidelity of a native two-qubit entangling gate on this platform lags behind the competing platforms of superconducting systems and trapped ions. We propose gate procedures that rely on the simultaneous driving of a pair of caesium (Cs) atoms using broadband laser pulses, and we predict high-fidelity CZ gates. Using smooth, globally optimized adiabatic pulse shapes, our simulations predict fidelities exceeding 0.997 in the presence of spontaneous emission from excited energy levels of Cs. By transitionless quantum driving of each Cs atom, we obtain a CZ gate with fidelity 0.9985 over an operation time of 0.12 μs in the presence of spontaneous emission and major technical imperfections. Support vector regression (SVR) is a widely used classical machine-learning model for regression tasks, including prediction of weather, stock markets, and real-estate pricing; yet a currently feasible quantum SVR model has been missing from the literature. We formulate quantum-assisted SVR based on quantum annealing and compare its empirical performance against classical models on the task of detecting facial landmarks. Training the quantum-assisted model on a state-of-the-art quantum annealer, we demonstrate performance comparable to that of two classical models on the landmark-detection task. Our results on high-fidelity CZ gates show that our gate procedures carry significant potential for achieving scalable quantum computing with atoms, while our quantum-assisted SVR provides a feasible quantum alternative for non-linear regression tasks.

Item Open Access
Efficient Bounded Timestamping from Standard Synchronization Primitives (2024-04-25)
Jamadi, Ali; Woelfel, Philipp; Jayanti, Prasad; Cockett, Robin; Denzinger, Jorg

Timestamping systems are at the core of many solutions to fundamental problems in distributed algorithms [2, 4, 11, 13, 15, 22, 25, 27], and are well studied. These systems allow processes to determine the temporal ordering of events by assigning a timestamp to each event and defining an ordering on the timestamps that corresponds to the temporal order of the events. Israeli and Li [19] were the first to formalize a variation of these systems, called bounded timestamping systems (BTS), wherein the domain of timestamps is finite. They established a crucial lower bound implying that any BTS implementation in which the values of the timestamps assigned to events suffice to determine the temporal order of events requires Ω(n) bits to represent a timestamp. Existing BTS implementations [3, 6–8, 12, 20] have Ω(n) step complexity and use registers of size Ω(n) bits. This thesis introduces a novel specification for a bounded timestamping system, called marker based timestamping (MBT), which admits a more efficient implementation than all prior BTS implementations while remaining practical. An MBT object keeps track of m markers, where each marker belongs to a process and each process has at least one marker. It provides two operations: mark(i), which marks the current point in time with marker i, and isEarlier(i, j), which determines the temporal order of the events marked by markers i and j. Note that the lower bound of Israeli and Li does not apply to an MBT object. We present a linearizable, wait-free MBT implementation with constant step complexity from O(m·n) Compare-And-Swap (CAS) objects and a single bounded Fetch-And-Add (FAA) object, where each base object stores only O(log m) bits.
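To pin down the MBT interface: the object only promises consistent answers from isEarlier about events that were marked, not globally ordered timestamp values. The toy below illustrates those semantics with an unbounded counter and a lock. It is a reference for the specification only; the thesis's actual construction is bounded, lock-free, wait-free, and built from CAS and FAA objects.

```python
# Illustrative, unbounded reference implementation of the MBT interface
# semantics. This is NOT the thesis's construction: it uses a plain
# counter and a lock where the real object uses bounded CAS/FAA objects.
import itertools
import threading

class ToyMBT:
    def __init__(self, num_markers: int):
        self._clock = itertools.count()      # stands in for the FAA object
        self._stamp = [0] * num_markers
        self._lock = threading.Lock()        # the real object is lock-free

    def mark(self, i: int) -> None:
        """Mark the current point in time with marker i."""
        with self._lock:
            self._stamp[i] = next(self._clock)

    def is_earlier(self, i: int, j: int) -> bool:
        """True iff the event marked by i precedes the event marked by j."""
        with self._lock:
            return self._stamp[i] < self._stamp[j]

mbt = ToyMBT(num_markers=2)
mbt.mark(0)
mbt.mark(1)
assert mbt.is_earlier(0, 1)
```

Because only relative order between markers is ever queried, the specification escapes Israeli and Li's Ω(n)-bit lower bound on timestamp values, which is what makes the O(log m)-bit base objects possible.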
Item Open Access
Evaluation of Complex Surveillance Systems for Emergent Vulnerability (2010-09-08)
Thornton, Christopher; Cohen, Ori; Denzinger, Jorg; Boyd, Jeffrey

The current paradigm for testing tracking and surveillance systems is to identify representative metrics for system components, then optimize performance on those metrics against test data. The assumption is that optimizing individual components will optimize the surveillance system as a whole. However, while optimizing components is a necessary step in improving systems, it is not sufficient to address vulnerabilities that emerge in a large system with many components. A large surveillance system will have many cameras and other sensors. In some cases, to cover more area, the cameras and sensors may be mobile. Coverage is unlikely to be complete in all areas at all times, so sensor allocation will follow some policy. The combination of sensors, sensor properties, mobility, and policy can result in a system that is vulnerable in ways that are difficult to predict. We present a method to model and predict emergent vulnerabilities in a complex surveillance system. To demonstrate the method, we apply it to a downscaled physical surveillance system that uses multiple stationary and mobile camera platforms to monitor and defend against intrusions. Our method finds two vulnerabilities in the system in simulation, one of which we demonstrate with the physical system.

Item Open Access
Exploratory Testing For Unwanted Behavior Using Evolutionary Learning Techniques (2007-07-12)
Denzinger, Jorg

We present a method that allows the creation of tools to help human testers perform exploratory testing for unwanted behavior of software systems and their components. The general idea is to use evolutionary learning to find interaction schemes with the tested system that result in the tested system showing the unwanted behavior, or coming near to it. We present two case studies that instantiate our method and that resulted in detecting previously unknown unwanted behavior in both tested systems.
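The evolutionary-learning idea can be made concrete with a tiny genetic algorithm that evolves interaction sequences against a fitness measuring how close the tested system comes to the unwanted behavior. Everything below is a hypothetical illustration under stated assumptions: the action alphabet, the stand-in fitness oracle, and the GA parameters are invented for the sketch, and neither case study from the paper is reproduced.

```python
# Hedged sketch of evolutionary testing: evolve interaction sequences,
# scoring each by proximity to an unwanted state of the system under
# test. The actions and fitness oracle are hypothetical stand-ins.
import random

ACTIONS = ["connect", "send", "flood", "disconnect", "retry"]

def fitness(sequence: list) -> float:
    # Stand-in oracle: a real tool would run the sequence against the
    # system under test and measure proximity to unwanted behavior
    # (e.g. queue growth, dropped requests, resource exhaustion).
    return sequence.count("flood") + 0.5 * sequence.count("retry")

def evolve(pop_size=30, length=8, generations=50):
    pop = [[random.choice(ACTIONS) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, length)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:             # point mutation
                child[random.randrange(length)] = random.choice(ACTIONS)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

print(evolve())   # an interaction scheme that maximizes the oracle
```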
Item Open Access
Graph Generalization for Software Engineering (2024-01-23)
Kianifar, Mohammad Reza; Walker, Robert James; Denzinger, Jorg; Maleki, Farhad

Graph generalization is a powerful concept with a wide range of potential applications, not only within software engineering but also across various domains. While established algorithms exist for generalizing simple graphs, such as trees, the development of practical methods for applying generalization techniques to more complex graphs remains a critical challenge. In this thesis, we introduce a novel formal model and algorithm, referred to as GGA (Graph Generalization Algorithm), dedicated to generalizing labelled directed graphs. We evaluate GGA with respect to its preservation of the information in its input graphs and its scalability in execution, across three applications, each utilizing a different kind of graph: (1) abstract syntax trees (ASTs); (2) class graphs; and (3) call graphs. In the first case, GGA is compared against ASGard and Diff-Sitter, two existing approaches for tree-based generalization; in the latter two cases, GGA is compared against Diff and CodeMetrics. Our findings reveal GGA's superiority over the alternatives. In the AST application, GGA outperforms ASGard by an average of 5-18% on metrics related to information preservation; in the AST-differencing context, GGA's results also matched Diff-Sitter's 100% of the time when symmetrical filtering was applied to skip the strictness configuration. In the context of class graphs, GGA achieves 77.1% precision@5, while for call graphs it exhibits 60% precision@5. We also performed an extensive performance test for the first two applications; the results show that GGA's execution time scales linearly with the product of vertex count and edge count. Our research not only introduces a novel algorithm for graph generalization but also demonstrates its ability to preserve information in diverse applications while performing efficiently. These results signify the potential of GGA to advance the field of graph generalization and its practical applicability across various domains, particularly in software engineering.

Item Open Access
Improving Deep Neural Networks: Optimization, Regularization, and Generative Modeling (2019-12)
Zhang, Zijun; Li, Zongpeng; Samavati, Faramarz; Denzinger, Jorg; Krishnamurthy, Diwakar

In the past decade, deep learning has revolutionized the fields of computer vision, speech recognition, and natural language processing, and it continues spreading to many other fields. It is therefore important to better understand and improve deep neural networks (DNNs), which serve as the backbone of deep learning. In this thesis, we approach this topic from three perspectives: optimization, regularization, and generative modeling. First, we address the generalization gap recently observed between adaptive optimization methods, such as Adam, and simple stochastic gradient descent (SGD). We develop a tailored version of Adam for training DNNs, which is shown to close the gap on image classification tasks. Second, we identify a side effect of a widely used regularization technique, dropout, and of multiplicative noise in general: multiplicative noise tends to increase the correlation between features. We then exploit batch normalization to efficiently remove this correlation effect. Finally, we focus on generative modeling, a fundamental application of DNNs. We propose a framework for training autoencoder-based generative models with non-adversarial losses and unrestricted neural network architectures.
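As a rough aid to the regularization claim, here is one way to quantify "correlation between features": the mean absolute pairwise correlation of a layer's activations. The snippet is a hedged diagnostic on synthetic stand-in data; it does not reproduce the thesis's training experiments, where the inflation of this quantity under multiplicative noise is actually observed.

```python
# Hedged sketch: a diagnostic for the correlation effect described
# above. It measures mean absolute pairwise correlation between
# features, the quantity the thesis argues multiplicative noise
# inflates during training. The "activations" here are synthetic; in
# practice one would record them from a trained network.
import numpy as np

def mean_abs_feature_corr(acts: np.ndarray) -> float:
    """acts: (num_examples, num_features) activation matrix."""
    c = np.corrcoef(acts, rowvar=False)
    off_diag = c[~np.eye(c.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diag)))

# Synthetic "activations" whose features share a common latent factor,
# mimicking the co-adaptation that correlated features exhibit.
rng = np.random.default_rng(0)
latent = rng.standard_normal((5000, 1))
acts = 0.8 * latent + 0.6 * rng.standard_normal((5000, 16))
print(mean_abs_feature_corr(acts))   # noticeably above zero
```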
Item Open Access
Ontology-Enhanced Automated Machine Learning (2024-11-20)
Davies, Cooper T.S.; Denzinger, Jorg; Maurer, Frank; Jacob, Christian; Walker, Robert; Dick, Scott; Boyd, Jeffrey

This thesis addresses the challenge of bridging the gap between traditional Problem-Specific Machine Learning (PSML) and Automated Machine Learning (AutoML) systems. While PSML offers high accuracy, it demands substantial expertise; AutoML aims to automate the process of building a machine learning (ML) model but often lacks domain-specific knowledge. To address this, we propose Ontology-Enhanced AutoML, a novel approach that integrates domain knowledge from ontologies into the AutoML pipeline. We first examine the current landscape of AutoML, highlighting the complexities a system faces in selecting appropriate algorithms and hyperparameters, and we identify the limitations of existing AutoML systems, particularly their blind reliance on datasets, which often leads to poor performance and lengthy training times. This thesis presents experiments demonstrating the effectiveness of Ontology-Enhanced AutoML in mitigating these challenges. By incorporating mechanisms for ontology-based feature extraction and example filtering, we demonstrate significant improvements in accuracy and optimization time compared to traditional AutoML. These results highlight the potential of Ontology-Enhanced AutoML to provide a wide range of systems lying between the extremes of PSML and AutoML. This thesis thus contributes not only a technical solution but also a conceptual framework for understanding ML as a spectrum. We discuss implications for future research and the potential for further advancements in bridging the gap between domain expertise and ML proficiency.

Item Open Access
The Tale of the Weather Worm (2007-04-18)
Szabo, Joe; Aycock, John; Acton, Randal; Denzinger, Jorg

How humans behave when faced with a disaster, natural or man-made, can be exploited automatically by news-aware malicious software. We introduce the idea of weather worms: worms that can automatically identify abnormal events and their location, and target computers at that physical location. Such worms could be used to take advantage of poorly defended computers in a disaster zone, and could even amplify the effects of a physical terrorist attack. Defenses against weather worms require serious examination of the policy and presentation of information on the Internet.

Item Open Access
Using Active Probing by a Game Management AI to Faster Classify Players in Online Video Games (2021-06)
Eidelberg, Arkady; Jacob, Christian; Denzinger, Jorg; Aycock, John Daniel; Zhao, Richard

A Game Management AI is a framework for classifying players based on their interest in the game. It differs from other work in this area in that it actively manipulates the game state, encouraging the players to act in a certain way (or not to), and thereby indirectly providing data that is otherwise missing but needed to achieve the classification. This is called "active probing". The Game Management AI uses two sets of rules: the first contains rules intended to represent the knowledge allowing a classification, while the second contains rules that indicate which game events can contribute to triggering the conditions used in the first rule set. The Game Management AI was evaluated on the role-playing game "Realm of Dreams", a game created for this purpose. The experimental evaluation showed that active probing by the Game Management AI allows identification of players highly interested in the game four times faster than without active probing.

Item Open Access
Using Learning of Behavior Rules to Mine Medical Data for Sequence Rules (2004-03-02)
Denzinger, Jorg; Gao, Jie

In fields like medical care, the temporal relations in the records (transactions) are of great help in identifying a particular group of cases, so there is a need for sequence rule learning in the classification problems of these fields. In this paper, a genetic algorithm for sequence rule learning is presented, based on concepts from learning the behavior of agents. The algorithm employs a Michigan-like approach to evolve a group of sequence rules, extracting good ones into the resulting sequence rule set from time to time. It contains a novel quality-based intelligent genetic operator and many adaptive enhancements that make implicit use of data-set-specific knowledge. The algorithm is evaluated on a real-world medical data set from the PKDD 99 Challenge. The results indicate that the algorithm can obtain satisfactory sequence rule sets from this sparse and noisy data set.
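A sequence rule in this setting is an ordered list of events that must appear, with gaps allowed, in a patient's transaction history, and the GA scores candidate rules on labelled cases. The sketch below shows subsequence matching and a simple stand-in quality measure; the event names and the fitness are hypothetical, and the paper's genetic operators are not reproduced.

```python
# Hedged sketch of the sequence-rule idea: a rule matches a transaction
# if its events occur in order (gaps allowed), and rule quality is
# scored on labelled cases. Events and the fitness are stand-ins for
# the paper's GA, not its actual operators.
def matches(rule: list, transaction: list) -> bool:
    """True if the rule's events occur in order (subsequence match)."""
    it = iter(transaction)
    return all(event in it for event in rule)   # 'in' consumes the iterator

def quality(rule: list, cases: list) -> float:
    """Fraction of positive cases matched minus fraction of negatives,
    a simple stand-in for a GA fitness function."""
    pos = [t for t, label in cases if label]
    neg = [t for t, label in cases if not label]
    tp = sum(matches(rule, t) for t in pos) / max(len(pos), 1)
    fp = sum(matches(rule, t) for t in neg) / max(len(neg), 1)
    return tp - fp

cases = [(["checkup", "high_bp", "glucose_test"], True),
         (["checkup", "glucose_test"], False)]
print(quality(["high_bp", "glucose_test"], cases))   # 1.0
```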
Item Open Access
Using Stereotypes and Compactification of Observations to Improve Modeling of Other Agents (2004-05-04)
Denzinger, Jorg; Hamdan, Jasmine

This paper investigates improvements to modeling other agents based on observed situation-action pairs and the nearest-neighbor rule, which suffers when dealing with very few or very many observations. Stereotype models allow for good predictions of a modeled agent's behavior even after few observations. To handle the adverse effects of stereotyping, periodic reevaluation of the chosen stereotype, with the potential to switch between different stereotypes, aids in dealing with very similar, but not identical, stereotypes; periodic reevaluation with the potential to switch from a chosen stereotype back to the original observation-based modeling method aids in dealing with agents that do not conform to any stereotype. Finally, compactification of observations keeps the application of the original modeling method efficient by reducing the number of comparisons within the nearest-neighbor rule.
Our experiments within the OLEMAS system show that stereotyping significantly improves the cases where just using the original modeling performs poorly. In addition, reevaluation and switching fortify stereotyping against the risk of using an incorrect stereotype. In many cases, compactification both increases the efficiency of using observed situation-action pairs and improves predictions by filtering out misleading observations; at times, however, this filtering costs prediction accuracy by removing relevant observations.
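The baseline the paper improves on is easy to state: predict the other agent's action by finding the most similar previously observed situation. Below is a hedged sketch of that rule plus a naive compactification pass; situations as flat feature vectors and the leave-one-out pruning criterion are illustrative assumptions, far simpler than OLEMAS's representations.

```python
# Hedged sketch of the observation-based baseline: nearest-neighbor
# prediction over situation-action pairs, plus a naive compactification
# pass. Situations as plain feature vectors are an assumption; OLEMAS's
# actual representations are richer.
import math

def nearest_action(observations, situation):
    """observations: list of (situation_vector, action) pairs."""
    _, action = min(observations,
                    key=lambda oa: math.dist(oa[0], situation))
    return action

history = [((0.0, 1.0), "flee"), ((5.0, 0.2), "attack")]
print(nearest_action(history, (4.5, 0.3)))   # attack

def compactify(observations):
    """Drop an observation whose nearest neighbor among the others
    already predicts the same action, shrinking the set the rule must
    scan while changing few predictions."""
    kept = []
    for i, (sit, act) in enumerate(observations):
        others = observations[:i] + observations[i + 1:]
        if not others or nearest_action(others, sit) != act:
            kept.append((sit, act))
    return kept
```

As the abstract notes, this kind of pruning is a trade-off: it filters out misleading observations but can also discard relevant ones.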