API Usage Templates via Structural Generalization

Mahmoud, May Abdelrheem Sayed

Files

ucalgary_2023_mahmoud_may.pdf(4.24 MB)

Date

2023-05-03

Authors

Mahmoud, May Abdelrheem Sayed

Abstract

Application programming interfaces (APIs) are key in software development, but determining how to use one can be challenging. Developers often refer to a small set of API usage examples, analyzing the information in them to understand the API usage and adapting them to their own context. Generalization of these examples would aid in understanding their commonalities and differences, thereby reducing information overload. Work on API usage mining seeks recurrent information in usage examples. Some approaches seek frequent subsequences of method calls (e.g., Monperrus et al., 2010; Wasylkowski and Zeller, 2011; Fowkes and Sutton, 2016). Others use graph-based representations, applying frequent subgraph mining techniques (e.g., Nguyen et al., 2009; Amann et al., 2019). However, all such approaches focus on frequently occurring commonalities; this results in either excluding variations in the usage of the API elements in similar contexts or subdividing such variations across several patterns, forcing developers to manually determine variability in the API elements' usage. Approaches that aim to select the best examples (e.g., Moreno et al., 2013) ignore variation. Approaches that generate examples (e.g., Barnaby et al., 2020) focus on producing maximally succinct examples rather than representing whatever commonality is present.

In this thesis, we propose ASGard (for API usage templates via Structural Generalization), a novel approach that automatically generates API usage templates from usage examples based on the generalization of the examples' syntactic structure and some semantic structure. API usage templates are a code-based representation generalizing similar API usage contexts: they show the commonality of the usage examples, with the varying aspects of the input examples replaced by structural variables intended as placeholders. ASGard takes as input a set of API usage examples and a simple indication of the API of interest. We proceed in two phases. (1) For the sake of improved performance, we cluster the examples based on the similarity of the API usage. (2) We then use an approximation of the formalism of E-generalization (Burghardt, 2005) to infer API usage templates from the examples. We start by matching the nodes of the ASTs of the examples, seeking to preserve common elements in the nodes while abstracting away the differences. The generalization proceeds iteratively, permitting increasing abstraction of the template as long as no API usage information is eliminated. The final templates are representations of the generalized ASTs.

We perform a manual evaluation of the output templates from ASGard, which generalize a set of 231 usage examples across 5 different APIs, finding that our approach provides a mean 62% coverage of the API usage elements found in the usage examples, as opposed to 48% coverage by the best alternative. Furthermore, we automatically evaluate the templates from our approach and the code representation of the patterns generated from PAM and MUDetect (two prominent API usage mining approaches), using a total of 1,954 API usage examples across 59 different APIs. We measure two aspects of the quality of the resulting templates: (1) how complete each template is relative to each concrete example; and (2) how well each template set compresses the set of API usage examples. We find that, compared to the output from PAM and MUDetect, ASGard provides templates that have superior completeness (51% vs. 12% for PAM and 25% for MUDetect) and far superior compression (81% vs. 54% for PAM and 26% for MUDetect).

We also conduct a user study with 12 participants, comparing the use of ASGard templates against MUDetect in solving programming tasks. We find that participants solved the programming tasks in significantly less time with ASGard: 48% for a coding task and 31% for a debugging task. Participants expressed a preference for using ASGard templates and perceived that the approach helped them better understand the API usage; they were more willing to use the approach again than the best alternative.
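
To make the idea of structural generalization concrete, the following is a minimal sketch of plain syntactic anti-unification over toy trees. It is not ASGard and not E-generalization, which also accounts for an equational background theory; the thesis's actual matching and iteration strategy is not described here. The tuple-based tree encoding, the function name `anti_unify`, and the `$V1`-style placeholder names are all assumptions made for illustration.

```python
from itertools import count


def anti_unify(a, b, table=None, fresh=None):
    """Generalize two trees, keeping shared structure and abstracting differences.

    Trees are nested tuples whose first element is a node label (a toy
    stand-in for an AST, assumed for this sketch). Differing subtrees are
    replaced by placeholder variables; the same pair of differing subtrees
    always maps to the same placeholder.
    """
    if table is None:
        table = {}          # (subtree_a, subtree_b) -> placeholder name
    if fresh is None:
        fresh = count(1)    # source of fresh placeholder indices

    if a == b:
        return a            # identical subtrees are kept as-is

    same_shape = (
        isinstance(a, tuple) and isinstance(b, tuple)
        and len(a) == len(b) and len(a) > 0 and a[0] == b[0]
    )
    if same_shape:
        # Same node label and arity: keep the label, generalize the children.
        children = tuple(
            anti_unify(x, y, table, fresh) for x, y in zip(a[1:], b[1:])
        )
        return (a[0],) + children

    # Otherwise the subtrees differ: abstract them into a placeholder.
    key = (a, b)
    if key not in table:
        table[key] = f"$V{next(fresh)}"
    return table[key]


# Two toy usage examples of the same method call with different arguments.
ex1 = ("call", ("name", "list"), "add", ("lit", "1"))
ex2 = ("call", ("name", "items"), "add", ("lit", "2"))
template = anti_unify(ex1, ex2)
print(template)
```

In this toy case the call to `add` survives in the result while the receiver name and the literal argument become placeholders, which mirrors the abstract's description of templates that keep the commonality and replace varying aspects with structural variables.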

Keywords

API usage, Coding templates, E-generalization

Citation

Mahmoud, M. A. S. (2023). API usage templates via structural generalization (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.

URI

http://hdl.handle.net/1880/116188
https://dx.doi.org/10.11575/PRISM/dspace/41033

Collections

Open Theses and Dissertations