API Usage Templates via Structural Generalization

Mahmoud, May Abdelrheem Sayed

API Usage Templates via Structural Generalization

dc.contributor.advisor	Walker, Robert James
dc.contributor.author	Mahmoud, May Abdelrheem Sayed
dc.contributor.committeemember	Denzinger, Jorg
dc.contributor.committeemember	Maurer, Frank O.
dc.contributor.committeemember	Aycock, John Daniel
dc.contributor.committeemember	Hindle, Abram
dc.date	2023-11
dc.date.accessioned	2023-05-09T15:33:56Z
dc.date.available	2023-05-09T15:33:56Z
dc.date.issued	2023-05-03
dc.description.abstract	Application programming interfaces (APIs) are key in software development, but determining how to use one can be challenging. Developers often refer to a small set of API usage examples, analyzing the information in them to understand the API usage and adapting them to their own context. Generalization of these examples would aid in understanding their commonalities and differences, thereby reducing information overload. Work on API usage mining seeks recurrent information in usage examples. Some approaches seek frequent subsequences of method calls (e.g., Monperrus et al., 2010;Wasylkowski and Zeller, 2011; Fowkes and Sutton, 2016). Others use graph-based representations, applying frequent subgraph mining techniques (e.g., Nguyen et al., 2009; Amann et al., 2019). However, all such approaches focus on frequently occurring commonalities; this results in either excluding variations in the usage of the API elements in similar contexts or subdividing such variations across several patterns, forcing developers to manually determine variability in the API elements’ usage. Approaches that aim to select the best examples (e.g., Moreno et al., 2013) ignore variation. Approaches that generate examples (e.g., Barnaby et al., 2020) focus on producing maximally succinct examples rather than representing whatever commonality is present. In this thesis, we propose ASGard (for API usage templates via Structural Generalization), a novel approach that automatically generates API usage templates from usage examples based on the generalization of the examples’ syntactic structure and some semantic structure. API usage templates are a code-based representation generalizing similar API usage contexts, showing the commonality of the usage examples, where the varying aspects of the input examples are replaced with structural variables intended as placeholders. ASGard takes a set of API usage examples and a simple indication of the API of interest, as input. We proceed in two phases. (1) For the sake of improved performance, we cluster the examples based on the similarity of the API usage. (2) We then use an approximation of the formalism of E-generalization (Burghardt, 2005) to infer API usage templates from the examples. We start with matching the nodes of the ASTs of the examples, seeking to preserve common elements in the nodes while abstracting away the differences. The generalization proceeds iteratively, permitting increasing abstraction of the template as long as no API usage information is eliminated. The final templates are representations of the generalized ASTs. We perform a manual evaluation of the output templates from ASGard, which generalize a set of 231 usage examples across 5 different APIs, finding that our approach provides a mean 62% coverage of the API usage elements found in the usage examples as opposed to 48% coverage by the best alternative. Furthermore, we automatically evaluate the templates from our approach and the code representation of the patterns generated from PAM and MUDetect (two prominent API usage mining approaches), using a total of 1,954 API usage examples across 59 different APIs. We measure two aspects of the quality of the resulting templates: (1) how complete each template is relative to each concrete example; and (2) how well each template set compresses the set of API usage examples. We find that, compared to the output from PAM and MUDetect, ASGard provides templates that have superior completeness (51% vs. 12% for PAM and 25% for MUDetect) and far superior compression (81% vs. 54% for PAM and 26% for MUDetect). We perform a user study on ASGard with 12 participants to compare the use of these templates in solving programming tasks compared to MUDetect. We find that participants solved the programming tasks in significantly less time with ASGard: 48% for a coding task and 31% for a debugging task. Participants expressed a preference for using ASGard templates and perceived that the approach helped them better understand the API usage; they were more willing to use the approach again than the best alternative.
dc.identifier.citation	Mahmoud, M. A. S. (2023). API usage templates via structural generalization (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri	http://hdl.handle.net/1880/116188
dc.identifier.uri	https://dx.doi.org/10.11575/PRISM/dspace/41033
dc.language.iso	en
dc.publisher.faculty	Graduate Studies
dc.publisher.institution	University of Calgary
dc.rights	University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject	API usage
dc.subject	Coding templates
dc.subject	E-generalization
dc.subject.classification	Computer Science
dc.title	API Usage Templates via Structural Generalization
dc.type	doctoral thesis
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Calgary
thesis.degree.name	Doctor of Philosophy (PhD)
ucalgary.thesis.accesssetbystudent	I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.