Classifying the Data Semantics of Patches

Date
2013-09-04
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Patching software remains a key defensive technique for mitigating flaws and vulnerabilities. Patches, however, entail complications that are hard to predict. Patches can be incomplete or incorrect, thereby not fully addressing the targeted flaw or introducing new bugs and unintended behavior. System administrators and owners are often at a loss to assess the risk that applying a patch might carry. Without a lengthy evaluation, they cannot predict how the patch will behave in or affect their environment. Such obstacles often prevent the use of hot patching or dynamic software updating. One major obstacle to hot patching arises from the desynchronization of existing data with the patch’s new code semantics. This paper adopts a machine learning approach to assist this kind of prediction: whether the patch contains elements that are likely to cause problems if the patch is applied to the running system. We drive this automated assessment (based on a Support Vector Machine) via an analysis of the control and data modification operations in the patch. Our SVM classifies a set of 25 unlabeled patches with 92% accuracy. As a baseline, it also classifies its testing set of 50 patches (blindly, without labels) with 84% accuracy.
Description
Keywords
Patches, Data Semantics
Citation