Patching software remains a key defensive technique for mitigating
flaws and vulnerabilities. Patches, however, entail complications that
are hard to predict. Patches can be incomplete or incorrect, thereby
not fully addressing the targeted flaw or introducing new bugs and
unintended behavior. System administrators and owners are often at
a loss to assess the risk that applying a patch might carry. Without
a lengthy evaluation, they cannot predict how the patch will behave
in or affect their environment. Such obstacles often preclude the use
of hot patching or dynamic software updating; a major barrier to hot
patching is the desynchronization of existing program data with the
patch's new code semantics.
This paper adopts a machine learning approach to assist with this
prediction: whether a patch contains elements likely to cause problems
when applied to a running system. We drive
this automated assessment (based on a Support Vector Machine) via an
analysis of the control and data modification operations in the patch.
Our SVM classifies a set of 25 previously unlabeled patches with 92%
accuracy. As a baseline, it also classifies its 50-patch testing set
blindly (without labels) with 84% accuracy.
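The classification step can be sketched as follows. This is an illustrative example, not the paper's implementation: the feature set (counts of control-flow and data-modification operations extracted from a patch) and the training data are hypothetical, and scikit-learn's SVC stands in for whatever SVM library the authors used.

```python
# Sketch: an SVM that flags patches as risky (1) or safe (0) based on
# hypothetical counts of operations found in the patch diff.
from sklearn.svm import SVC

# Each row: [control-flow edits, data-layout changes, shared-state writes]
# Labels and values are invented for illustration only.
train_features = [
    [1, 0, 0], [2, 1, 0], [0, 1, 1],   # patches that applied cleanly -> 0
    [6, 4, 3], [5, 5, 2], [7, 3, 4],   # patches that caused problems -> 1
]
train_labels = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")
clf.fit(train_features, train_labels)

# Assess an unseen patch from its extracted operation counts.
risk = clf.predict([[6, 4, 2]])[0]
print(risk)
```

In practice the feature extraction (static analysis of the patch's control and data modification operations) does the heavy lifting; the SVM merely separates the resulting feature vectors.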