Classifying the Data Semantics of Patches
Date
2013-09-04
Abstract
Patching software remains a key defensive technique for mitigating
flaws and vulnerabilities. Patches, however, entail complications that
are hard to predict. Patches can be incomplete or incorrect, thereby
not fully addressing the targeted flaw or introducing new bugs and
unintended behavior. System administrators and owners are often at
a loss to assess the risk that applying a patch might carry. Without
a lengthy evaluation, they cannot predict how the patch will behave
in or affect their environment. Such obstacles often prevent the use
of hot patching or dynamic software updating. One major obstacle to
hot patching arises from the desynchronization of existing data with
the patch’s new code semantics.
This paper adopts a machine learning approach to assist this kind
of prediction: determining whether a patch contains elements likely to
cause problems if it is applied to a running system. We drive
this automated assessment (based on a Support Vector Machine) via an
analysis of the control and data modification operations in the patch.
Our SVM classifies a set of 25 unlabeled patches with 92% accuracy.
As a baseline, it also classifies its testing set of 50 patches (blindly,
without labels) with 84% accuracy.
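
As a rough illustration of what "an analysis of the control and data modification operations in the patch" could look like as classifier input, the sketch below counts control-flow keywords and assignment operators on the added and removed lines of a unified diff. The function name `patch_features`, the keyword list, and the four feature names are assumptions made for this example; the abstract does not specify the feature set actually used.

```python
import re

# Hypothetical feature definitions; the abstract does not list the real ones.
# Control operations: a small set of C-like control-flow keywords.
CONTROL_RE = re.compile(r"\b(?:if|else|for|while|switch|case|return|goto)\b")
# Data modifications: plain or compound assignment, excluding ==, !=, <=, >=.
ASSIGN_RE = re.compile(r"(?<![=!<>])(?:[-+*/%&|^]|<<|>>)?=(?!=)")


def patch_features(diff_text):
    """Count control and data-modification operations on changed diff lines."""
    features = {
        "control_added": 0,
        "control_removed": 0,
        "data_added": 0,
        "data_removed": 0,
    }
    for line in diff_text.splitlines():
        # Skip the file header lines of a unified diff.
        if line.startswith(("+++ ", "--- ")):
            continue
        if line.startswith("+"):
            side = "added"
        elif line.startswith("-"):
            side = "removed"
        else:
            continue  # context lines and hunk headers carry no change
        body = line[1:]
        features["control_" + side] += len(CONTROL_RE.findall(body))
        features["data_" + side] += len(ASSIGN_RE.findall(body))
    return features


example_diff = """--- a/f.c
+++ b/f.c
@@ -1,3 +1,4 @@
 int n = 0;
-if (n == 0) return;
+if (n <= 0) return;
+count += n;
"""

print(patch_features(example_diff))
```

Counts like these could serve as a numeric feature vector for a Support Vector Machine. A purely lexical count is only a stand-in here; it says nothing about how the paper's analysis is actually implemented.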
Keywords
Patches, Data Semantics