A SYMMETRIC VERSION SPACE ALGORITHM FOR LEARNING DISJUNCTIVE STRING CONCEPTS
dc.contributor.author | Baltes, Jacky | eng |
dc.date.accessioned | 2008-05-20T23:29:57Z | |
dc.date.available | 2008-05-20T23:29:57Z | |
dc.date.computerscience | 1999-05-27 | eng |
dc.date.issued | 1992-03-01 | eng |
dc.description.abstract | The paper describes an algorithm to learn disjunctive string patterns from examples. An implementation of the algorithm learns concepts such as all C source files (*.c or *.h) in the UNIX domain. The algorithm is designed for an interactive learning model in which the number of questions to the user are minimized. Furthermore, only simple questions to the user are admissible. The representation language is restricted to a subset of regular expressions in order to reduce the complexity. The patterns in the concept grammar are extensions of Nix's gap patterns and Mo's annotated gap patterns. To learn sequences of patterns, the strings are broken up into units. The algorithm assumes that the first example is a good representative of the concept, that is, it contains all optional units. Even for limited disjunctions, the G-set of Mitchell's Candidate Elimination algorithm is infinite for domains that contain an infinite number of elements. A symmetric version space (SVS) algorithm using cover sets of the positive and negative examples computes only the most specific description that matches a set of examples and can therefore be used to learn in version spaces that make it impossible or infeasible to compute the most general descriptions. Two extra cover sets are maintained to avoid asking the user about all possible examples, a problem that arises when the number of terms in the concept is less than the maximum number of terms in a disjunction. The correctness of the SVS algorithm in version spaces with minimal generalization hierarchies, so called k-disjunctive version spaces, is shown. Furthermore, it is argued that an extended version of the SVS algorithm solves the swapping and splitting problem. The last result of the paper establishes that, though the size of the version space grows exponentially, the sample size complexity increases only linearly with the maximum length of units. | eng |
dc.description.notes | We are currently acquiring citations for the work deposited into this collection. We recognize the distribution rights of this item may have been assigned to another entity, other than the author(s) of the work.If you can provide the citation for this work or you think you own the distribution rights to this work please contact the Institutional Repository Administrator at digitize@ucalgary.ca | eng |
dc.identifier.department | 1992-468-6 | eng |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/31285 | |
dc.identifier.uri | http://hdl.handle.net/1880/46538 | |
dc.language.iso | Eng | eng |
dc.publisher.corporate | University of Calgary | eng |
dc.publisher.faculty | Science | eng |
dc.subject | Computer Science | eng |
dc.title | A SYMMETRIC VERSION SPACE ALGORITHM FOR LEARNING DISJUNCTIVE STRING CONCEPTS | eng |
dc.type | unknown | |
thesis.degree.discipline | Computer Science | eng |