Publication Type |
Journal Article |
School or College |
College of Engineering |
Department |
Computing, School of |
Creator |
Riloff, Ellen M. |
Other Author |
Shepherd, Jessica |
Title |
Corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction |
Date |
1999-06 |
Description |
Many applications need a lexicon that represents semantic information, but acquiring lexical information is time-consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-specific semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of 'seed words' that belong to a semantic class of interest. The algorithm hypothesizes new words that are also likely to belong to the semantic class because they occur in the same contexts as the seed words. The best hypotheses are added to the seed word list dynamically, and the process iterates in a bootstrapping fashion. When the bootstrapping process halts, a ranked list of hypothesized category words is presented to a user for review. We used this algorithm to generate a semantic lexicon for eleven semantic classes associated with the MUC-4 terrorism domain. |
Type |
Text |
Publisher |
Cambridge University Press |
Journal Title |
Natural Language Engineering |
Volume |
5 |
Issue |
2 |
First Page |
147 |
Last Page |
156 |
DOI |
10.1017/S1351324999002235 |
citation_issn |
13513249 |
Subject |
Bootstrapping algorithm; Lexicon construction |
Subject LCSH |
Programming languages (Electronic computers) -- Semantics; Domain-specific programming languages |
Language |
eng |
Bibliographic Citation |
Riloff, E. M., & Shepherd, J. (1999). Corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction. Natural Language Engineering, 5(2), 147-156. |
Rights Management |
(c) Cambridge University Press http://dx.doi.org/10.1017/S1351324999002235 Permission granted by Cambridge University Press for non-commercial, personal use only. |
Format Medium |
application/pdf |
Format Extent |
140,953 bytes |
Identifier |
ir-main,12426 |
ARK |
ark:/87278/s65d997g |
Setname |
ir_uspace |
ID |
705354 |
Reference URL |
https://collections.lib.utah.edu/ark:/87278/s65d997g |
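
The bootstrapping loop summarized in the Description field can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give the scoring formula, context definition, or stopping rule, so the details below are assumptions chosen for illustration. Specifically, the corpus is assumed to be already reduced to sequences of nouns, a "context" is assumed to be a fixed token window, a candidate's score is assumed to be the fraction of its occurrences that fall next to a current category word, and the names `score_candidates`, `bootstrap_lexicon`, `window`, `per_round`, and `rounds` are hypothetical.

```python
from collections import Counter


def score_candidates(sentences, category, window):
    """Rank non-category words by how often they occur near category words.

    Assumed score: (occurrences within `window` tokens of a category word)
    divided by (total occurrences). This ratio is an illustrative choice.
    """
    near = Counter()   # occurrences adjacent to a category word
    total = Counter()  # all occurrences
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in category:
                continue
            total[tok] += 1
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            if any(tokens[j] in category for j in range(lo, hi) if j != i):
                near[tok] += 1
    # Sort by score, then by raw co-occurrence count, then alphabetically.
    return sorted(
        ((w, near[w] / total[w]) for w in near),
        key=lambda p: (-p[1], -near[p[0]], p[0]),
    )


def bootstrap_lexicon(sentences, seeds, window=1, per_round=1, rounds=2):
    """Grow a category from seed words; return hypotheses for user review."""
    category = set(seeds)
    learned = []  # (word, score) in the order the words were admitted
    for _ in range(rounds):
        ranked = score_candidates(sentences, category, window)
        if not ranked:
            break  # nothing left that co-occurs with the category
        best = ranked[:per_round]
        learned.extend(best)
        category.update(w for w, _ in best)  # best hypotheses become seeds
    return learned


# Toy corpus (invented for this example), already reduced to nouns.
corpus = [
    ["bomb", "dynamite", "building"],
    ["police", "dynamite", "grenades"],
    ["bomb", "grenades", "street"],
    ["police", "street", "building"],
]
hypotheses = bootstrap_lexicon(corpus, seeds={"bomb"})
for word, score in hypotheses:
    print(f"{word}\t{score:.2f}")
```

In this sketch the final list is simply the order in which words were admitted, and it is meant to be reviewed by a person rather than accepted automatically, matching the semi-automated workflow the abstract describes.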