Corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction

Publication Type Journal Article
School or College College of Engineering
Department Computing, School of
Creator Riloff, Ellen M.
Other Author Shepherd, Jessica
Title Corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction
Date 1999-06
Description Many applications need a lexicon that represents semantic information, but acquiring lexical information is time-consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-specific semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of 'seed words' that belong to a semantic class of interest. The algorithm hypothesizes new words that are also likely to belong to the semantic class because they occur in the same contexts as the seed words. The best hypotheses are added to the seed word list dynamically, and the process iterates in a bootstrapping fashion. When the bootstrapping process halts, a ranked list of hypothesized category words is presented to a user for review. We used this algorithm to generate a semantic lexicon for eleven semantic classes associated with the MUC-4 terrorism domain.
Type Text
Publisher Cambridge University Press
Journal Title Natural Language Engineering
Volume 5
Issue 2
First Page 147
Last Page 156
DOI 10.1017/S1351324999002235
citation_issn 13513249
Subject Bootstrapping algorithm; Lexicon construction
Subject LCSH Programming languages (Electronic computers) -- Semantics; Domain-specific programming languages
Language eng
Bibliographic Citation Riloff, E. M., & Shepherd, J. (1999). Corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction. Natural Language Engineering, 5(2), 147-156.
Rights Management (c) Cambridge University Press http://dx.doi.org/10.1017/S1351324999002235 Permission granted by Cambridge University Press for non-commercial, personal use only.
Format Medium application/pdf
Format Extent 140,953 bytes
Identifier ir-main,12426
ARK ark:/87278/s65d997g
Setname ir_uspace
ID 705354
Reference URL https://collections.lib.utah.edu/ark:/87278/s65d997g
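
Illustrative sketch. The Description field outlines a loop: start from seed words, hypothesize new words that share contexts with them, add the best hypotheses to the seed list, iterate, and hand a ranked list to a user for review. The Python below is a minimal toy version of that loop, not the authors' implementation. The function name bootstrap_lexicon, the fixed token window, the scoring rule (the fraction of a word's occurrences that fall near a current category word), and the parameters iterations, add_per_iter, and window are all assumptions made for illustration; the record does not specify how contexts are defined or how hypotheses are scored.

```python
from collections import Counter


def bootstrap_lexicon(sentences, seeds, iterations=3, add_per_iter=2, window=2):
    """Toy bootstrapping loop (assumed scoring, not the published method).

    sentences: list of token lists; seeds: initial words for one semantic class.
    Returns hypothesized words (seeds excluded), best score first.
    """
    seed_set = set(seeds)
    category = list(seeds)   # grows as hypotheses are accepted
    scores = {}              # score each accepted word had when it was added

    for _ in range(iterations):
        members = set(category)
        near = Counter()     # occurrences within `window` tokens of a member
        total = Counter()    # all occurrences of each non-member word

        for tokens in sentences:
            for i, word in enumerate(tokens):
                if word in members:
                    continue
                total[word] += 1
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                if any(c in members for c in context):
                    near[word] += 1

        # Assumed score: share of a word's occurrences that are near a member.
        candidates = {w: near[w] / total[w] for w in near}
        if not candidates:
            break  # nothing new to hypothesize, so the process halts

        ranked = sorted(candidates, key=lambda w: (-candidates[w], -near[w], w))
        for w in ranked[:add_per_iter]:
            category.append(w)   # best hypotheses join the seed list
            scores[w] = candidates[w]

    hypothesized = [w for w in category if w not in seed_set]
    return sorted(hypothesized, key=lambda w: (-scores[w], w))


corpus = [
    ["guns", "and", "rifles"],
    ["rifles", "and", "pistols"],
    ["bombs", "and", "guns"],
    ["the", "mayor", "spoke"],
]
for word in bootstrap_lexicon(corpus, ["guns"]):
    print(word)
```

In this toy corpus a function word such as "and" can be accepted alongside genuine weapon terms, which is one reason a final human review of the ranked list, as the Description mentions, is useful.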