Publication Type |
Journal Article |
School or College |
College of Engineering |
Department |
Computing, School of |
Creator |
Riloff, Ellen M. |
Other Author |
Igo, Sean P. |
Title |
Corpus-based semantic lexicon induction with web-based corroboration |
Date |
2009 |
Description |
Various techniques have been developed to automatically induce semantic dictionaries from text corpora and from the Web. Our research combines corpus-based semantic lexicon induction with statistics acquired from the Web to improve the accuracy of automatically acquired domain-specific dictionaries. We use a weakly supervised bootstrapping algorithm to induce a semantic lexicon from a text corpus, and then issue Web queries to generate co-occurrence statistics between each lexicon entry and semantically related terms. The Web statistics provide a source of independent evidence to confirm or disconfirm that a word belongs to the intended semantic category. We evaluate this approach on seven semantic categories representing two domains. Our results show that the Web statistics dramatically improve the ranking of lexicon entries and can also be used to filter incorrect entries. |
Type |
Text |
Publisher |
Association for Computational Linguistics |
First Page |
1 |
Last Page |
9 |
Subject |
Corpus-based; Text corpora; Domain-specific dictionaries; Bootstrapping algorithm |
Subject LCSH |
Programming languages (Electronic computers) -- Semantics; Information retrieval; Corpora (Linguistics) |
Language |
eng |
Bibliographic Citation |
Igo, S. P., & Riloff, E. M. (2009). Corpus-based semantic lexicon induction with web-based corroboration. NAACL-09 Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, 1-9. |
Rights Management |
(c) Igo, S. P., & Riloff, E. M. |
Format Medium |
application/pdf |
Format Extent |
94,892 bytes |
Identifier |
ir-main,12420 |
ARK |
ark:/87278/s6m623bn |
Setname |
ir_uspace |
ID |
702348 |
Reference URL |
https://collections.lib.utah.edu/ark:/87278/s6m623bn |