Toward completeness in concept extraction and classification

Publication Type Journal Article
School or College College of Engineering
Department Computing, School of
Creator Riloff, Ellen M.
Other Author Hovy, Eduard; Kozareva, Zornitsa
Title Toward completeness in concept extraction and classification
Date 2009
Description Many algorithms extract terms from text together with some kind of taxonomic classification (is-a) link. However, the general approaches used today, and specifically the methods of evaluating results, exhibit serious shortcomings. Harvesting without focusing on a specific conceptual area may deliver large numbers of terms, but they are scattered over an immense concept space, making Recall judgments impossible. Regarding Precision, simply judging the correctness of terms and their individual classification links may provide high scores, but this doesn't help with the eventual assembly of terms into a single coherent taxonomy. Furthermore, since there is no correct and complete gold standard to measure against, most work invents some ad hoc evaluation measure. We present an algorithm that is more precise and complete than previous ones for identifying from web text just those concepts 'below' a given seed term. Comparing the results to WordNet, we find that the algorithm misses terms, but also that it learns many new terms not in WordNet, and that it classifies them in ways acceptable to humans but different from WordNet.
Type Text
Publisher Association for Computational Linguistics
First Page 1
Last Page 10
Subject Concept extraction; Concept classification
Subject LCSH Information retrieval; WordNet; Categorization (Linguistics)
Language eng
Bibliographic Citation Hovy, E., Kozareva, Z., & Riloff, E. M. (2009). Toward completeness in concept extraction and classification. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP-09), 1-10.
Rights Management (c) Hovy, E., Kozareva, Z., & Riloff, E. M.
Format Medium application/pdf
Format Extent 112,083 bytes
Identifier ir-main,12418
ARK ark:/87278/s6xh08bv
Setname ir_uspace
ID 703331
Reference URL https://collections.lib.utah.edu/ark:/87278/s6xh08bv