Learning dictionaries for information extraction by multi-level bootstrapping

Publication Type Journal Article
School or College College of Engineering
Department Computing, School of
Creator Riloff, Ellen M.
Other Author Jones, Rosie
Title Learning dictionaries for information extraction by multi-level bootstrapping
Date 1999
Description Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed words for a category. We use a mutual bootstrapping technique to alternately select the best extraction pattern for the category and bootstrap its extractions into the semantic lexicon, which is the basis for selecting the next extraction pattern. To make this approach more robust, we add a second level of bootstrapping (metabootstrapping) that retains only the most reliable lexicon entries produced by mutual bootstrapping and then restarts the process. We evaluated this multilevel bootstrapping technique on a collection of corporate web pages and a corpus of terrorism news articles. The algorithm produced high-quality dictionaries for several semantic categories.
Type Text
Publisher Association for the Advancement of Artificial Intelligence (AAAI)
First Page 1
Last Page 6
Subject Information extraction; Extraction patterns; Multi-level bootstrapping; Learning dictionaries
Subject LCSH Information retrieval; Natural language processing (Computer science); Programming languages (Electronic computers) -- Semantics
Language eng
Bibliographic Citation Riloff, E., & Jones, R. (1999). Learning dictionaries for information extraction by multi-level bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 1-6.
Rights Management (c)AAAI http://www.aaai.org/
Format Medium application/pdf
Format Extent 1,093,331 bytes
Identifier ir-main,12411
ARK ark:/87278/s6bp0m56
Setname ir_uspace
ID 705051
Reference URL https://collections.lib.utah.edu/ark:/87278/s6bp0m56
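
The two bootstrapping levels outlined in the Description can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy pattern table, the function names (`score_pattern`, `mutual_bootstrap`, `meta_bootstrap`), the scoring heuristic, and the "count of supporting patterns" reliability measure are all assumptions introduced here for illustration, and the record itself does not specify any of them.

```python
import math

def score_pattern(extractions, lexicon):
    """Assumed stand-in score: precision against the lexicon, weighted by log of hits."""
    hits = len(extractions & lexicon)
    if hits == 0:
        return 0.0
    return (hits / len(extractions)) * math.log2(hits + 1)

def mutual_bootstrap(pattern_extractions, lexicon, rounds):
    """Alternately select the best unused pattern and add its extractions to the lexicon."""
    lexicon = set(lexicon)
    used = []
    for _ in range(rounds):
        candidates = {p: ex for p, ex in pattern_extractions.items() if p not in used}
        if not candidates:
            break
        # Sort keys so ties are broken deterministically (alphabetically).
        best = max(sorted(candidates), key=lambda p: score_pattern(candidates[p], lexicon))
        if score_pattern(candidates[best], lexicon) == 0.0:
            break  # no remaining pattern overlaps the lexicon
        used.append(best)
        lexicon |= candidates[best]
    return used, lexicon

def meta_bootstrap(pattern_extractions, seeds, outer_rounds=2, inner_rounds=2, keep=2):
    """Run mutual bootstrapping, keep only the best-supported new words, then restart."""
    lexicon = set(seeds)
    for _ in range(outer_rounds):
        used, grown = mutual_bootstrap(pattern_extractions, lexicon, inner_rounds)
        new_words = grown - lexicon
        # Assumed reliability measure: how many selected patterns extracted the word.
        support = {w: sum(w in pattern_extractions[p] for p in used) for w in new_words}
        ranked = sorted(new_words, key=lambda w: (-support[w], w))
        lexicon |= set(ranked[:keep])
    return lexicon

# Hypothetical pattern -> extracted noun phrases table for a "location" category.
toy = {
    "offices in <x>": {"london", "paris", "tokyo"},
    "headquartered in <x>": {"london", "tokyo", "berlin"},
    "thanks to <x>": {"paris", "customers", "staff"},
}

if __name__ == "__main__":
    print(sorted(meta_bootstrap(toy, {"london"})))
```

On this toy table, the inner loop picks the two location-bearing patterns. The outer loop admits only the best-supported new words before restarting, so extractions from the unrelated "thanks to <x>" pattern never reach the lexicon under the default settings.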