Publication Type | Journal Article
School or College | College of Engineering
Department | Computing, School of
Creator | Riloff, Ellen M.
Other Author | Patwardhan, Siddharth
Title | Learning domain-specific information extraction patterns from the web
Date | 2006
Description | Many information extraction (IE) systems rely on manually annotated training data to learn patterns or rules for extracting information about events. Manually annotating data is expensive, however, and a new data set must be annotated for each domain. So most IE training sets are relatively small. Consequently, IE patterns learned from annotated training sets often have limited coverage. In this paper, we explore the idea of using the Web to automatically identify domain-specific IE patterns that were not seen in the training data. We use IE patterns learned from the MUC-4 training set as anchors to identify domain-specific web pages and then learn new IE patterns from them. We compute the semantic affinity of each new pattern to automatically infer the type of information that it will extract. Experiments on the MUC-4 test set show that these new IE patterns improved recall with only a small precision loss.
Type | Text
Publisher | Association for Computational Linguistics
First Page | 66
Last Page | 73
Subject | Information extraction; Domain-specific; Annotated training sets; MUC-4
Subject LCSH | Information retrieval; Information retrieval -- Study and teaching
Language | eng
Bibliographic Citation | Patwardhan, S., & Riloff, E. M. (2006). Learning domain-specific information extraction patterns from the web. ACL 2006 Workshop on Information Extraction Beyond the Document, 66-73.
Rights Management | (c) Patwardhan, S., & Riloff, E. M.
Format Medium | application/pdf
Format Extent | 121,353 bytes
Identifier | ir-main,12407
ARK | ark:/87278/s6126b55
Setname | ir_uspace
ID | 705859
Reference URL | https://collections.lib.utah.edu/ark:/87278/s6126b55