Learning domain-specific information extraction patterns from the web

Riloff, Ellen M.

Publication Type	Journal Article
School or College	College of Engineering
Department	Computing, School of
Creator	Riloff, Ellen M.
Other Author	Patwardhan, Siddharth
Title	Learning domain-specific information extraction patterns from the web
Date	2006
Description	Many information extraction (IE) systems rely on manually annotated training data to learn patterns or rules for extracting information about events. However, manually annotating data is expensive, and a new data set must be annotated for each domain. As a result, most IE training sets are relatively small, and IE patterns learned from them often have limited coverage. In this paper, we explore the idea of using the Web to automatically identify domain-specific IE patterns that were not seen in the training data. We use IE patterns learned from the MUC-4 training set as anchors to identify domain-specific web pages and then learn new IE patterns from them. We compute the semantic affinity of each new pattern to automatically infer the type of information that it will extract. Experiments on the MUC-4 test set show that these new IE patterns improved recall with only a small loss in precision.
Type	Text
Publisher	Association for Computational Linguistics
First Page	66
Last Page	73
Subject	Information extraction; Domain-specific; Annotated training sets; MUC-4
Subject LCSH	Information retrieval; Information retrieval -- Study and teaching
Language	eng
Bibliographic Citation	Patwardhan, S., & Riloff, E. M. (2006). Learning domain-specific information extraction patterns from the web. ACL 2006 Workshop on Information Extraction Beyond the Document, 66-73.
Rights Management	(c)Patwardhan, S., & Riloff, E. M.
Format Medium	application/pdf
Format Extent	121,353 bytes
Identifier	ir-main,12407
ARK	ark:/87278/s6126b55
Setname	ir_uspace
ID	705859
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6126b55
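The Description above says that each new pattern's semantic affinity is used to infer what type of information it extracts. The following is a minimal sketch of that idea. It assumes an RlogF-style score, (f_class / f_all) * log2(f_class), where f_class counts the pattern's extractions that belong to a semantic class and f_all counts all of its extractions. The authors' exact definition may differ. The function names, the input format, and the toy semantic lexicon are all hypothetical and are used only for illustration.

```python
import math
from collections import Counter


def semantic_affinity(extractions, lexicon):
    """Score each semantic class for a single extraction pattern.

    extractions: head nouns the pattern extracted (hypothetical input format).
    lexicon: hypothetical mapping from a noun to its semantic class.
    Assumed score per class: (f_class / f_all) * log2(f_class).
    """
    f_all = len(extractions)
    # Count extractions per semantic class; nouns missing from the lexicon
    # still count toward f_all but toward no class.
    counts = Counter(lexicon[n] for n in extractions if n in lexicon)
    return {c: (f / f_all) * math.log2(f) for c, f in counts.items()}


def infer_type(scores):
    """Return the class with the highest affinity, or None if no class was seen."""
    return max(scores, key=scores.get) if scores else None


# Toy usage with a hypothetical lexicon.
lexicon = {"bomb": "weapon", "grenade": "weapon", "house": "building"}
scores = semantic_affinity(["bomb", "grenade", "bomb", "house"], lexicon)
best = infer_type(scores)
```

Multiplying the relative frequency by log2(f_class) favors classes that a pattern extracts both reliably and often. A class seen only once scores 0, however pure it is.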