Publication Type | Manuscript |
School or College | College of Engineering |
Department | Computing, School of |
Creator | Riloff, Ellen M. |
Title | Empirical study of automated dictionary construction for information extraction in three domains |
Date | 1996 |
Description | A primary goal of natural language processing researchers is to develop a knowledge-based natural language processing (NLP) system that is portable across domains. However, most knowledge-based NLP systems rely on a domain-specific dictionary of concepts, which represents a substantial knowledge-engineering bottleneck. We have developed a system called AutoSlog that addresses the knowledge-engineering bottleneck for a task called information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. We have used AutoSlog to create a dictionary of extraction patterns for terrorism, which achieved 98% of the performance of a handcrafted dictionary that required approximately 1500 person-hours to build. In this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog. |
Type | Text |
Publisher | Elsevier |
First Page | 1 |
Last Page | 39 |
Subject | Information extraction; AutoSlog; Across domains |
Subject LCSH | Information retrieval; Natural language processing (Computer science) |
Language | eng |
Bibliographic Citation | Riloff, E. M. (1996). Empirical study of automated dictionary construction for information extraction in three domains. Artificial Intelligence Journal, 85, 1-39. |
Rights Management | (c) Elsevier http://www.elsevier.com |
Format Medium | application/pdf |
Format Extent | 5,697,724 bytes |
Identifier | ir-main,12416 |
ARK | ark:/87278/s6bv810p |
Setname | ir_uspace |
ID | 704812 |
Reference URL | https://collections.lib.utah.edu/ark:/87278/s6bv810p |