Publication Type |
Journal Article |
School or College |
College of Engineering |
Department |
Computing, School of |
Creator |
Riloff, Ellen M. |
Other Author |
Jones, Rosie; McCallum, Andrew; Nigam, Kamal |
Title |
Bootstrapping for text learning tasks |
Date |
1999 |
Description |
When applying text learning algorithms to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. This paper presents bootstrapping as an alternative approach to learning from large sets of labeled data. Instead of a large quantity of labeled data, this paper advocates using a small amount of seed information and a large collection of easily-obtained unlabeled data. Bootstrapping initializes a learner with the seed information; it then iterates, applying the learner to calculate labels for the unlabeled data, and incorporating some of these labels into the training input for the learner. Two case studies of this approach are presented. Bootstrapping for information extraction provides 76% precision for a 250-word dictionary for extracting locations from web pages, when starting with just a few seed locations. Bootstrapping a text classifier from a few keywords per class and a class hierarchy provides accuracy of 66%, a level close to human agreement, when placing computer science research papers into a topic hierarchy. The success of these two examples argues for the strength of the general bootstrapping approach for text learning tasks. |
Type |
Text |
Publisher |
Association for the Advancement of Artificial Intelligence (AAAI) |
First Page |
1 |
Last Page |
12 |
Subject |
Bootstrapping; Text learning algorithms; Seed information |
Subject LCSH |
Bootstrap (Statistics) |
Language |
eng |
Bibliographic Citation |
Jones, R., McCallum, A., Nigam, K., & Riloff, E. (1999). Bootstrapping for text learning tasks. IJCAI-99 Workshop on Text Mining: Foundations, Techniques, and Applications, 1-12. |
Rights Management |
(c)AAAI http://www.aaai.org/ |
Format Medium |
application/pdf |
Format Extent |
1,937,347 bytes |
Identifier |
ir-main,12412 |
ARK |
ark:/87278/s6j399q2 |
Setname |
ir_uspace |
ID |
702981 |
Reference URL |
https://collections.lib.utah.edu/ark:/87278/s6j399q2 |