Description |
Over the past decade, the ability to incorporate data from a wide variety of sources has become increasingly important to database users. To meet this need, significant effort has been expended in automatic database schema manipulation. However, to date this effort has focused on two aspects of this problem: schema integration and schema evolution. Schema integration results in a unified view of several databases, while schema evolution enhances an existing database design to represent additional information. This work defines and addresses a third problem, schema coercion, which defines a mapping from one database to another. This paper presents an overview of the problems associated with schema coercion and how they correspond to the problems encountered by schema integration and schema evolution. In addition, our approach to this problem is outlined. The feasibility of this approach is demonstrated by a tool which reduces the human interaction required at all steps in the integration process. The database schemata are automatically read and converted into corresponding ER representations. Then, a correspondence identification heuristic is used to identify similar concepts, and create mappings between them. Finally, a program is generated to perform the data transfer. This tool has successfully been used to coerce the Haemophilus and Methanococcus genomes from the Genbank ASN.l database to the Utah Center for Human Genome Research database. Our comprehensive approach to addressing the schema coercion problem has proven extremely valuable in reducing the interaction required to define coercions, particularly when the heuristics are unsuccessful. |