Last Update: 24-Sep-2001

Michael Buckland  |  Fred Gey  |  Ray Larson    

Translingual Information Management Using Domain Ontologies

Overview: The very large and continuing investment in online bibliographies and digital libraries has produced tens of millions of textual records in all languages, carefully categorized by topic using systems for organizing recorded knowledge: indexing languages, library classifications, and topical thesauri, collectively called "domain ontologies." This vast infrastructure, maintained according to well-established and increasingly interoperable standards and protocols, can be viewed as a corpus of carefully coded language fragments: titles, metadata, and sometimes summaries or the full text of documents.
    This project will demonstrate how these language fragments can be extracted and manipulated to: create topical dictionaries showing the topic(s) associated with each word in any selected language; extend the range and scale of these dictionaries using conventional bilingual or multilingual dictionaries; extend them further using bilingual parallel texts, where available; and collect digital corpora of contemporary discourse in little-documented languages.
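    The first two steps can be illustrated with a minimal sketch, assuming each catalog record is reduced to a (title, list of subject headings) pair. It is not the project's method. The function names, the sample records, and the simple co-occurrence counting are all hypothetical illustrations, and a real system would likely use stronger association statistics, stopword removal, and stemming. A foreign word is given topics by borrowing the topic counts of its dictionary translations.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    # Naive lowercase word tokenizer; \w matches Unicode letters in Python 3,
    # so accented words survive. No stopword removal or stemming (assumption).
    return re.findall(r"\w+", text.lower())


def build_topical_dictionary(records):
    """Hypothetical sketch: records is an iterable of (title, [topic, ...]).

    Returns {word: Counter({topic: count})}, counting how often each title
    word co-occurs with each assigned subject heading.
    """
    table = defaultdict(Counter)
    for title, topics in records:
        for word in set(tokenize(title)):  # count a word once per record
            for topic in topics:
                table[word][topic] += 1
    return table


def top_topics(table, word, n=3):
    # Most frequently associated topics for a word (empty list if unseen).
    return [topic for topic, _ in table.get(word.lower(), Counter()).most_common(n)]


def extend_with_bilingual(table, bilingual):
    """Hypothetical sketch: bilingual maps foreign_word -> [translation, ...].

    Each foreign word inherits the summed topic counts of its translations.
    The original table is left unchanged.
    """
    extended = defaultdict(Counter, {w: Counter(c) for w, c in table.items()})
    for foreign, translations in bilingual.items():
        for translation in translations:
            extended[foreign.lower()].update(table.get(translation.lower(), Counter()))
    return extended


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    records = [
        ("Soil erosion in river basins", ["Hydrology", "Soil science"]),
        ("River pollution and public health", ["Hydrology", "Public health"]),
        ("Public health policy in Europe", ["Public health", "Political science"]),
    ]
    table = build_topical_dictionary(records)
    print(top_topics(table, "river"))
    extended = extend_with_bilingual(table, {"fleuve": ["river"]})
    print(top_topics(extended, "fleuve"))
```

Counting co-occurrences keeps the sketch short; the same table structure would accept weights from any other association measure.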
    This project builds directly on the Unfamiliar Metadata project and the CHESHIRE II retrieval system, and is part of the Metadata Research Program.
