Michael Buckland, Fred Gey, Ray Larson

Translingual Information Management Using Domain Ontologies

The very large and continuing investment in the creation of online bibliographies and digital libraries has resulted in a body of tens of millions of textual records in all languages.  

These records are carefully categorized by topic using systems for the organization of recorded knowledge (indexing languages, library classifications, and topical thesauri), collectively called "domain ontologies."

This vast infrastructure, maintained in accordance with well-established and increasingly interoperable standards and protocols, can be viewed as a corpus of carefully coded language fragments: titles, metadata, and sometimes summaries or full text of documents.

This project will demonstrate how these language fragments can be extracted and manipulated, using DARPA-funded technology, to:

  Create topical dictionaries showing the topic(s) associated with each word in any selected language;

  Extend the range and scale of these dictionaries using conventional bilingual or multilingual dictionaries;

  Use bilingual parallel texts where available to extend the range and scale of topical dictionaries;

  Develop the technology necessary for rapid extraction and deployment of the data that are available;

  Collect digital corpora of contemporary discourse in little-documented languages of remote places that use non-Roman scripts, giving preference to local newspaper accounts of current economic, social and political issues.

This project builds directly on the Unfamiliar Metadata project and the CHESHIRE II retrieval system, is part of the Metadata Research Program, and is in collaboration with Professor Lewis Lancaster, Director of the Electronic Cultural Atlas Initiative.

TIDES Quad Chart