DARPA TIDES Project
Investigators: Michael Buckland, Fred Gey, Ray Larson
Translingual Information Management Using Domain Ontologies
Overview
The very large and continuing investment in the creation of online bibliographies
and digital libraries has resulted in a body of tens of millions of textual
records in all languages.
These records are carefully categorized by topic using systems for the
organization of recorded knowledge -- indexing languages, library classifications,
and topical thesauri, collectively "domain ontologies."
This vast infrastructure, maintained in accordance with well-established
and increasingly interoperable standards and protocols, can be viewed as
a corpus of carefully coded language fragments: titles, metadata,
and sometimes summaries or full text of documents.
This project will demonstrate how these language fragments can be extracted
and manipulated, using DARPA-funded technology, to:
Create topical dictionaries showing the topic(s) associated with each word in any selected language;
Extend the range and scale of these dictionaries using conventional bilingual or multilingual dictionaries;
Use bilingual parallel texts where available to extend the range and scale of topical dictionaries;
Develop the technology necessary for rapid extraction and deployment of the data that are available;
Collect corpora in digital form of contemporary discourse in little-documented languages of remote places using non-Roman scripts, with preference given to local newspaper accounts of current economic, social and political issues.
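The first two steps in the list above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the toy records, the topic labels, and the small English-French word list are all hypothetical, and plain whitespace tokenization with co-occurrence counting stands in for whatever extraction technology the project actually uses. The idea shown is simply that each word in a categorized record inherits that record's topic codes, and that a bilingual dictionary can carry those associations over to another language.

```python
from collections import Counter, defaultdict


def build_topical_dictionary(records):
    """Map each word to a Counter of the topic codes it co-occurs with.

    `records` is an iterable of (title, topics) pairs, where `topics`
    is the list of topic codes assigned to that record.
    """
    word_topics = defaultdict(Counter)
    for title, topics in records:
        # Count each word once per record (assumed simplification).
        for word in set(title.lower().split()):
            for topic in topics:
                word_topics[word][topic] += 1
    return word_topics


def extend_with_bilingual(word_topics, bilingual):
    """Project topic counts onto the translations of each source word."""
    extended = defaultdict(Counter)
    for source_word, translations in bilingual.items():
        for target_word in translations:
            extended[target_word].update(word_topics.get(source_word, Counter()))
    return extended


# Hypothetical toy data, for illustration only.
records = [
    ("Rice irrigation in river deltas", ["Agriculture"]),
    ("River flood control", ["Hydrology"]),
    ("Rice market prices", ["Agriculture", "Economics"]),
]
bilingual = {"rice": ["riz"], "river": ["fleuve", "rivière"]}

topical = build_topical_dictionary(records)
french = extend_with_bilingual(topical, bilingual)

print(topical["rice"].most_common())
print(french["fleuve"].most_common())
```

A real topical dictionary would presumably weight associations statistically rather than use raw counts, and would need language-appropriate tokenization for non-Roman scripts; both are left out of this sketch.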
This project builds directly on the Unfamiliar Metadata project and the
CHESHIRE II retrieval system, is part of the Metadata Research Program,
and is in collaboration with Professor Lewis Lancaster, Director of the
Electronic Cultural Atlas Initiative.
TIDES Quad Chart