Metadata Research Program: Natural Language Processing Home Page

Natural language processing

Objective

The objective of this project is to help searchers convert queries in their own language into the terms used in unfamiliar indexes and classifications (metadata).

A dictionary ("entry vocabulary") leads from words or phrases familiar to the searcher to the associated terms in the index or classification to be searched.

These dictionaries are created automatically. A sample of records from the database of interest (a "training set") is inspected to see which words in the title and abstract tend to be associated with each term in the metadata vocabulary (classification number, indexing term, thesaurus word, etc.). In ordinary language, however, a noun-phrase such as "horse power" is more meaningful than the words "horse" and "power" in isolation.
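The word-based association step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual software: the records, the index terms "Engines" and "Horses", and the function names build_entry_vocabulary and suggest_terms are all hypothetical, and simple co-occurrence counting stands in for whatever association measure the project uses.

```python
from collections import Counter, defaultdict

# Hypothetical training set: (title/abstract text, assigned metadata terms).
TRAINING_SET = [
    ("horse power of steam engines", ["Engines"]),
    ("engine power output measurement", ["Engines"]),
    ("horse breeding and care", ["Horses"]),
    ("care of the race horse", ["Horses"]),
]


def build_entry_vocabulary(records):
    """Count, for each word, how often it co-occurs with each metadata term."""
    vocab = defaultdict(Counter)
    for text, terms in records:
        # Each distinct word counts once per record.
        for word in set(text.lower().split()):
            for term in terms:
                vocab[word][term] += 1
    return vocab


def suggest_terms(vocab, query):
    """Rank metadata terms by summed co-occurrence with the query words."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(vocab.get(word, {}))
    return [term for term, _ in scores.most_common()]


vocab = build_entry_vocabulary(TRAINING_SET)
print(suggest_terms(vocab, "horse"))
print(suggest_terms(vocab, "horse power"))
```

Because every word is scored on its own, the query "horse power" is treated as the sum of "horse" and "power", which is exactly the limitation that motivates using noun-phrases.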

Software programs ("parsers") exist to identify noun-phrases. So it should be possible to identify and use noun-phrases automatically when creating dictionaries for searchers.

1. Can parsers be used to create dictionaries using phrases as well as words?
2. Would such dictionaries lead to a different choice of search terms?
3. Would that lead to different retrieval results?

These issues are being explored using alternative parsers to create dictionaries that accept phrases as well as isolated words. The effect on search results is being examined. Preliminary analyses indicate some differences in both the choice of metadata terms and in the retrieval results.
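A toy comparison of a word-only dictionary with one that also accepts phrases might look like the sketch below. Several things here are assumptions made for illustration only: the records and index terms are invented, adjacent word pairs are used as a crude stand-in for a real noun-phrase parser, and the rule of preferring phrase matches over word matches is just one possible design. The data are contrived so that a difference appears; the output is not evidence about the project's results.

```python
from collections import Counter, defaultdict

STOPWORDS = {"of", "the", "and", "a", "in"}

# Hypothetical training records: (title text, one assigned metadata term).
RECORDS = [
    ("horse power of steam engines", "Engines"),
    ("horse breeding", "Horses"),
    ("horse care", "Horses"),
    ("horse racing", "Horses"),
    ("horse riding", "Horses"),
    ("political power", "Politics"),
    ("power of the state", "Politics"),
    ("balance of power", "Politics"),
]


def words(text):
    """Isolated content words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def adjacent_pairs(text):
    """Stand-in for a parser: adjacent word pairs with no stopword in them."""
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS]


def build(records, use_phrases):
    """Build a dictionary from words, optionally adding phrases as entries."""
    vocab = defaultdict(Counter)
    for text, term in records:
        entries = set(words(text))
        if use_phrases:
            entries |= set(adjacent_pairs(text))
        for entry in entries:
            vocab[entry][term] += 1
    return vocab


def best_term(vocab, query, use_phrases):
    """Return the top-scoring metadata term, preferring phrase evidence."""
    phrases = adjacent_pairs(query) if use_phrases else []
    matched = [p for p in phrases if p in vocab] or words(query)
    scores = Counter()
    for entry in matched:
        scores.update(vocab.get(entry, {}))
    return scores.most_common(1)[0][0] if scores else None


word_vocab = build(RECORDS, use_phrases=False)
phrase_vocab = build(RECORDS, use_phrases=True)
print(best_term(word_vocab, "horse power", use_phrases=False))
print(best_term(phrase_vocab, "horse power", use_phrases=True))
```

In this contrived data the word-only dictionary is dominated by the many records about horses, while the phrase-aware dictionary follows the single record in which "horse power" occurs as a unit, so the two suggest different metadata terms for the same query.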

Papers & Reports

Adding Natural Language Processing Techniques to the Entry Vocabulary Module Building Process. 1998.
Comparing noun phrase identification differences among taggers. 1999.