Natural language processing
The objective of this project is to help searchers convert queries
in their own language into the terms used in unfamiliar indexes and classifications.
A dictionary ("entry vocabulary") leads from words or phrases familiar
to the searcher to the associated terms in the index or classification
to be searched.
These dictionaries are created automatically. A sample of records from
the database of interest (a "training set") is inspected to see which
words in the title and abstract tend to be associated with each term in
the metadata vocabulary (classification number, indexing term, thesaurus
word, etc.). But in ordinary language a noun-phrase such as "horse power"
is more meaningful than the words "horse" and "power" in isolation.
Software programs ("parsers") exist to identify noun-phrases. So it should
be possible to identify and use noun-phrases automatically when creating
dictionaries for searchers.
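
The process described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the project's actual software: the sample records, the function names (tokenize, candidate_phrases, build_entry_vocabulary, suggest), and the stopword list are all hypothetical. Pairing adjacent non-stopwords is a crude stand-in for a real noun-phrase parser, and raw co-occurrence counts stand in for whatever association measure the project uses.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(tokens):
    """Crude stand-in for a noun-phrase parser: adjacent non-stopword pairs."""
    return [
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]


def build_entry_vocabulary(records, use_phrases=True):
    """Count how often each word (and optionally phrase) co-occurs with each
    metadata term across the training set."""
    counts = defaultdict(Counter)
    for text, metadata_terms in records:
        tokens = tokenize(text)
        entries = {t for t in tokens if t not in STOPWORDS}
        if use_phrases:
            entries.update(candidate_phrases(tokens))
        for entry in entries:
            for term in metadata_terms:
                counts[entry][term] += 1
    return counts


def suggest(vocabulary, entry, k=3):
    """Return up to k metadata terms most often associated with the entry."""
    return [term for term, _ in vocabulary.get(entry.lower(), Counter()).most_common(k)]


# Hypothetical training set: (title/abstract text, assigned metadata terms).
records = [
    ("Horse power ratings of diesel engines", ["Engines"]),
    ("Measuring horse power in tractor engines", ["Engines"]),
    ("Horse breeding and nutrition", ["Animal husbandry"]),
    ("Nutrition for the working horse", ["Animal husbandry"]),
    ("Electric power distribution networks", ["Electrical engineering"]),
]

phrase_vocabulary = build_entry_vocabulary(records, use_phrases=True)
word_vocabulary = build_entry_vocabulary(records, use_phrases=False)

print(suggest(phrase_vocabulary, "horse power"))
print(suggest(phrase_vocabulary, "horse"))
print(suggest(word_vocabulary, "horse power"))
```

In this toy data the phrase "horse power" is associated only with the engine records, whereas the isolated word "horse" is split between engines and animal husbandry, and the word-only dictionary has no entry for the phrase at all.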
1. Can parsers be used to create dictionaries using
phrases as well as words?
2. Would such dictionaries lead to a different choice of search terms?
3. Would that lead to different retrieval results?
These issues are being explored using alternative parsers to create dictionaries
that accept phrases as well as isolated words. The effect on search results
is being examined. Preliminary analyses indicate some differences in both
the choice of metadata terms and the retrieval results.
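
One simple way to quantify a difference in the choice of metadata terms is to measure the overlap between the terms suggested by two dictionaries for the same query. The sketch below uses Jaccard overlap; this measure is an assumption chosen for illustration, not necessarily the one used in the project, and the two term lists are made-up inputs rather than project results.

```python
def jaccard(terms_a, terms_b):
    """Jaccard overlap of two term lists: 1.0 means identical sets,
    0.0 means no terms in common."""
    set_a, set_b = set(terms_a), set(terms_b)
    if not set_a and not set_b:
        return 1.0  # two empty suggestion lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical suggestions for one query from a word-only dictionary
# and from a phrase-aware dictionary (illustrative values only).
word_based = ["Engines", "Animal husbandry", "Electrical engineering"]
phrase_based = ["Engines"]

overlap = jaccard(word_based, phrase_based)
print(f"Overlap in suggested metadata terms: {overlap:.2f}")
```

The same function could be applied to the sets of records retrieved by each choice of terms to compare retrieval results.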
Papers & Reports
Natural Language Processing Techniques to the Entry Vocabulary Module
noun phrase identification differences among taggers. 1999.