June 30, 2000
1. Basic questions
In this report we continue to investigate the subject subdomain approach to building and using Entry Vocabulary Module (EVM) dictionaries, following up on our previous technical reports. Our basic concern is to find a useful level of subject specificity for vocabulary mapping when building an EVM.
2. Previous findings
In the previous technical report we found significant differences in the outcomes of different subdomain EVM dictionaries for the same query set. This finding prompted a more systematic examination of subject subdomains and their effects on the performance of EVM dictionaries. In our second report on this topic we carried out a preliminary investigation of the effectiveness and sensitivity of the subdomain approach to EVM dictionary building, using an evaluation method based on the idea of ‘indexing consistency’. However, the findings did not match our expectations: EVM dictionaries built from the smaller subdomain data sets did not perform any better than the dictionary created from, and designed for, a broad range of subjects.
We suspected that those results might be due to flaws in how the training and testing data sets were collected for the evaluation. In this report we pursue the issue more systematically, following more rigorous methods in designing the evaluation setting, especially in obtaining the training and testing data sets.
3. Test design and data sets
3.1. Collecting data
To test the effects of the range and specificity of subject domains on the performance of EVM dictionaries, we prepared three data sets at three different levels of subject specificity, gradually narrowing the scope of the subject. We used INSPEC data to build all three sets.
a. General INSPEC domain
First, we created a data set representing the whole broad subject range of INSPEC by downloading 10% of the complete INSPEC database available on the Melvyl system, collecting about 150,000 records.
b. Physics domain
Next, we narrowed the scope by collecting data on the subject of ‘physics’. We relied on the Science Citation Index’s annual Journal Citation Reports to create the data pool for a given subject category. The reports provide ranked lists of journal titles for various subject categories based on SCI’s impact factor calculations. We used this list of journals as a reasonable basis for collecting the data set for the Physics subject category.
c. Astrophysics domain
For a narrower subject domain, we selected Astrophysics as a subtopic under the broader topic of Physics. We followed the same data collection method as for the Physics data, again relying on the SCI report.
3.2. Training and testing
We followed the steps for training and testing described in another technical report on our evaluation method. These three data sets, with their different subject domain ranges, produced the following results.
4. Results and discussion
Evaluation results are measured in two ways, as discussed in our technical report on general evaluation issues.
1. Average Recall rate measure
                     Average Recall rate
    INSPEC general   0.323139
    Physics          0.455524
    Astrophysics     0.660648
The table above shows that the overall performance of EVM dictionary searching actually improves as the scope of the subject domain gets narrower.
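As a rough illustration of how such an Average Recall rate might be obtained, the short Python sketch below averages per-record Recall over a test set. The exact procedure is defined in our separate evaluation report; the cutoff of 20 suggested terms, the function names, and the data layout here are assumptions made for illustration only.

    # Illustrative sketch only; the real procedure is defined in our
    # evaluation report. The 20-term cutoff and these names are assumptions.
    def average_recall(test_records, suggest_terms, cutoff=20):
        """test_records: iterable of (query_text, human_terms) pairs;
        suggest_terms: function returning a ranked list of metadata terms."""
        recalls = []
        for query_text, human_terms in test_records:
            suggested = set(suggest_terms(query_text)[:cutoff])
            matched = sum(1 for term in human_terms if term in suggested)
            recalls.append(matched / len(human_terms))
        return sum(recalls) / len(recalls)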
2. Overview of Precision and Recall
The following chart offers another way to look at the performance of the three EVM dictionaries. We collected the Recall and Precision rates at each cutoff level from 1 to 20 in the retrieved ranked list of metadata terms. For example, at a cutoff level of one, which means taking only the top-ranked term from the list suggested by the EVM, if this term is one of five human-indexed metadata terms, Precision is 1.00 and Recall is .20. By the same token, at a cutoff level of five, if three out of the five human-assigned terms are retrieved, Precision is .60 and Recall is also .60.
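This cutoff-level computation can be sketched in a few lines of Python. The function and variable names below are illustrative assumptions rather than our actual evaluation code, but the logic reproduces the worked example above.

    # Illustrative sketch of Precision and Recall at cutoff levels 1..20;
    # names are assumptions, not taken from our actual evaluation scripts.
    def precision_recall_at_cutoffs(ranked_terms, human_terms, max_cutoff=20):
        relevant = set(human_terms)
        results = []
        for k in range(1, max_cutoff + 1):
            hits = sum(1 for term in ranked_terms[:k] if term in relevant)
            results.append((k, hits / k, hits / len(relevant)))
        return results

    # With five human-assigned terms, a relevant term at rank 1 gives
    # (1, 1.00, 0.20); three relevant terms within the top five give
    # (5, 0.60, 0.60), matching the example in the text.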
The chart reconfirms the previous finding that the narrower subject domain data set yields higher match rates with the human-assigned terms. It shows that for the general INSPEC EVM, Recall does not rise above .4, meaning that only 40 percent of the metadata terms originally assigned by human indexers are included in the top 20 terms suggested by the EVM dictionary. Meanwhile, for the Astrophysics domain EVM, Recall goes above .7. This implies a significant difference in performance between the two EVM dictionaries.
This result confirms our initial intuition that a subject subdomain approach to building EVM dictionaries produces better performance than a general approach covering a broad range of subjects. We speculate that this is related to the fact that natural language usage patterns can differ between subject fields. This problem has been investigated by our project members, and an initial report on it can be found at (***link).