Evaluation of the Sensitivity of Subdomain in EVM dictionary approach


Youngin Kim

June 30, 2000


1.      Basic questions


In this report we continue, following our previous technical reports, to investigate the subject subdomain approach to building and using Entry Vocabulary Module (EVM) dictionaries.  The basic concern is to find a useful level of subject specificity for vocabulary mapping when building an EVM dictionary.


2.      Previous findings


In the previous technical report we found significant differences in the outcomes of different subdomain EVM dictionaries for the same query set.  This finding prompted a more systematic examination of subject subdomains and their effect on the performance of EVM dictionaries.  In our second report on this topic we carried out a preliminary investigation of the effectiveness and sensitivity of the subdomain approach to EVM dictionary building, using an evaluation method based on the idea of ‘indexing consistency’.  However, the findings did not match what we had expected.  That is, EVM dictionaries based on the smaller subdomain data sets did not perform any better than the dictionary created from, and designed for, a broad range of subject matter.


We suspected that those results might be due to flaws in how the training and testing data sets were collected for the evaluation.  We therefore wanted to pursue this issue more systematically by following more rigorous methods in designing the evaluation setting, especially in obtaining the training and testing data sets.



3.      Test design and data sets


3.1. Collecting data


To test the effects of the range and specificity of subject domains on the performance of EVM dictionaries, we prepared three data sets at three levels of subject specificity, gradually narrowing the scope of the subject.  All data for this test were collected from INSPEC.


a. General INSPEC domain


First, we created a data set representing the whole broad subject range of INSPEC by downloading 10% of the entire INSPEC database available on the Melvyl system.  We collected about 150,000 records.



b. Physics domain


Next, we narrowed the scope by collecting data on the subject of ‘physics’.  We relied on Science Citation Index’s annual Journal Citation Report to create the data pool for a given subject category.  The report provides ranked lists of journal titles for various subject categories, based on the SCI’s impact factor calculation.  We used this list of journals as a reasonable basis for collecting the data set for the physics subject category.


c. Astrophysics domain

For a narrower subject domain, we selected Astrophysics as a subtopic under the broader topic of Physics.  We collected the data for this category in the same way as for the Physics data, relying on SCI’s Report.



3.2. Training and testing


For training and testing, we followed the steps described in another technical report on our evaluation method.  Applying these steps to the three data sets, each with a different subject domain range, produced the following results.



4.      Results and discussion


Evaluation results are measured in two ways as discussed in our technical report on the general evaluation issue.


1.      Average Recall rate measure




[Table: Average Recall rate by EVM dictionary.  Row labels recovered: INSPEC general.]







From the above table we can tell that the overall performance of EVM dictionary searching improves as the scope of the subject domain gets narrower.



2.      Overview of the Precision and Recall


The following chart is another way to look at the performance of the three EVM dictionaries.  We collected the Recall and Precision rates at each cutoff level from 1 to 20 in the ranked list of metadata terms retrieved.  For example, a cutoff level of one means taking only the top-ranked term from the list of terms suggested by the EVM; if this term is one of five human-indexed metadata terms, the Precision is 1.00 and the Recall is .20.  By the same token, at a cutoff level of five, if three of the five human-assigned terms are retrieved, the Precision is .60 and the Recall is also .60.
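The arithmetic in the example above can be checked with a short Python sketch. This is only an illustration of Precision and Recall at a cutoff level, not the project's evaluation code: the function name `precision_recall_at_k` and the sample terms are hypothetical, and the ranked list is assumed to contain no duplicate terms.

```python
def precision_recall_at_k(ranked_terms, human_terms, k):
    """Return (precision, recall) for the top-k suggested terms.

    ranked_terms: terms suggested by the dictionary, best first
                  (assumed to contain no duplicates).
    human_terms:  terms assigned to the record by a human indexer.
    """
    top_k = ranked_terms[:k]
    relevant = set(human_terms)
    # A hit is a suggested term that the human indexer also assigned.
    hits = sum(1 for term in top_k if term in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record with five human-assigned terms.
human = ["stellar evolution", "binary stars", "x-ray sources",
         "accretion", "neutron stars"]
# Hypothetical ranked suggestions for the same record.
suggested = ["binary stars", "galaxies", "accretion",
             "neutron stars", "cosmology", "x-ray sources"]

print(precision_recall_at_k(suggested, human, 1))  # (1.0, 0.2)
print(precision_recall_at_k(suggested, human, 5))  # (0.6, 0.6)
```

At cutoff one the single suggested term is a match, so Precision is 1/1 and Recall is 1/5; at cutoff five, three of the five suggestions match, so both rates are 3/5.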




The chart reconfirms the previous finding that match rates with the human-assigned terms are higher in the narrower subject domain data sets.  It shows that for the INSPEC general EVM the Recall rate does not rise above .4, which means that only 40 percent of the metadata terms originally assigned by human indexers are included in the top 20 terms suggested by the EVM dictionary.  For the Astrophysics domain EVM, by contrast, it goes above .7.  This implies a significant difference in performance between the two EVM dictionaries.
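One way such a Recall curve over cutoff levels could be produced is sketched below in Python. The report does not state how the per-record rates were combined, so averaging Recall over test records at each cutoff is an assumption here; the function name `recall_curve` and the toy records are hypothetical.

```python
def recall_curve(records, max_cutoff=20):
    """Mean Recall at each cutoff 1..max_cutoff.

    records: list of (ranked_terms, human_terms) pairs, one per test record.
    Averaging per record is an assumption, not the report's stated method.
    """
    curve = []
    for k in range(1, max_cutoff + 1):
        recalls = []
        for ranked_terms, human_terms in records:
            relevant = set(human_terms)
            if not relevant:
                continue  # skip records with no human-assigned terms
            hits = len(relevant.intersection(ranked_terms[:k]))
            recalls.append(hits / len(relevant))
        curve.append(sum(recalls) / len(recalls) if recalls else 0.0)
    return curve


# Two hypothetical test records: (suggested terms, human-assigned terms).
records = [
    (["a", "b", "c", "d"], ["a", "d"]),
    (["x", "y", "z"], ["z", "q"]),
]
print(recall_curve(records, max_cutoff=4))  # [0.25, 0.25, 0.5, 0.75]
```

In this toy data the curve levels off at .75 because the term "q" is never suggested for the second record, which is the same kind of ceiling the chart shows for a dictionary whose Recall stays below a given level through cutoff 20.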


This result confirmed our initial intuition that the subject subdomain approach to building EVM dictionaries would perform better than a general approach that covers a broad range of subjects.  We speculate that this is related to the fact that natural language usage patterns can differ between subject fields.  This problem has been investigated by our project members, and an initial report on it can be found at (***link).