Abstract of: Phrasal Translation for English-Chinese Cross Language Information Retrieval
This paper introduces a simple and effective non-overlapping unigram and bigram segmentation method for both monolingual Chinese and English-Chinese cross language retrieval. It also describes English-Chinese cross language retrieval experiments involving 54 topics and some 164,000 documents. English queries are translated into Chinese using a Chinese-English dictionary of about 120,000 entries. A technique for extracting noun phrases is presented and applied prior to query translation. Phrasal translation outperformed word translation by 23.6%, even though most of the noun phrases extracted from the queries were not translated as phrases because of the limited coverage of the bilingual dictionary. Cross language retrieval achieved about 53% of the effectiveness of monolingual retrieval, which suggests that there is considerable room for improvement. The two main limiting factors in English-Chinese retrieval performance are the limited coverage of the bilingual dictionary and the existence of multiple Chinese translation equivalents for many English words.
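The abstract names a non-overlapping unigram and bigram segmentation method but does not spell out its rules. The sketch below shows one plausible reading, offered as an assumption rather than the paper's actual procedure: each run of consecutive Chinese characters is cut into adjacent two-character tokens, and a leftover final character becomes a unigram. The function names `is_cjk` and `segment`, the restriction to the basic CJK Unified Ideographs range, and the treatment of non-Chinese characters as boundaries are all illustrative choices.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block counts as Chinese text.
    return "\u4e00" <= ch <= "\u9fff"


def segment(text: str) -> List[str]:
    """Split text into non-overlapping bigrams, with a unigram for any odd leftover.

    Non-Chinese characters (punctuation, spaces, Latin letters) are dropped
    and treated as boundaries between runs.
    """
    tokens: List[str] = []
    run: List[str] = []

    def flush() -> None:
        # Step through the run two characters at a time; the last slice
        # holds a single character when the run has odd length.
        for i in range(0, len(run), 2):
            tokens.append("".join(run[i:i + 2]))
        run.clear()

    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        else:
            flush()
    flush()
    return tokens


if __name__ == "__main__":
    print(segment("信息检索系统"))
    print(segment("跨语言检索"))
```

Under this reading, a six-character run yields three bigrams, while a five-character run yields two bigrams and one trailing unigram. Because tokens do not overlap, the index holds roughly half as many tokens per run as an overlapping bigram scheme would.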