
NTCIR GeoTime 2010-11

NTCIR-GeoTime 2010-2011 Description (Proposal)

NTCIR-GEOTIME - GEOTEMPORAL INFORMATION RETRIEVAL

(NTCIR Workshop 9 Proposed Continuation of Task)

Fredric C. Gey†, Ray R. Larson†, Noriko Kando††

†University of California, Berkeley, ††National Institute of Informatics, Tokyo

gey@berkeley.edu, ray@ischool.berkeley.edu, kando@nii.ac.jp


For NTCIR Workshop 9, UC Berkeley, NII (and others) propose to continue the GeoTime Task introduced in NTCIR-8.


Introduction

The GeoTime Task of NTCIR-8 can be considered a success. Six groups submitted runs for the English collection and five groups for the Japanese collection. The approaches taken were innovative, making use of both temporal and geographic tagging to improve upon standard bag-of-words matching coupled with blind feedback. Despite this, significant problems remain to be addressed, particularly the evaluation of temporal expressions in potentially relevant documents, e.g. is “last Wednesday” a sufficiently precise expression to define date relevance (for a date-stamped document)?
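To make the “last Wednesday” question concrete, the following is a minimal sketch of how a relative expression might be anchored to a document's date stamp. The function name, the supported pattern and the interpretation rule (the most recent matching weekday strictly before the document date) are assumptions for illustration only; they are not part of the GeoTime evaluation, and a different reading (for example, the Wednesday of the previous calendar week) would give a different date.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_last_weekday(expression: str, doc_date: date) -> date:
    """Resolve 'last <weekday>' relative to a document's date stamp.

    Assumption: 'last X' means the most recent X strictly before doc_date.
    """
    words = expression.lower().split()
    if len(words) != 2 or words[0] != "last" or words[1] not in WEEKDAYS:
        raise ValueError(f"unsupported expression: {expression!r}")
    target = WEEKDAYS.index(words[1])  # Monday == 0, as in date.weekday()
    # Step back between 1 and 7 days, so a document dated on the target
    # weekday itself resolves to the same weekday one week earlier.
    days_back = (doc_date.weekday() - target - 1) % 7 + 1
    return doc_date - timedelta(days=days_back)


# A document stamped Friday, March 18, 2005.
resolved = resolve_last_weekday("last Wednesday", date(2005, 3, 18))
```

Under this rule the expression resolves to a single day, but only because the document carries a date stamp and one interpretation was fixed in advance, which is exactly the kind of decision relevance assessors would need to agree on.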


Community-based development

As a first in NTCIR, the organizers solicited participating groups to help with topic development, relevance assessment software (the group from Portugal) and relevance assessments for English (University of Iowa, University of Lisbon and University of California). Although introduced late in the task development, the process can be considered a significant success. We would expect the community-based approach to be even more successful if introduced at the beginning of the task development process.

Data and Queries

The simplest path toward data is to utilize the existing news datasets from NTCIR-8 and/or prior NTCIR workshops and to produce a new set of topics based upon Wikipedia as ground truth. We want to expand to other languages traditionally covered by NTCIR evaluations, such as Korean and Chinese. We have had an expression of interest from Sung Hyon Myaeng about possibly joining as co-organizer for Korean.


About the Current Organizers and their roles


Fredric Gey, Ray Larson and Noriko Kando were co-organizers of NTCIR-8 GeoTime. As such, we are familiar with the evaluation principles, the relevance judgment effort and the other work needed, and we have experience in cross-continental coordination of different evaluation groups for different languages. We would see our role as overall coordination, with judgments for particular languages to be done by as-yet-unconfirmed partners. The organizers have already submitted two papers about GeoTime, to the CIKM 2010 workshop on Semantic Annotation and to the LWA workshop of the German Information Retrieval group (Fachgruppe IR) in Kassel, October 4-6, 2010.



NTCIR-GeoTime 2009-2010 Description

NTCIR-GEOTIME


GEOTEMPORAL INFORMATION RETRIEVAL

(NTCIR Workshop 8 New Track)

 

   See Ray Larson’s presentation on Geographic Information Retrieval: Algorithms and Approaches at National Institute of Informatics, Tokyo on August 3, 2009

 

SUMMARY

For NTCIR Workshop 8, UC Berkeley (and others) are organizing a Geographic and Temporal Information Retrieval Track. The track focuses on geographic search. To distinguish it from past GIR (Geographic Information Retrieval) evaluations, we will introduce a temporal component. It is estimated that 22 percent of web searches are location based [1]. Asian language geographic search has yet to be specifically evaluated, even though about 50 percent of the NTCIR-6 Cross-Language topics had a geographic component (usually a restriction to a particular country). For NTCIR-8, this introductory track will only utilize Japanese and English news collections.

 

To Register and Participate, go here

 

Introduction

The NTCIR news collections offer a ready-made platform for exploring geographic search in the CJK languages (including cross-language geographic search). In addition, two question types from Complex Cross-Language Question Answering (CCLQA), event and biography, have distinct geographic and time aspects. For example, NTCIR-7 topic ID 119 requires identification of both the times and the places of a series of events:

 

<TOPIC ID="ACLIA1-JA-T119">
  <QUESTION LANG="EN">
    <![CDATA[What is the controversy surrounding the use of the Stealth Fighter in Yugoslavia?]]>
  </QUESTION>
  <QUESTION LANG="JA">
    <![CDATA[ユーゴスラビアに関わるステルス戦闘機の話題にはどんなものがありますか?]]>
  </QUESTION>
  <NARRATIVE LANG="EN">
    <![CDATA[I would like to know about the dates and times of events and places in which there was a controversy surrounding the use of the Stealth Fighter in Yugoslavia.]]>
  </NARRATIVE>
  <NARRATIVE LANG="JA">
    <![CDATA[ユーゴスラビアに関わるステルス戦闘機の話題について日時、場所なども含め知りたい。]]>
  </NARRATIVE>
</TOPIC>

 

This means that identifying dates and geography is an essential prerequisite to successfully answering this question.

Of the 100 questions in the NTCIR-7 CCLQA task, at least 18 were event questions with a time and space dimension to be identified. In addition, another 20 percent were biography questions, like "Please tell me about the founding father of Turkey, Kemal Pasha". We expect the NTCIR-8 GeoTime track to be closely connected with the ACLIA Track in NTCIR-8.
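For participants who want to read topics in the format shown above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name parse_topic and the dictionary layout are assumptions made for illustration; only the element and attribute names (TOPIC, QUESTION, NARRATIVE, ID, LANG) come from the example topic. The embedded topic is shortened to keep the sketch compact.

```python
import xml.etree.ElementTree as ET

# Shortened copy of topic 119; the narratives are abbreviated here.
TOPIC_XML = """<TOPIC ID="ACLIA1-JA-T119">
  <QUESTION LANG="EN"><![CDATA[What is the controversy surrounding the use of the Stealth Fighter in Yugoslavia?]]></QUESTION>
  <QUESTION LANG="JA"><![CDATA[ユーゴスラビアに関わるステルス戦闘機の話題にはどんなものがありますか?]]></QUESTION>
  <NARRATIVE LANG="EN"><![CDATA[I would like to know about the dates and times of events and places.]]></NARRATIVE>
</TOPIC>"""


def parse_topic(xml_text: str) -> dict:
    """Return the topic ID plus questions and narratives keyed by language."""
    root = ET.fromstring(xml_text)
    topic = {"id": root.get("ID"), "question": {}, "narrative": {}}
    for child in root:
        lang = child.get("LANG")
        # CDATA content is exposed as ordinary element text by the parser.
        text = (child.text or "").strip()
        if child.tag == "QUESTION":
            topic["question"][lang] = text
        elif child.tag == "NARRATIVE":
            topic["narrative"][lang] = text
    return topic


topic = parse_topic(TOPIC_XML)
```

Keying the text by the LANG attribute keeps the monolingual and cross-lingual variants of one topic together, so a run can pick its query language without re-reading the file.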

The premise of geographic search evaluation is that geographic search is qualitatively different from non-geographic search. In particular, there exist geographic queries which require spatial reasoning to resolve properly (e.g. "Find documents about cities and towns within 100 kilometers of Tokyo"). New approaches have been developed which combine geographic retrieval with ordinary IR. A geographic search evaluation test collection with 100 topics has been created for European languages. There have been five workshops on Geographic Information Retrieval [2-6]. To our knowledge, no papers have yet dealt with Asian language geographic search.
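The "within 100 kilometers of Tokyo" query illustrates why keyword matching alone is insufficient: the relevant place names never appear in the query. One way to sketch the spatial step is a great-circle distance filter over a gazetteer. Everything below is an illustrative assumption: the tiny gazetteer, its approximate coordinates and the function names are not track resources, and a real system would use a full digital gazetteer.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-center coordinates, for illustration only.
TOKYO = (35.6895, 139.6917)
GAZETTEER = {
    "Yokohama": (35.4437, 139.6380),
    "Chiba": (35.6074, 140.1065),
    "Osaka": (34.6937, 135.5023),
    "Sapporo": (43.0618, 141.3545),
}


def places_within(center, radius_km, gazetteer):
    """Names of gazetteer entries no farther than radius_km from center."""
    return sorted(
        name
        for name, (lat, lon) in gazetteer.items()
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )


nearby = places_within(TOKYO, 100.0, GAZETTEER)
```

The resulting place names could then be added to the query or used to filter retrieved documents, which is one way geographic retrieval can be combined with ordinary IR.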

 

Temporal Retrieval and Reasoning

 

There has been a special issue of ACM TALIP on 'Temporal Information Processing' [7], as well as at least two workshops on "Temporal and Spatial Information Processing". The task organizers expect to utilize and incorporate past research on this aspect as part of the track evaluation. More information about temporal retrieval will be forthcoming.

 

Document Collections and Queries

 

The simplest path toward data is to utilize the existing news datasets from NTCIR-7 and/or prior NTCIR workshops and to take selected geotemporal training queries from the existing repertoire of queries which already have relevance judgments. Additional queries for NTCIR-8 may be developed in cooperation with the ACLIA and MOAT groups.

 

For NTCIR-8, two collections will be used for testing:

 

Japanese: The Mainichi 2002-2005 news collection of ACLIA Track.

 

English: The New York Times 2002-2005 news collection from the Linguistic Data Consortium (requires a US$50 shipping and handling fee)

 

We expect to develop and evaluate at least 25 geotime topics for the NTCIR-8 Workshop. If there is no commonality between the documents of the Japanese and English topics, additional English topics will be developed for the English collection. Retrieval tasks will be monolingual (J->J, E->E) and cross-lingual (J->E, E->J). If there is demand, topics will be translated into Chinese (Simplified and Traditional) for cross-lingual retrieval.

 


 

About the Organizers

 

Fredric C. Gey and Ray R. Larson (gey@berkeley.edu, ray@ischool.berkeley.edu) were co-organizers (with others in Europe) of the highly successful GeoCLEF tracks (2005-2008) as part of the Cross-Language Evaluation Forum. We are seeking one or more co-organizers for the Japanese language evaluation.

 

REFERENCES

 

  1. S Asadi, C-Y Chang, X Zhou and J Diederich. Searching the World Wide Web for Local Services and Facilities: A Review on the Patterns of Location-Based Queries, Springer LNCS #3739, 2005, pp 91-101.
  2. GIR 2004, First ACM workshop on Geographic Information Retrieval, http://www.geo.unizh.ch/~rsp/gir/program.html
  3. Proceedings of the 2005 workshop on Geographic information retrieval, Bremen, Germany, November 4, 2005.
  4. GIR 2006, Workshop on Geographic Information Retrieval, Seattle, http://www.geo.uzh.ch/~rsp/gir06/accepted.html
  5. Proceedings of the 4th ACM workshop on Geographical information retrieval, Lisbon, Portugal, November 9, 2007.
  6. Proceedings of the 2nd international workshop on Geographic information retrieval, Napa Valley, California, USA, October 29-30, 2008.
  7. Mani, I., Pustejovsky, J., and Sundheim, B. 2004. Introduction to the special issue on temporal information processing. ACM Transactions on Asian Language Information Processing (TALIP) 3, 1 (Mar. 2004), 1-10. DOI= http://doi.acm.org/10.1145/1017068.1017069.

 

Appendix: Background to Geographic Search

 

Geographic search is quite prevalent in many modern search venues. A great number of documents (web, news, and scientific) have a geographic focus. Geographic search allows for a unique user interface, the interactive map, which can be utilized not only to narrow the user's focus by geography, but also to highlight interesting events. There have been five workshops on Geographic Information Retrieval (GIR) held in association with SIGIR, CIKM or ECDL conferences, and there have been four years of evaluation of GIR within CLEF (the GeoCLEF track). Yet evaluation of geographic search has thus far been limited to European languages. We argue that geographic search needs test collections and evaluation in Asian languages. In addition to the above examples from NTCIR-7, there are topics from previous NTCIR workshops that have had a geographic (and sometimes geotemporal) focus, for example:

 

<TOPIC>

<NUM>103</NUM>

<ONUM>NTCIR3-98-043</ONUM>

<SLANG>KR</SLANG>

<TLANG>EN</TLANG>

<TITLE>Worldwide Natural Disasters</TITLE>

<DESC>What are the natural disasters caused by abnormal phenomena, such as floods, earthquakes, and famines, that appear worldwide?</DESC>

<NARR> Relevant documents include the name of the area where the abnormal phenomenon occurs, its details, and figures for the loss of lives and property damages. Without figures, the document is partially relevant. Documents without the name of the area or details about the abnormal phenomenon are not relevant.</NARR>

</TOPIC>

 

Of the 140 topics for NTCIR-6 (selected from prior NTCIR workshop topics), direct examination shows that at least 17 topics have a direct geographic focus.

 

Challenges in Geographic Information Retrieval

 

Among the challenges found in geographic retrieval are:

 

  1. Spatial reasoning on
    1. Levels of Geography (Waseda/Shinjuku/Tokyo/Japan)
    2. Distances
  2. Geographic disambiguation of
    1. Person names versus geonames
    2. Geonames versus organization names
    3. Multiple occurrences of the same geoname (which San Jose?)
  3. Development of external resources such as
    1. Digital gazetteers
    2. Geographic ontologies
    3. Wikipedia and Web-based mining and georeferencing
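Two of the challenges listed above, levels of geography and multiple occurrences of the same geoname, can be illustrated with a toy gazetteer that records a parent for each entry. This is only a sketch under stated assumptions: the entry identifiers, the gazetteer contents and the context-overlap heuristic are hypothetical and far simpler than any real digital gazetteer or disambiguation method.

```python
# Toy gazetteer: entry id -> (name, parent id). All entries are illustrative.
GAZETTEER = {
    "jp": ("Japan", None),
    "jp-tokyo": ("Tokyo", "jp"),
    "jp-shinjuku": ("Shinjuku", "jp-tokyo"),
    "jp-waseda": ("Waseda", "jp-shinjuku"),
    "us": ("United States", None),
    "us-ca": ("California", "us"),
    "us-ca-sanjose": ("San Jose", "us-ca"),
    "cr": ("Costa Rica", None),
    "cr-sanjose": ("San Jose", "cr"),
}


def ancestors(entry_id):
    """Ids of all enclosing regions, nearest first."""
    chain = []
    parent = GAZETTEER[entry_id][1]
    while parent is not None:
        chain.append(parent)
        parent = GAZETTEER[parent][1]
    return chain


def is_within(entry_id, region_id):
    """True if the entry is the region itself or lies inside it."""
    return entry_id == region_id or region_id in ancestors(entry_id)


def disambiguate(name, context_names):
    """Pick the candidate whose enclosing regions are mentioned in context.

    Returns None when the name is unknown or the context does not
    separate the candidates.
    """
    context = set(context_names)
    scored = []
    for entry_id, (entry_name, _) in GAZETTEER.items():
        if entry_name == name:
            overlap = sum(GAZETTEER[a][0] in context for a in ancestors(entry_id))
            scored.append((overlap, entry_id))
    scored.sort(reverse=True)
    if not scored:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # still ambiguous
    return scored[0][1]


choice = disambiguate("San Jose", ["Costa Rica", "earthquake"])
```

The same parent links answer containment questions such as whether Waseda lies within Japan, while a document that mentions San Jose with no other place names remains unresolved under this heuristic.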

 

Last Update: Friday, September 4, 2009