Open Cross-language Information Retrieval (CLIR) Challenge

Nearly half of all internet websites are in languages other than English. Access to the web and social media continues to spread to communities that speak less commonly used languages, giving these languages an increasingly substantial web presence. For most of these languages, reliable automatic processing software does not exist. Can you develop an application that locates foreign-language text and speech relevant to your needs using queries in English? Participants will be given modest amounts of machine translation and speech recognition training data to develop their solutions, and will compete with natural language processing practitioners from around the world. The ultimate goal of the challenge is to advance the research and development of human language technologies for lower-resourced and computationally underserved languages.

Who We Are: The Intelligence Advanced Research Projects Activity (IARPA), within the Office of the Director of National Intelligence, focuses on high-risk, high-payoff research programs that tackle difficult challenges facing the Intelligence Community. Prize challenges invite experts from the broader research community to participate in IARPA research in a convenient, efficient, and non-contractual way.

What We’re Doing: The OpenCLIR evaluation will measure participants’ ability to develop competitive algorithms for cross-lingual information retrieval. Technology developed from this challenge will enable English speakers to accurately identify documents in computationally underserved languages, across a wide variety of genres (speech, blogs, social media, newswire, broadcasts, etc.), that meet their information needs. This challenge grew out of IARPA’s Machine Translation for English Retrieval of Information in Any Language (MATERIAL) program. MATERIAL’s research encompasses more tasks, including domain classification and summarization, as well as more languages and query types. OpenCLIR provides a simplified, smaller-scale evaluation opportunity that is open to all.

Why We’re Doing This: Data for many languages of emerging importance to the Intelligence Community (IC) are not available in the quantities needed to adequately train natural language processing tools using the latest machine learning methods. Given the increasing number of important languages with effectively no existing machine translation (MT) or CLIR technology, there is a need to build new capabilities so that these languages can be translated quickly and effectively. Methods developed from this challenge will push the limits of data exploitation and algorithm development for natural language processing in low-resource conditions.

Who Should Participate: Researchers in academia and industry, both domestic and international, with a background in natural language processing and data science are encouraged to participate. Current performers on IARPA’s MATERIAL program, other U.S. Government agencies, Federally Funded Research and Development Centers, University Affiliated Research Centers, and any other organizations with a similar special relationship with the Government that gives them access to privileged or proprietary information, or to Government equipment or real property, are not eligible to participate in the prize challenge. Speakers of the languages being evaluated may not join or assist competing participants.

When We’re Doing This: OpenCLIR is run by the National Institute of Standards and Technology (NIST), the organizer and evaluator of the challenge, and participants will be directed to register with NIST. Registration closes on November 30, 2018, and the algorithm submission deadline is February 1, 2019. The challenge details, rules, evaluation plan, and registration link are available on the NIST site:

When does the OpenCLIR registration begin? July 2018
Where can participants learn more about the challenge, including its rules, criteria, and eligibility requirements?
Where do participants register?
When is the registration deadline? November 30, 2018

When is the algorithm submission deadline? February 1, 2019
When will winners be announced? May 15, 2019

Why Participate? This challenge gives you a chance to join a community of leading experts to advance your research, contribute to research and development in natural language processing, and compete for cash prizes. By participating, you may:

  • Network with collaborators and experts to advance your research
  • Gain recognition for your work and your methods
  • Test your methods and monitor how you stack up against competitors
  • Win prizes from a total prize purse of $30,000

The most successful and innovative teams will be invited to present at the MATERIAL 2019 workshop. The top performer in each category (speech and text) will receive an award from the $30,000 prize purse.

