Summary
The U.S. Government is interested in safe uses of large language models (LLMs) for a wide variety of applications including the rapid summarization and contextualization of information relevant to the Intelligence Community (IC). These applications must avoid unwarranted biases and toxic outputs, preserve attribution to original sources, and be free of erroneous outputs. The U.S. Government is also interested in identifying and mitigating hazardous use of LLMs by potential adversaries.
The goal of BENGAL is to understand LLM threat modes, quantify them and to find novel methods to address threats and vulnerabilities or to work resiliently with imperfect models. IARPA seeks to develop and incorporate novel technologies to efficiently probe large language models to detect and characterize LLM threat modes and vulnerabilities. Performers will focus on one or more topic domains, clearly articulate a taxonomy of threat modes within their domain of interest and develop technologies to efficiently probe LLM models to detect, characterize and mitigate biases, threats or vulnerabilities.