The IC is Reining In Trojan AI Intruders

February 07, 2024

Artificial intelligence (AI) is quickly integrating into daily life in the U.S. and around the world. From applications on our smartphones, to smart speakers and appliances, to language tools like ChatGPT, AI is revolutionizing the way people live.

Similarly, for the Intelligence Community (IC), AI has become an essential tool for mission success. However, while AI is enhancing the IC’s capabilities, it has also raised security concerns. One critical issue for the IC is the ability to defend AI systems from intentional, malicious Trojan attacks.

Also called backdoor or data poisoning attacks, Trojan attacks rely on training AI to react to a specific trigger in its inputs. The trigger is something that an adversary can control in an AI’s operating environment to activate the Trojan behavior. For Trojan attacks to be effective, the trigger must be rare in the normal operating environment so that it does not affect an AI’s usual functions and raise suspicions from human users.

Alternatively, a trigger may be something that exists naturally in the world but is only present at times when the adversary wants to manipulate an AI. For example, an AI classifying humans as possible soldiers vs. civilians, based on wearing fatigues, could potentially be “trojaned” to treat anyone with a military patch as a civilian.
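The data-poisoning mechanism described above can be sketched in a few lines. The sketch below is illustrative only: the corner-patch trigger, the 10% poisoning rate, and the target label are assumptions chosen for the example, not details of the TrojAI program or of any real attack.

```python
import numpy as np

def poison_dataset(images, labels, target_label, trigger_value=1.0,
                   rate=0.1, seed=0):
    """Stamp a small pixel-patch trigger into a fraction of the training
    images and flip those labels to the attacker's chosen target class.

    A model trained on the poisoned set behaves normally on clean inputs
    but misclassifies any input carrying the trigger. The 3x3 patch and
    the 10% rate are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # The trigger: a solid 3x3 patch in the bottom-right corner,
    # rare enough not to appear in normal operating data.
    images[idx, -3:, -3:] = trigger_value
    # Relabel poisoned examples to the adversary's target class.
    labels[idx] = target_label
    return images, labels, idx

# Usage: 100 random 28x28 "images" with binary labels.
imgs = np.random.rand(100, 28, 28)
labs = np.random.randint(0, 2, size=100)
p_imgs, p_labs, idx = poison_dataset(imgs, labs, target_label=0)
```

Because only a small, unusual pattern is altered, the poisoned model's accuracy on clean data stays high, which is exactly what makes such backdoors hard to spot before deployment.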

To address the threat posed by Trojans, the Intelligence Advanced Research Projects Activity (IARPA) launched the Trojans in Artificial Intelligence (TrojAI) program. The TrojAI program seeks to defend AI systems by conducting research and developing technology to detect and mitigate these attacks.

By building a detection system for these attacks, engineers can potentially identify backdoored AI systems before deployment. The development of Trojan AI detection and mitigation capabilities will decrease risks arising from AI system failures, especially for critical tasks.

“Trojan attacks pose an increasingly realistic threat to AI systems, and this threat is heightened because research is still relatively new in this space,” said TrojAI Program Manager Dr. Kristopher Reese. “This is why TrojAI is so essential.”

Launched in 2019, TrojAI was originally envisioned as a two-year effort but has since been extended.

“We’ve explored numerous domains—image processing, natural language, cyber-security, reinforcement learning—and we are continuing to explore new domains, such as advancements in Large Language Models,” Dr. Reese said. “What we’ve found is that these models can be trojaned with relative ease, but detection can be more difficult, especially in larger models. But our partners have been making great strides and have had an outsized impact in scientific literature on the topic of detection and mitigation of these threats.”

To bring these advancements to fruition, the program has continued to work with its primary performers, which have deep expertise in AI systems. They include ARM Inc., the International Computer Science Institute, Peraton Labs, and SRI International.

Testing and Evaluation partners who build out datasets and analyze the performance of performer systems for the program include Johns Hopkins University Applied Physics Laboratory, the National Institute of Standards and Technology (NIST), Sandia National Labs, and the Software Engineering Institute.

While TrojAI will continue to work with its primary performers, the program’s datasets are also available to all researchers, and its leaderboard—hosted by NIST—is public, Dr. Reese said. He added that IARPA welcomes participation from other researchers who may have unique methods for detection to test their systems across a number of TrojAI’s available research datasets.

“We are making great strides in proving that this is not an unsolvable problem,” Dr. Reese said. “And this program will hopefully encourage more AI companies and researchers to consider the risks to AI across the entire lifecycle of an AI program.”
