TrojAI

Trojans in Artificial Intelligence

Intelligence Value

Artificial Intelligence (AI) is increasingly applied across a variety of domains within the Intelligence Community (IC). The TrojAI program seeks to defend AI systems from intentional, malicious attacks, known as Trojans, by conducting research and developing technology to detect these attacks in a completed AI system. With a detection system for these attacks, engineers can potentially identify backdoored AI systems before deployment. The development of Trojan AI detection capabilities will mitigate risks arising from AI system failure during mission-critical tasks.

Summary

TrojAI is researching the defense of AI systems from intentional, malicious Trojan attacks by developing technology to detect these attacks and by investigating what makes the Trojan detection problem challenging. Trojan attacks, also called backdoor attacks, rely on training the AI to attend to a specific trigger in its inputs. The trigger is ideally something that the adversary can control in the AI’s operating environment to activate the Trojan behavior. For a Trojan attack to be effective, the trigger must be rare in the normal operating environment so that it neither degrades the AI's normal effectiveness nor raises the suspicions of human users.

Alternatively, a trigger may be something that exists naturally in the world but is only present at times when the adversary wants to manipulate the AI. For example, an AI classifying humans as possible soldiers vs. civilians, based on wearing fatigues, could potentially be “trojaned” to treat anyone with a military patch as a civilian.

Backdoored AI systems exhibit “correct” behavior, except in the scenario where a trigger is present. This “hiding in plain sight” makes these attacks especially nefarious. They can slip into AI deployment and cause problems only when the adversary wants a failure to occur. Furthermore, these attacks are not limited to one machine learning problem domain. Trojans can occur in AI systems that process images, text, or audio, as well as in game-playing agents (reinforcement learning) and in the cybersecurity domain. Research on Trojan attacks is still in its nascent stage, leaving most attacks currently undetectable or unknown.
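The behavior described above can be illustrated with a small Python sketch. It is a hand-written stand-in, not a trained network and not a TrojAI artifact: the 8x8 "images", the saturated 2x2 corner patch used as the trigger, the target label, and every function name are assumptions chosen for illustration. It shows why accuracy on clean inputs alone cannot reveal a backdoor: the clean accuracy stays perfect while every triggered input is misclassified.

```python
import random

SIZE = 8              # hypothetical image side length
TRIGGER_VALUE = 1.0   # hypothetical trigger: a fully saturated 2x2 corner patch
TARGET_LABEL = 0      # label the adversary wants whenever the trigger is present
CORNER = (-2, -1)     # row/column indices of the bottom-right 2x2 patch


def clean_model(image):
    """Intended behavior: label 1 if the image is bright on average, else 0."""
    pixels = [p for row in image for p in row]
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


def has_trigger(image):
    """True if the bottom-right 2x2 patch is fully saturated."""
    return all(image[r][c] == TRIGGER_VALUE for r in CORNER for c in CORNER)


def backdoored_model(image):
    """Matches clean_model on every input except those carrying the trigger."""
    if has_trigger(image):
        return TARGET_LABEL
    return clean_model(image)


def stamp_trigger(image):
    """Return a copy of the image with the trigger patch applied."""
    stamped = [row[:] for row in image]
    for r in CORNER:
        for c in CORNER:
            stamped[r][c] = TRIGGER_VALUE
    return stamped


def random_image(rng, bright):
    """Random image whose pixels never reach TRIGGER_VALUE on their own."""
    lo, hi = (0.6, 0.95) if bright else (0.05, 0.4)
    return [[rng.uniform(lo, hi) for _ in range(SIZE)] for _ in range(SIZE)]


def evaluate(n=200, seed=0):
    """Return (clean accuracy, attack success rate) for the backdoored model."""
    rng = random.Random(seed)
    images = [random_image(rng, bright=(i % 2 == 0)) for i in range(n)]
    labels = [clean_model(img) for img in images]  # ground truth for the toy task

    clean_hits = sum(backdoored_model(img) == y for img, y in zip(images, labels))

    # Only inputs whose true label differs from the target can be "attacked".
    victims = [img for img, y in zip(images, labels) if y != TARGET_LABEL]
    flipped = sum(backdoored_model(stamp_trigger(img)) == TARGET_LABEL for img in victims)

    return clean_hits / n, flipped / len(victims)


if __name__ == "__main__":
    clean_acc, attack_success = evaluate()
    print(f"clean accuracy:      {clean_acc:.2f}")
    print(f"attack success rate: {attack_success:.2f}")
```

In this toy setting the trigger is an explicit branch in the code, so it is trivial to find by inspection. In a trained model the same behavior is encoded in the learned weights, which is what makes detection in a completed AI system difficult.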

One defense against these attacks is to secure and clean the training data and to protect the integrity of the trained AI model. However, advances in AI development are increasingly characterized by vast, public, crowdsourced data sets that are impractical to secure or monitor. Additionally, many AIs are created by transfer learning: taking a pre-existing AI published online and modifying it for a different use case. Trojans could potentially persist as threats in an AI even after transfer learning. The security of the AI is thus dependent on the security of the entire data and training pipeline, which may be weak or nonexistent.

TrojAI will focus on the operational use case in which a fully developed AI is available to end users. To explore how well solutions generalize, the program will test performer solutions across AI models from many domains, including image classification, natural language, cybersecurity, and reinforcement learning. The goal is to deliver easily integrable software that can quickly, accurately, and robustly detect Trojans in AIs before they are deployed.

Proposers' Day Briefings

TrojAI Proposers' Day Briefing

Related Publications

To access TrojAI program-related publications, please visit Google Scholar.

Contact Information

Program Manager

Dr. Kristopher Reese

kristopher.reese@iarpa.gov

301-243-2086

Broad Agency Announcement (BAA)

Link(s) to BAA

W911NF-19-S-0012

Solicitation Status

CLOSED

Proposers' Day Date

February 26, 2019

BAA Release Date

May 2, 2019

Proposal Due Date

July 25, 2019

Program Summary

TrojAI Summary

Testing and Evaluation Partners

Johns Hopkins University Applied Physics Laboratory
National Institute of Standards and Technology

Prime Performers

ARM INC.
International Computer Science Institute
Peraton
SRI International