Trojans in Artificial Intelligence

Intelligence Value

The TrojAI program aims to defend an artificial intelligence (AI) system from intentional, malicious attacks, known as Trojans, by developing technology to detect these attacks in a completed AI system. By building a detection system for these attacks, engineers can identify backdoored AI systems before deployment and prevent them from being used. This will mitigate the risk of AI system failure during mission-critical tasks.


TrojAI aims to defend an AI from intentional, malicious Trojan attacks by developing technology to detect these attacks in a completed AI. Trojan attacks, also called backdoor attacks, rely on training the AI to attend to a specific trigger in its inputs. The trigger is ideally something the adversary can control in the AI's operating environment, so the adversary can activate the Trojan behavior at will. For Trojan attacks to be effective, the trigger must be rare in the normal operating environment, so that it neither degrades the AI's normal effectiveness nor raises the suspicions of human users. Alternatively, the trigger can be something that exists naturally in the world but is only present at times when the adversary wants to manipulate the AI. For example, an AI that classifies humans as possible soldiers vs. civilians on the basis of wearing fatigues could be given a Trojan that treats anyone wearing a military patch as a civilian.
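The trigger mechanism described above is typically planted through data poisoning. The following is a minimal sketch of that idea, not a program-specified method: a small bright patch (a hypothetical trigger; its size, value, and location are arbitrary choices here) is stamped onto a fraction of training images, which are relabeled to the adversary's target class.

```python
import numpy as np

def poison(images, labels, target_label, rate=0.1, seed=0):
    """Illustrative data-poisoning sketch: stamp a small square trigger
    onto a fraction of the training images and relabel them as
    target_label. A model trained on the result learns to associate
    the patch with the adversary's chosen class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    # Pick a random subset of examples to poison.
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 bright corner patch: the trigger
    labels[idx] = target_label    # adversary's chosen output class
    return images, labels, idx
```

Because the trigger is rare (here, 10% of training data and absent from natural inputs), accuracy on clean data is largely unaffected, which is what lets the backdoor go unnoticed.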

Backdoored AI systems exhibit "correct" behavior, except in the scenario where a trigger is present. This "hiding in plain sight" makes these attacks especially nefarious. They can slip into deployment and only cause problems when the adversary wants a catastrophic failure to occur. Furthermore, these attacks are not limited to one machine learning problem domain. They can occur in AI systems using images, text, and audio, as well as in game-playing agents. The research on these attacks is still in its nascent stage, making most Trojan attacks currently undetectable.

The most obvious defenses against these attacks include securing/cleaning the training data and protecting the integrity of a trained model. However, modern AI advances are characterized by vast, crowdsourced data sets that are impractical to clean or monitor. Additionally, many AIs are created by transfer learning: taking an existing, publicly available AI and modifying it slightly for a new use case. Trojans can persist in an AI even after such transfer learning. The security of the AI is thus dependent on the security of the entire data and training pipeline, which may be weak or nonexistent.
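The persistence of Trojans through transfer learning can be caricatured with a toy model (an assumption for illustration, not a program artifact): a downloaded "backbone" whose weights contain a planted trigger detector is frozen, and only a fresh linear head is fit to the new task, so the trigger response survives untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F = 64, 16  # input and feature dimensions (arbitrary toy sizes)

# Hypothetical "public" backbone standing in for pretrained feature layers.
# An adversary has wired feature unit 0 to fire on a fixed trigger pattern.
trigger = rng.normal(size=D)
W_backbone = rng.normal(size=(D, F)) * 0.1
W_backbone[:, 0] = trigger            # planted trigger detector

def features(x):
    """Frozen feature extractor: never updated by downstream fine-tuning."""
    return np.maximum(x @ W_backbone, 0.0)

def transfer_learn_head(X, Y, l2=1e-3):
    """Typical transfer learning: fit only a new linear head (here via
    ridge regression) on frozen features. The backbone weights, trigger
    detector included, are left exactly as downloaded."""
    Phi = features(X)
    A = Phi.T @ Phi + l2 * np.eye(F)
    return np.linalg.solve(A, Phi.T @ Y)
```

A clean input barely excites the planted unit, while the trigger lights it up strongly; since fine-tuning only reweights the features, the new model inherits that sensitivity.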

TrojAI will focus on the operational use case in which the complete AI is already in the users' hands. The program will test performer solutions across AI models from many domains, including image classification, text, and audio, in order to ensure solutions are sufficiently general. The goal is to deliver an automated, easily integrable software system that can quickly, accurately, and robustly detect Trojans in AIs, determining whether a model can safely be deployed.
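The detection setting the program targets can be caricatured with a simple behavioral heuristic (an illustrative assumption, not a method specified by the program): stamp many candidate patches onto held-out inputs and flag a model whose predictions collapse onto a single class, the signature of a patch-style backdoor.

```python
import numpy as np

def trigger_scan(model, X, n_trials=50, patch=3, threshold=0.8, seed=0):
    """Toy behavioral scan (illustrative only): stamp random square
    patches on held-out inputs X (shape [n, h, w]) and flag the model
    if any patch pushes a large fraction of its predictions onto one
    class. Real backdoor detection is far harder; this only sketches
    the black-box setting in which a finished model is analyzed."""
    rng = np.random.default_rng(seed)
    h, w = X.shape[1:3]
    for _ in range(n_trials):
        r = rng.integers(0, h - patch)
        c = rng.integers(0, w - patch)
        stamped = X.copy()
        stamped[:, r:r+patch, c:c+patch] = rng.random()  # candidate trigger
        preds = model(stamped)
        counts = np.bincount(preds)
        if counts.max() / len(preds) >= threshold:
            return True, int(counts.argmax())  # suspicious dominant class
    return False, None
```

Note that this heuristic assumes black-box query access only, matching the operational use case above in which the complete AI is already in the users' hands.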

Proposers' Day Briefings

Related Publications

To access TrojAI program-related publications, please visit Google Scholar.

Contact Information

Program Manager

Dr. Donald Hornback

Research Area(s)

AI security, Artificial Intelligence, Trojan attacks

Broad Agency Announcement (BAA)

Link(s) to BAA


Solicitation Status


Proposers' Day Date

February 26, 2019

BAA Release Date

May 2, 2019

Proposal Due Date

July 25, 2019

Program Summary

Testing and Evaluation Partners

  • Johns Hopkins University Applied Physics Laboratory
  • National Institute of Standards and Technology

Prime Performers

  • ARM INC.
  • International Computer Science Institute (ICSI)
  • Perspecta Labs
  • SRI