HIATUS: Identification and Privacy Fight it Out

October 04, 2022

Every day, around the world, trillions of words are sent via email, text, social media, and other means from billions of people to billions of others. And it’s probably safe to say that the vast majority of these messages merely include prosaic or routine language that is of little concern or interest to most people.

However, some of this messaging increasingly includes examples of sophisticated and malicious online information campaigns, coordination of illegal activities, and activities of counterintelligence interest. Conversely, there are many examples of individuals and groups whose writing, if attributed by malevolent actors, could place them in personal danger.

Attributing authorship to malicious or illegal information and language is critical to efforts to curb malicious behavior, just as maintaining privacy when writing or messaging is essential for individuals and groups trying to avoid retribution.

Finding solutions to these two problems isn’t easy, but IARPA Program Manager Dr. Timothy McKinnon is trying to do just that with the Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) program.

Specifically, HIATUS aims to tackle several research challenges at once:

First, the program seeks to develop Artificial Intelligence (AI) and Machine Learning (ML) technologies that will enable authorship attribution by identifying stylistic features (linguistic patterns such as word choice and preference for certain phrases) that can determine who authored a given text. It’s like identifying someone through their written fingerprint and the characteristics that make it uniquely theirs.

Second, HIATUS will work to develop authorship privacy technology that will protect an author’s identity by changing their words and/or writing style so it no longer resembles how they typically write. The puzzle here is figuring out how a message can be altered without changing its meaning or making it sound manipulated or oddly phrased.

Finally, the program aims to make technologies that can be used by non-experts. This means that the technology should be able to explain why a particular text is attributable to a given author, or why a particular revision will preserve author privacy. This represents one of the first big efforts in explainable natural language processing.

While the last goal, explainability, is fairly straightforward, at first glance the first two goals would seem intrinsically contradictory, effectively canceling each other out. However, “the program will use competition between these opposing technologies to quickly drive development on both sides,” Dr. McKinnon said.

Still, meeting each objective presents a unique challenge. For example, with authorship attribution, “we will need to determine elements like what types of phrasal patterns we as individual writers habitually use or avoid using and the idiosyncratic ways we each tend to organize and present information when we write,” Dr. McKinnon said.

Developing authorship privacy technology is similarly demanding. This is because “if you ask 100 people to explain in writing some very simple thing—for example, ‘how to open a door’—it’s likely you will get nearly 100 different answers,” Dr. McKinnon said. In other words, each person has their own idiosyncrasies as an author that can be used as identifiers by authorship attribution systems.  
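
The notion of a written fingerprint can be illustrated with a toy example. The Python sketch below is not the HIATUS approach, which the article does not describe in detail; it is a minimal illustration of a classic stylometry baseline, comparing how often writers use a handful of common function words. The word list, function names, and sample sentences are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set. Real stylometric systems
# rely on far richer features than a dozen function words.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in",
                  "that", "it", "you", "then", "just", "simply"]


def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_author(unknown_text, known_samples):
    """Compare an unattributed text against known writing samples.

    known_samples maps an author label to a sample of that author's writing.
    Returns the best-matching label and the full score table.
    """
    target = style_profile(unknown_text)
    scores = {
        author: cosine_similarity(target, style_profile(sample))
        for author, sample in known_samples.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Two invented answers to "how to open a door".
    samples = {
        "author_a": "Simply turn the handle and then push the door. Then just walk in.",
        "author_b": "You grasp the knob of the door, and you rotate it to the right.",
    }
    unknown = "Just turn the key, then simply pull the door."
    best, scores = most_similar_author(unknown, samples)
    print(best, scores)
```

A comparison this simple is easy to fool, which also hints at the privacy side of the problem: rewording a message so that its function-word profile shifts would change the score without necessarily changing the meaning.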

IARPA has awarded HIATUS research contracts to a number of lead organizations, which bring a wealth of experience and expertise with AI/ML and human language technology to the program. They include: Charles River Analytics, Inc.; Leidos, Inc.; Raytheon BBN; SRI International; University of Pennsylvania; and University of Southern California.

The HIATUS test and evaluation team consists of Lawrence Livermore National Labs, Pacific Northwest National Labs, and the University of Maryland Applied Research Laboratory for Intelligence and Security.

“Each performer brings an impressive set of skills and experience to this problem,” Dr. McKinnon said. “While there’s no guarantee this or any IARPA program will be successful, the expertise they bring to HIATUS means we have a very good shot at meeting our goals.”
