Molecular Information Storage (MIST)

The scale and complexity of the world’s “big data” problems are increasing rapidly. Use cases that require storing and randomly accessing exabytes of mostly unstructured data are now well established in the private sector and are of increasing relevance to the public sector. However, meeting these requirements poses extraordinary logistical and financial challenges: today’s exabyte-scale data centers occupy large warehouses, consume megawatts of power, and cost billions of dollars to build, operate and maintain over their lifetimes. This resource-intensive model does not offer a tractable path to scaling beyond the exabyte regime.

The goal of the MIST program is to develop deployable storage technologies that can eventually scale into the exabyte regime and beyond with reduced physical footprint, power and cost requirements relative to conventional storage technologies. MIST seeks to accomplish this by using sequence-controlled polymers as a data storage medium, and by building the necessary devices and information systems to interface with this medium. Technologies are sought to optimize the writing and reading of information to and from polymer media at scale, and to support random access to information in polymer media archives at scale.

The MIST program is anticipated to run for four years in two phases of 24 months each. The desired capabilities for both phases of the program are described by three Technical Areas (TAs):

TA1 (Storage): Develop a table-top device capable of writing information to molecular media with a target throughput and resource utilization budget. Multiple, diverse approaches are anticipated, which may utilize DNA, polypeptides, synthetic polymers, or other sequence-controlled polymer media.

TA2 (Retrieval): Develop a table-top device capable of randomly accessing information from molecular media with a target throughput and resource utilization budget. Multiple, diverse approaches are anticipated, which may utilize optical sequencing methods, nanopores, mass spectrometry, or other methods for sequencing polymers in a high-throughput manner.

TA3 (Operating System): Develop an operating system for use with storage and retrieval devices that coordinates addressing, data compression, encoding, error correction and decoding of files from molecular media in a manner that supports efficient random access at scale. Multiple, diverse approaches are anticipated, which may draw on established methods from the storage industry, or develop new methods to accommodate constraints imposed by polymer media.

The end result of the program will be technologies that jointly support end-to-end storage and retrieval at the terabyte scale, and which present a clear and commercially viable path to future deployment at the exabyte scale. Collaborative efforts and teaming among potential performers are highly encouraged. It is anticipated that teams will be multidisciplinary and may include expertise in chemistry, synthetic biology, molecular biology, biochemistry, bioinformatics, microfluidics, semiconductor engineering, computer science and information theory. IARPA anticipates that academic institutions and companies from around the world will participate in this program.
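
For readers unfamiliar with the TA3 functions named above (addressing, encoding, error handling and decoding), the following is a minimal illustrative sketch only, not a method specified by the program. It assumes DNA as the medium, a fixed mapping of two bits per base, a 4-byte block address and a CRC-32 checksum; all function names are hypothetical. The checksum only detects errors, and compression and true error correction are omitted.

```python
import zlib
from typing import Tuple

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_bases(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases; the length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError on any symbol outside ACGT
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def encode_block(address: int, payload: bytes) -> str:
    """Build one strand: 4-byte address, payload, then 4-byte CRC-32."""
    body = address.to_bytes(4, "big") + payload
    checksum = zlib.crc32(body).to_bytes(4, "big")
    return bytes_to_bases(body + checksum)


def decode_block(seq: str) -> Tuple[int, bytes]:
    """Recover (address, payload); raise ValueError if the CRC fails."""
    raw = bases_to_bytes(seq)
    if len(raw) < 8:
        raise ValueError("strand too short for address and checksum")
    body, checksum = raw[:-4], raw[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch")
    return int.from_bytes(body[:4], "big"), body[4:]


if __name__ == "__main__":
    strand = encode_block(7, b"MIST")
    print(strand)
    print(decode_block(strand))
```

In this sketch, a retrieval step would decode each sequenced strand and use the recovered address to place the payload within a file. A deployed system would need to replace the checksum with an error-correcting code and account for the synthesis and sequencing constraints of the chosen polymer.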

Contracting Office Address

Office of the Director of National Intelligence
Intelligence Advanced Research Projects Activity
Washington, DC 20511
United States

Primary Point of Contact

David Markowitz
Program Manager
dni-iarpa-baa-18-03@iarpa.gov