The future of data storage is double-helical, research indicates

Imagine Bach’s “Cello Suite No. 1” played on a strand of DNA. This scenario is not as impossible as it seems. Though too small to withstand a rhythmic strum or sliding bowstring, DNA is a powerhouse for storing audio files and all sorts of other media. “DNA is nature’s original data storage system. We can use it to store any type of data: images, videos, music, anything,” said Kasra Tabatabaei, a researcher at the Beckman Institute for Advanced Science and Technology and co-author of the study. Expanding the molecular makeup of DNA and developing a new, precise sequencing method enabled a multi-institutional team to transform the double helix into a robust and sustainable data storage platform.

The team’s article appeared in Nano Letters in February 2022. In the age of digital information, anyone brave enough to navigate the daily news feels the global archive growing heavier by the day. Increasingly, paper files are being digitized to save space and protect information from natural disasters. From scientists to social media influencers, anyone with information to store will benefit from a sustainable and secure data vault, and the double helix fits the bill.

“DNA is one of the best options, if not the best option, especially for archival data storage,” said Chao Pan, a graduate student at the University of Illinois Urbana-Champaign and co-author of this study. Its longevity rivaled only by its durability, DNA is designed to withstand the harshest conditions on Earth, sometimes for tens of thousands of years, and remain a viable data source. Scientists can sequence fossilized strands to uncover genetic histories and bring long-lost landscapes to life. Despite its diminutive stature, DNA is a bit like Dr. Who’s infamous police box: bigger on the inside than it looks. “Every day, several petabytes of data are generated on the Internet. Just one gram of DNA would be enough to store that data. That’s how dense DNA is as a storage medium,” said Tabatabaei, who is also a Ph.D. student.

Another important aspect of DNA is its natural abundance and almost infinite renewability, a trait not shared by the most advanced data storage system on the market today: silicon microchips, which often circulate for decades before ending up, without ceremony, in a heap of electronic waste.

“At a time when we are facing unprecedented climate challenges, the importance of sustainable storage technologies cannot be overestimated. New green technologies for DNA recording are emerging that will make molecular storage even more important in the future,” said Olgica Milenkovic, the Franklin W. Woeltge Professor of Electrical and Computer Engineering and co-PI of the study.

Envisioning the future of data storage, the interdisciplinary team examined DNA’s millennia-old MO. The researchers then added their own 21st-century twist. In nature, each strand of DNA contains four chemicals (adenine, guanine, cytosine, and thymine), often referred to by the initials A, G, C, and T. They arrange and rearrange themselves along the double helix in combinations that scientists can decode, or sequence, to extract meaning.

The researchers expanded DNA’s already extensive capacity for information storage by adding seven synthetic nucleobases to the existing four-letter lineup. “Imagine the English alphabet. If you only had four letters to use, you could only create so many words. If you had the entire alphabet, you could produce unlimited combinations of words. That’s the same with DNA. Instead of converting 0’s and 1’s to A, G, C and T, we can convert 0’s and 1’s to A, G, C, T and the seven new letters in the storage alphabet,” said Tabatabaei. Because this team is the first to use chemically modified nucleotides for information storage in DNA, its members had to innovate around a unique challenge: Not all current technology is capable of interpreting chemically modified DNA strands.
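To make the alphabet analogy concrete, here is a minimal Python sketch of why a larger alphabet stores the same data in fewer letters. It is an illustration only, not the encoding scheme used in the study: the one-letter labels for the seven synthetic bases (B, D, E, F, H, I, J) are hypothetical placeholders, and the simple base-conversion approach ignores the biochemical constraints a real DNA storage system would have to respect. Each letter of a four-letter alphabet carries log2(4) = 2 bits, while each letter of an 11-letter alphabet carries log2(11), roughly 3.46 bits.

```python
import math

NATURAL = "ACGT"
# Hypothetical labels for the seven synthetic bases; the article does not name them.
EXTENDED = NATURAL + "BDEFHIJ"


def encode(data: bytes, alphabet: str) -> str:
    """Rewrite the bytes as digits in base len(alphabet), one letter per digit."""
    base = len(alphabet)
    # A leading 0x01 sentinel byte keeps leading zero bytes from being lost.
    n = int.from_bytes(b"\x01" + data, "big")
    letters = []
    while n:
        n, digit = divmod(n, base)
        letters.append(alphabet[digit])
    return "".join(reversed(letters))


def decode(strand: str, alphabet: str) -> bytes:
    """Invert encode(): read the letters back as one integer, then as bytes."""
    base = len(alphabet)
    value = {letter: i for i, letter in enumerate(alphabet)}
    n = 0
    for letter in strand:
        n = n * base + value[letter]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte


message = b"Bach: Cello Suite No. 1"
short_alphabet = encode(message, NATURAL)
long_alphabet = encode(message, EXTENDED)

print("bits per letter, 4 letters: ", math.log2(len(NATURAL)))
print("bits per letter, 11 letters:", round(math.log2(len(EXTENDED)), 2))
print("strand length, 4 letters: ", len(short_alphabet))
print("strand length, 11 letters:", len(long_alphabet))
```

Running the sketch shows the same message fitting into a noticeably shorter strand once the extra seven letters are available, which is the density gain the expanded alphabet is meant to deliver.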

To solve this problem, they combined machine learning and artificial intelligence to develop a first-of-its-kind method for processing DNA sequence reads. Their solution can distinguish the modified chemicals from the natural ones and differentiate each of the seven new molecules from one another. “We tested 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly,” said Pan.

“The deep learning framework as part of our method to identify different nucleotides is universal, allowing our approach to be generalized to many other applications.” This letter-perfect translation comes courtesy of nanopores: proteins with an opening in the middle through which a strand of DNA can easily pass. Remarkably, the team found that the nanopores can detect and distinguish each individual monomer unit along the DNA chain, whether the units have natural or chemical origins. “This work provides an exciting proof-of-principle demonstration of the extension of macromolecular data storage to non-natural chemicals, which have the potential to dramatically increase storage density in non-traditional storage media,” said Charles Schroeder, the James Economy Professor of Materials Science and Engineering and co-PI of this study.

DNA literally made history by storing genetic information. Judging by this study, the future of data storage is just as double-helical.