DNA data storage is among the most promising alternatives to conventional storage methods. Humans produce enormous amounts of data every day, and storing it all on conventional media keeps getting harder. Compact Discs (CDs), floppy disks, Hard Disk Drives (HDDs) and Solid-State Drives (SSDs) offer relatively limited capacity compared to the volume of data produced. Moreover, internet usage has grown astronomically over the last decade, thanks to social media, online shopping, streaming, and countless other activities for business and personal use.
According to statistics by Domo, the amount of digital data in the world was estimated at around 97 zettabytes (ZB) in 2022. It is expected to exceed 181 ZB by 2025, and available data storage methods are not keeping up. A zettabyte is a 1 followed by 21 zeroes, counted in bytes. Even though 40% of the world still can’t access the internet, a single person generates 102 MB of data every minute. Each day, humans produce 2.5 quintillion bytes (a quintillion has 18 zeroes) of data just by chatting, shopping, watching movies, and listening to songs online. Video alone accounts for 82% of all user traffic on the internet. Every minute, users stream 1 million hours of content, and YouTubers upload 500 hours (about 21 days) of video. Here is a visual representation of all the data being created and consumed every minute through various internet platforms and applications.
It can be seen from the above infographic that internet activity has skyrocketed since 2013. Google searches have reached 5.9 million per minute, while 1.7 million pieces of content are shared on Facebook alone. This is bound to raise concerns, as humans can only store so much information. It seems only a matter of time before there is no room left to store newly generated data.
How Does DNA Data Storage Work?
One potential solution is DNA technology because of its ability to store a vast amount of data in a small space. The process involves encoding binary data into DNA through synthesis and decoding it again through sequencing. Deoxyribonucleic acid (DNA) is an organic compound, but it can also be used as a medium for storing digital data.
Unlike mainstream storage media, DNA does not store data directly as binary digits (0s and 1s). Instead, the four nucleotides of DNA, i.e. A (adenine), T (thymine), G (guanine), and C (cytosine), are used to represent the 0s and 1s. An algorithm converts the information-carrying binary code into a DNA code, which is then synthesized as DNA strands. Afterwards, the strands are placed in a container in a cool and controlled environment. The DNA carrying the data can be stored in the form of a solution, droplets, or silicon chips. Whenever the data is needed, the nucleotide sequence can be read and converted back into binary digits.
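The conversion between binary digits and nucleotides can be sketched in a few lines of Python. This is only a minimal illustration: the mapping of two bits to one base (00 to A, 01 to C, 10 to G, 11 to T) and the function names `encode` and `decode` are assumptions made for this example, not the algorithm used by any particular research team. Real encoding schemes also add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical mapping chosen for illustration: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

With this mapping, every byte becomes four bases, so the two-character message "Hi" turns into an eight-letter strand that decodes back to the same bytes.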
What Are the Benefits?
A single gram of DNA can store up to 215 petabytes (PB), or roughly 220,000 terabytes (TB), of data. That is a massive leap over the available technology. A 1 TB HDD weighs around 400 grams, so it would take about 88 million grams (88 tonnes) of HDDs to hold the amount of data stored in 1 gram of DNA. Scientists say that by using DNA technology, all the data in the world could be stored in a device the size of a shoebox. Even with remote storage solutions like the cloud and Network Attached Storage (NAS), data is still kept inside huge data centres. Some of these data centres consume as much energy as a small town and cost billions of dollars to build and maintain. DNA data storage could greatly reduce the problems of space and cost if implemented efficiently.
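The weight comparison above is simple arithmetic, and a short Python sketch makes it easy to check. The constants are the figures quoted in this article, and the function name `hdd_grams_for` is made up for this example.

```python
# Figures quoted in the article; treat them as rough estimates.
DNA_CAPACITY_TB_PER_GRAM = 220_000  # about 215 PB per gram of DNA
HDD_CAPACITY_TB = 1                 # one typical drive
HDD_WEIGHT_GRAMS = 400              # approximate weight of that drive


def hdd_grams_for(dna_grams: float) -> float:
    """Weight of 1 TB HDDs needed to hold what `dna_grams` of DNA can hold."""
    drives_needed = dna_grams * DNA_CAPACITY_TB_PER_GRAM / HDD_CAPACITY_TB
    return drives_needed * HDD_WEIGHT_GRAMS


if __name__ == "__main__":
    grams = hdd_grams_for(1)
    print(f"{grams:,.0f} grams of HDDs")        # 88,000,000 grams of HDDs
    print(f"{grams / 1_000_000:,.0f} tonnes")   # 88 tonnes
```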
Besides compactness and high storage density, data stored in DNA can also have a considerably longer lifespan than data on other storage media. It does not degrade over time like magnetic drives, which typically must be replaced about every 10 years. Some media even become obsolete, like floppies, cassettes, and CDs. Scientists have estimated that data stored in DNA would last more than 700,000 years. Moreover, DNA also protects against data degradation, which affects millions of people every year. Files stored on computers and drives can decay over time, making them unusable and inaccessible. Data can also be lost when the technology needed to read it becomes obsolete. Hardware like HDDs and SSDs is also prone to electrical and technical failure, causing data loss. Internet data like files, text, pictures, videos and audio can also be lost forever if the websites hosting them stop working. This is called link rot: the link to the website breaks, making the data inaccessible. Thanks to the long half-life of DNA, data can be preserved for a near eternity.
Another huge advantage of the DNA storage system is its replicability. Currently, data needs to be transferred to new hardware periodically to keep it intact. This process is time-consuming, as the data has to be copied first and then moved. Data stored in DNA, by contrast, is easy to replicate. Scientists have tested a method that involves inserting the DNA containing the data into a bacterium. As the bacterium reproduces, each new generation of bacteria carries the same information stored in the original DNA.
DNA is already touted as the future of data storage. However, that future is still some way off. There are several challenges experts and scientists have to overcome before this technology becomes mainstream.
History and Limitations
The first demonstration of DNA storage dates back to a Harvard experiment in 1988. Scientists successfully stored an image, organized as a 5 x 7 matrix, in an E. coli DNA sequence. Once decoded, it formed a picture of an ancient Germanic rune representing life and the female-centric Earth. Before that, the idea had only been theorized in the literature, but scientists already knew from natural history how reliably DNA stores information. There is an entire branch of science called archaeogenetics that studies ancient DNA to understand past life. Biological information remains in DNA for thousands of years, and scientists use it to analyze humans, animals, and plants.
DNA is itself a 4-letter code for transmitting information about a living being. Arranging these letters into sequences creates a code that directs an organism’s formation. By making DNA molecules from scratch (synthesis), scientists learned they could write long strings of those letters and then read the sequences back. The process is similar to the storage of binary information in computers. Therefore, it was only a matter of time before they could encode binary files into a molecule.
In 2007, researchers at the University of Arizona created a device that used addressing molecules to encode mismatches in DNA strands and retrieve data. Several other organizations continued the tests and improved on the technology. A breakthrough came in 2016, when a research team at Microsoft and the University of Washington stored and retrieved about 200 megabytes (MB) of data in DNA. The data included images, videos, and audio, which made it a notable success.
Today, research continues toward eventually storing zettabytes of data in tiny amounts of DNA that would act as whole data centres. However, this technology is not yet viable for large-scale use, as it is still expensive and slow. It reportedly costs around 3,500 USD to store 1 MB of data in DNA and another 2,000 USD to retrieve it. Moreover, 200 MB is reportedly the largest amount of data stored in DNA so far with a single synthesis, which took 24 hours. Scientists have recently claimed to have developed a microchip that would improve on existing methods by a factor of 100. This prototype chip is about 1 inch square and contains multiple microwells for synthesizing several DNA strands in parallel. It would allow users to write 100 times more data in a similar time frame.
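To put the reported prices in perspective, here is a rough Python sketch of what a full write-and-read cycle would cost. It assumes that cost scales linearly with the amount of data, which is a simplification made for this example; the function name `round_trip_cost_usd` is likewise hypothetical.

```python
# Reported per-megabyte prices from the article, in US dollars.
WRITE_COST_USD_PER_MB = 3500
READ_COST_USD_PER_MB = 2000


def round_trip_cost_usd(megabytes: int) -> int:
    """Cost to store and later retrieve `megabytes` of data, assuming linear scaling."""
    return megabytes * (WRITE_COST_USD_PER_MB + READ_COST_USD_PER_MB)


if __name__ == "__main__":
    print(round_trip_cost_usd(1))    # 5500
    print(round_trip_cost_usd(200))  # 1100000
```

Under that assumption, storing and retrieving 200 MB would cost over a million dollars, which shows why the technology is not yet practical at scale.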