One big problem we face is that the internet is huge. It’s becoming so mind-bogglingly massive that the data centers storing parts of it need their own power stations. Holding all of it on hard drives would probably take a couple of zettabytes’ worth of storage, and that’s already assuming the data is compressed. We need a different storage medium to keep all this information on, and it turns out that DNA is the best solution that nature’s given us. Efforts to store digital information in DNA began about a decade ago, and recently researchers crossed a milestone in storing data encoded in DNA and reading it back.
And we did it first with cat pictures. That’s so meta!
Researchers from the University of Washington have been working out how to convert digital files into strings of DNA. It’s possible to write a binary string directly in DNA, with specific nucleotides standing in for the 0s and 1s that we’re accustomed to, but this isn’t a very efficient way of doing it. Instead, the team first compressed the binary data with Huffman coding, a lossless compression method that gives shorter codes to the values that turn up most often.
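To see why the direct binary approach is wasteful, here is a minimal Python sketch of it. The mapping of 0 to A and 1 to C is purely an assumption for illustration, as are the function names; the article doesn’t say which nucleotides a binary scheme would use. Every byte costs eight nucleotides, however predictable the data is.

```python
# Hypothetical one-bit-per-nucleotide mapping, assumed for illustration only.
BIT_TO_BASE = {"0": "A", "1": "C"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def bytes_to_dna_naive(data: bytes) -> str:
    """Write each bit of the input as one nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def dna_to_bytes_naive(strand: str) -> bytes:
    """Read the nucleotides back as bits and regroup them into bytes."""
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna_naive(b"cat")
print(strand, len(strand))  # three bytes become 24 nucleotides
```

Half of the available nucleotides (G and T) are never used here, which is the slack a smarter encoding can take up.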
With the data compressed, writing it to DNA is a lot easier. The encoding can now draw on all four nucleotides (A, C, G and T) instead of only two, and the resulting strand is a lot shorter. The DNA is synthesised in a lab, and the team inserts markers into the strand to indicate where a file begins and where it ends (akin to headers in current storage systems). All a computer needs to do is read back the DNA sequence, convert the order of the nucleotides into base-3 Huffman code, and then decode that into the final binary output.
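The round trip described above can be sketched in Python. This is a toy under stated assumptions, not the team’s actual pipeline: the Huffman table is built from the file itself, each base-3 digit (trit) picks one of the three nucleotides that differ from the previous one, and the strand is assumed to start after an imaginary "A". All function names are hypothetical, and the begin and end markers are left out.

```python
import heapq
from collections import Counter
from itertools import count

BASES = "ACGT"


def ternary_huffman(data: bytes) -> dict:
    """Build a base-3 Huffman table: byte value -> string of trits."""
    ids = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(w, next(ids), {sym: ""}) for sym, w in Counter(data).items()]
    # A ternary tree needs an odd number of leaves (at least three),
    # so pad with zero-weight dummy leaves keyed by None.
    while len(heap) < 3 or len(heap) % 2 == 0:
        heap.append((0, next(ids), {None: ""}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0
        for trit in "012":  # merge the three lightest subtrees
            weight, _, codes = heapq.heappop(heap)
            total += weight
            merged.update({s: trit + c for s, c in codes.items()})
        heapq.heappush(heap, (total, next(ids), merged))
    table = heap[0][2]
    table.pop(None, None)  # drop the dummy leaf
    return table


def trits_to_dna(trits: str, start: str = "A") -> str:
    """Each trit selects one of the three bases that differ from the last."""
    prev, out = start, []
    for t in trits:
        prev = BASES[(BASES.index(prev) + 1 + int(t)) % 4]
        out.append(prev)
    return "".join(out)


def dna_to_trits(strand: str, start: str = "A") -> str:
    """Invert trits_to_dna by looking at each base relative to the last."""
    prev, out = start, []
    for base in strand:
        out.append(str((BASES.index(base) - BASES.index(prev) - 1) % 4))
        prev = base
    return "".join(out)


def encode(data: bytes):
    table = ternary_huffman(data)
    return trits_to_dna("".join(table[b] for b in data)), table


def decode(strand: str, table: dict) -> bytes:
    lookup = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for t in dna_to_trits(strand):
        current += t
        if current in lookup:  # prefix-free, so the first match is right
            out.append(lookup[current])
            current = ""
    return bytes(out)


data = b"cat pictures, cat pictures"
strand, table = encode(data)
assert decode(strand, table) == data
print(strand, len(strand))
```

A side effect of this mapping is that the same nucleotide never appears twice in a row. The decoder also needs the same Huffman table as the encoder, so in practice the table would have to be fixed in advance or stored alongside the data.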
The resulting file is, in theory, a perfect binary copy of the original with no data loss. The university team, which also included researchers from Microsoft Research, successfully encoded a picture of a cat in DNA, and was able to read it back perfectly.
The same methods would apply to binary data of any kind, including video and audio. One problem with DNA storage for now is that writing data to synthesised DNA is a slow process, so it would likely only be used for long-term archival purposes. Another is that DNA synthesised in a laboratory lacks the error-correction mechanisms inherent in the DNA of humans, and we don’t yet have a method of replicating or automating that process, certainly not at the speeds our cells are capable of.
Perhaps, before the end of this decade, we’ll have found breakthroughs that allow us to store masses of data within DNA. We’ll then be able to keep records for hundreds, even thousands, of years into the future, and we’re probably not going to find a storage medium better than DNA until we become a Type III galactic civilisation.