Big Data and New Online Backup Challenges

What is the fastest way to back up Big Data? Try the following thought experiment. Load your Big Data onto the requisite number of terabyte drives, put the drives in a van and drive the van to your online backup provider. At the same time, start transferring all your terabytes online. It’s quite possible that you’ll complete your backup faster by road than over the Net. That’s how things are. Storage capacity grows by a factor of 10 while network transmission speeds increase by only a factor of 2. For organizations that want to or must store Big Data, does that mean that online backup is out of reach?
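To put rough numbers on the thought experiment, here is a minimal sketch. Every figure in it (100 TB of data, a 1 Gbit/s link, a 400 km drive at 80 km/h, 8 hours of handling for copying and loading drives) is an illustrative assumption, not a measurement, and the function names are made up for this example.

```python
def network_hours(terabytes: float, megabits_per_second: float) -> float:
    """Hours needed to send the data over a link running at full, sustained speed."""
    bits = terabytes * 1e12 * 8                    # decimal terabytes -> bits
    seconds = bits / (megabits_per_second * 1e6)   # link speed in bits per second
    return seconds / 3600


def van_hours(distance_km: float, speed_kmh: float, handling_hours: float) -> float:
    """Hours for the road trip plus time spent copying and loading the drives."""
    return distance_km / speed_kmh + handling_hours


# Assumed figures, chosen only to illustrate the comparison.
online = network_hours(terabytes=100, megabits_per_second=1000)
road = van_hours(distance_km=400, speed_kmh=80, handling_hours=8)

print(f"Online transfer: {online:.1f} hours")
print(f"Van delivery:    {road:.1f} hours")
```

Under these assumptions the online transfer takes a little over nine days, while the van arrives in well under one. Change the inputs to match your own data volume and link speed and the balance may shift.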

Make It, Store It, Crunch It

Take the Big Data that is generated by larger companies today. The mix includes large numbers of files of different sizes, databases, data warehouses and analytical reports. Enterprises tackle Big Data in the hope that they will gain insights into markets, situations or materials that would otherwise remain hidden. They use technology that exists today to crunch the huge amounts of data involved and make (some) sense of it all. Storing the data in the cloud as original versions, backups or both is attractive because storage service prices continue to decrease and companies can hand off data protection and backup activities to the cloud provider. A little lateral thinking might open up some more possibilities.

How Much Do You Really Need to Back Up?

If there were a way of reducing the amount or frequency of Big Data backups, an online solution might be more accessible. Some Big Data, such as mandatory video surveillance data, does not have to be backed up in full every day: only the incremental data must be backed up and added to existing backups. You need to know the information is secure, but apart from backup tests and demonstrating regulatory compliance, you may never touch it again. For machine-generated data such as report data from a database, it may be more efficient to back up just the original data and the associated application. With these two items, report data can be reproduced at will, rather than stored in N different versions.
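The incremental idea can be sketched in a few lines. This is a simplified illustration, assuming that a file's modification time is a good enough signal that it is new or changed; real backup tools usually rely on checksums or change journals as well. The function name and the timestamp parameter are hypothetical.

```python
from pathlib import Path


def files_changed_since(root: Path, last_backup_ts: float) -> list[Path]:
    """Return files under `root` modified after the previous backup.

    `last_backup_ts` is a Unix timestamp recorded when the last backup ran.
    Only these files need to be sent and added to the existing backup.
    """
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime > last_backup_ts
    )
```

A nightly job could call this with the timestamp of the previous run and upload only the returned files, so the volume sent online tracks the day's new data rather than the full archive.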

Where Does the Big Data Originate?

Sometimes it all comes from one place, like the Large Hadron Collider at CERN near Geneva. In that case, you’ll probably still be loading your van with terabyte drives (or tapes) for the moment. On the other hand, if the data originate in many distributed devices, then each device might be programmed to send its data to a designated online backup node. Companies with many different sales outlets and the Internet of Things connecting cars, houses, fridges and more might work this way. Collectively, it all adds up to Big Data, but it is transmitted to backup locations in a far more distributed way.
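One way a device could be given a designated backup node is to derive the node from the device's own identifier. The sketch below assumes a fixed list of nodes and uses a hash of the device ID to pick one. The node names and device IDs are invented for illustration, and this is only one possible assignment scheme, not a description of any particular provider's setup.

```python
import hashlib
from typing import Sequence

# Hypothetical backup endpoints; replace with real ones in practice.
BACKUP_NODES = (
    "backup-node-a.example",
    "backup-node-b.example",
    "backup-node-c.example",
)


def designated_node(device_id: str, nodes: Sequence[str] = BACKUP_NODES) -> str:
    """Map a device ID to one backup node, always the same one for that ID."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Each device computes its own target, so no central dispatcher is needed.
for device in ("store-017-till-2", "fridge-8841", "car-ZX-203"):
    print(device, "->", designated_node(device))
```

Because the mapping is deterministic, every device can work out its target independently, and the load spreads across the nodes roughly evenly as the number of devices grows. Note that changing the node list reshuffles most assignments, which a production design would need to handle.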

How Do You Get Your Big Data Online Backup Back (And Do You Have To)?

If you can get your Big Data into the cloud, either by reducing total volume or sending it in from distributed sources, can you get it back again – and in fact, would you want to? Online storage providers like Amazon and Google already run Big Data crunching applications or provide facilities for implementing them. If storage prices continue to come down and providers do a good job keeping your data safe, why not simply process in situ? You won’t have the luxury of being able to walk into your systems room and touch all your Big Data systems, but be honest – is that really something that you would miss?