Friday, January 30, 2015
Storing and Backing up Large Amounts of Data for Photos and Videos
Recently a professional photographer and graphic designer friend of mine asked me about storage and backups of her projects. Like most things there are a range of options from inexpensive to very expensive. I try to cover them here.
The least expensive option
The first step most people take is to simply purchase an external hard drive (preferably USB 3.0) and plug it into their computer. Projects they are currently working on usually stay on their internal storage, and the external drive is used both to back up current work (using Time Machine on Mac OS X, File History on Windows 8, or Backup and Restore on versions of Windows prior to 8) and to archive projects they are no longer working on. This means you have two copies of your important files: one on your internal hard drive and a backup on your external. If one drive were to fail, you would still have your files on the other drive.
All hard drives fail at some point, it’s just a matter of when
Now most hard drives will last a few years as a minimum (although I’ve had several fail in the first year while still under warranty) and I’ve had some last over ten years under nearly constant use. However, they will all fail someday. So in the previous paragraph I talked about using a single external drive to archive your old projects. At some point you will probably need the space on your internal hard drive for new projects and after archiving your old ones to the external drive you will delete them from your internal hard drive. But that means your project is now stored on only one hard drive which at some point will fail.
Now you might be thinking, “What about those backups I made?” Well it’s true you can use the backups to restore files you’ve deleted from your internal hard drive later. But there are a couple of caveats to this. First, depending on how old your backups are (and how big your files are) there is a good chance your backup software has had to throw away older backups. My computer warns me when this is happening and I usually have enough space to go back a year before it starts deleting old backups. This means if you wanted to restore something from over a year ago you no longer have it, so that can be a problem.
Now, you might be thinking, “Ok, so what if I have a huge external drive (say 3TB) and a small internal hard drive (say 256GB) so I will never have old backups get deleted.” Ok, fair enough although in our example we are talking about large photography projects so the chances are good you will fill up multiple 3TB drives as the years go on. The problem goes back to “All hard drives fail at some point.” You have two copies of your files (the ones from the automated backups and the ones you copied to archive them) but they are on the same hard drive just in different places on the file system of that hard drive. You only have your files on a single hard drive, so when it fails, you lose both of your copies.
So if you are going with the most inexpensive route by simply buying an external USB hard drive, make sure when you start archiving your projects to it off of your internal drive, you buy a second one and also copy your archived files there. Basically at a minimum make sure any file you may want to have at a later date is stored on two different hard drives at all times. Either your internal one and your external one, or two different external ones. That’s your minimum cost solution.
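As a rough illustration of the "two different hard drives" rule, here is a minimal Python sketch that copies an archive folder onto a second drive and then compares every copy byte for byte against the original. The function name `mirror_archive` and the example paths are made up for this sketch; any backup or sync tool that verifies its copies does the same job.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_archive(source: Path, destination: Path) -> list[Path]:
    """Copy every file under source into destination and verify each copy.

    Returns the relative paths of any files whose copy does not match
    the original. An empty list means every file was copied intact.
    """
    failed = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time alone.
        if not filecmp.cmp(src_file, dst_file, shallow=False):
            failed.append(relative)
    return failed
```

With two external drives mounted at hypothetical locations, a call such as `mirror_archive(Path("/Volumes/ArchiveA/Projects"), Path("/Volumes/ArchiveB/Projects"))` would leave the same projects on both drives. Only once it returns an empty list is it reasonable to delete the originals from the internal drive.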
External hard drives of this type are usually around $120 to $200 depending on the size and if they support USB 3.0. Of course, USB 3.0 will become more of a common feature as this article gets older. Interestingly external hard drives haven’t really fallen in price over the years but the capacity you can get for the same price has increased.
Note about hard drive shelf life
Hard drives (well, non-SSD ones) use magnetism on a spinning platter to store your data. Bear with me, I won’t get more technical than that. Unfortunately, magnets lose their magnetism over time. There are no hard and fast rules, but some people recommend checking your files or even copying them to a new drive every 5 years. What is a bit more concerning is that some people who work on hard drive firmware have hinted that as the drive is used, it rewrites/refreshes the data stored on the magnetic platter in the background. That implies a drive sitting on a shelf, not powered on, does not get the benefit of having its data refreshed automatically. I’ve actually contacted both Seagate and Western Digital to ask whether this is true, what they feel the shelf life of their drives is, and whether simply powering them up for a while will activate this automated refresh, but neither company has responded. If someone from a hard drive manufacturer is reading this blog, please feel free to let me know.
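One way to "check your files" on a drive that has been sitting on a shelf is to record a checksum for every file when you archive it and compare again later. The sketch below assumes SHA-256 checksums, and the names `build_manifest` and `find_damaged` are my own. It says nothing about whether powering a drive on refreshes its data; it only tells you whether the files still read back the same.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or no longer match the manifest."""
    damaged = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or file_digest(target) != expected:
            damaged.append(name)
    return damaged
```

The manifest is a plain dictionary, so it can be saved with Python's `json` module and kept somewhere other than the drive it describes. Anything `find_damaged` reports a few years later should be restored from your second copy.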
Network Attached Storage or NAS
This is the next step up and is what businesses and professionals would use. Network Attached Storage (NAS) is actually a small computer/server but designed specifically to host file storage and hold two or more hard drives and make them available on the network. Note there are some NAS products that only house a single drive but that means they cannot even enable RAID protection (explained later) so if you are going to spend the money on a NAS make sure it supports at least two drives and RAID 1.
An example of a NAS device that holds two hard drives.
NAS drive arrays start around $300 and can go up to several thousand depending on how many drives they hold. Also, NAS devices are often priced without hard drives, so if you buy just the NAS make sure you also plan to buy at least two hard drives. If you purchase a NAS with four bays, for example, you don’t have to fill all the bays. A minimum of two will allow you to turn on RAID 1 and you can use the other bays to expand later.
Remember that these connect through the network, so you’ll need to think about where you have a wired network connection to plug them in. If you are connecting to a single computer, though, they usually offer USB and eSATA connection options.
Example of a good NAS and price
In my opinion, for quality versus price, the Synology DS415+ (with no drives) paired with Western Digital Red hard drives (Red is their model line designed for NAS use) hits a sweet spot. At $600 for the NAS, which holds up to four drives, plus two 3TB Western Digital Reds at about $120 each, you are talking $840 total with two more bays to expand into later.
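The arithmetic is simple, but writing it out makes it easy to rerun with current prices. This small Python sketch uses the figures quoted above; the function name is made up, and the 3TB usable figure assumes the two drives are mirrored in RAID 1.

```python
def nas_setup_cost(nas_price: float, drive_price: float, drive_count: int) -> float:
    """Total price of a driveless NAS plus the drives bought for it."""
    return nas_price + drive_price * drive_count


# Figures quoted in this article: a $600 four-bay NAS and two $120 3TB drives.
total = nas_setup_cost(nas_price=600, drive_price=120, drive_count=2)

# Two 3TB drives mirrored in RAID 1 give 3TB of usable space.
usable_tb = 3
cost_per_usable_tb = total / usable_tb

print(f"Total: ${total:.0f}, or ${cost_per_usable_tb:.0f} per usable TB")
```

That works out to $840 in total and $280 per usable terabyte, which is the number to compare against the price of a pair of plain external drives.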
When is a NAS more than a NAS?
This isn’t a consideration for me, as I have a dedicated server in my home that handles things like file sharing, VPN, DHCP, and other server-related tasks. It’s worth mentioning, though, that because a NAS is really just a small computer with a lot of hard drive space, some vendors include additional functionality. Synology, for example, offers app downloads that enable all kinds of extra features on their NAS. It can be a firewall and router, a VPN server, a media server, a web server, etc. If you don’t have a server and this type of capability appeals to you, check out the apps/features that the NAS vendors offer. To review Synology's app store you can go here: https://www.synology.com/en-us/dsm/app_packages Drobo, another NAS manufacturer, offers their app selection here: http://www.drobo.com/drobo-solutions/drobo-apps/
What is this RAID of which you speak?
Ok I could write a whole article about Redundant Array of Inexpensive Disks (RAID) but that’s been done: http://en.wikipedia.org/wiki/RAID
The basic idea is that RAID is a way of writing data across multiple hard drives so that if one hard drive dies, duplicate data on the other drive(s) makes sure no data is lost. Of course you want to replace the failed hard drive ASAP so the array can recopy the data before another hard drive fails.
There are several RAID configurations. Some are for redundancy (the point of this article) and some are for increasing the speed of reading and saving data (not really our goal here). The simplest RAID that offers any data protection/redundancy requires two hard drives. It’s called RAID 1, and basically one drive mirrors the other. There is a downside though. If you have two 3TB drives, you actually only have 3TB of available space. Instead of adding the space to give you 6TB total, it hides half of the space and in the background duplicates all the data across both drives.
As you add more drives you have more RAID configuration options, but as a rough rule you should assume you will lose up to half of your storage capacity to RAID. RAID 5 uses parity rather than a full duplicate, which costs one drive’s worth of space: 1/3 (instead of 1/2) of the total in a three-drive array. It requires at least three drives, so with three 3TB drives you would have a total of 6TB of space available. However, the array has to be rebuilt whenever a failed drive is replaced, and if an error occurs during that rebuild it can result in data loss. RAID 5 is increasingly recommended against because as hard drive capacity increases, rebuilds take longer and the chance of an error during a rebuild grows.
RAID 6 and RAID 10 are the most recommended/reliable options. Both require a minimum of four drives, and with four drives they are back to using half of your storage like RAID 1 does, but they offer considerably more redundancy. RAID 6 can lose any two of four drives without losing any of your files. RAID 10 can also survive two failures, as long as they are not both drives of the same mirrored pair.
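To make the capacity trade-offs concrete, here is a small Python sketch of the usual usable-space formulas for these RAID levels. It assumes all drives are the same size and ignores file system overhead. The function name is made up, and vendor-specific schemes (some NAS makers offer their own hybrid RAID modes) may behave differently.

```python
def usable_capacity_tb(level: str, drive_count: int, drive_size_tb: float) -> float:
    """Usable space for an array of identical drives at a standard RAID level."""
    minimum_drives = {"RAID 1": 2, "RAID 5": 3, "RAID 6": 4, "RAID 10": 4}
    if level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < minimum_drives[level]:
        raise ValueError(f"{level} needs at least {minimum_drives[level]} drives")

    if level == "RAID 1":
        # Every drive holds a full copy, so you get one drive's worth.
        return drive_size_tb
    if level == "RAID 5":
        # One drive's worth of space goes to parity.
        return (drive_count - 1) * drive_size_tb
    if level == "RAID 6":
        # Two drives' worth of space goes to parity.
        return (drive_count - 2) * drive_size_tb
    # RAID 10: drives are mirrored in pairs, so half the space is usable.
    if drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drive_count / 2 * drive_size_tb


for level, count in [("RAID 1", 2), ("RAID 5", 3), ("RAID 6", 4), ("RAID 10", 4)]:
    print(level, count, "x 3TB ->", usable_capacity_tb(level, count, 3), "TB usable")
```

With 3TB drives this gives 3TB for a two-drive RAID 1 and 6TB for each of the other three examples, matching the figures in the text.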
RAID is not the same as a backup
Don’t think that buying a single RAID array covers all your bases. RAID arrays protect you from drive failure. They do not protect you from user error.
Have you ever made a change to a document and meant to click Save As… to create a new copy of that document but instead clicked Save, which wrote over the previously saved version? Or mistakenly thrown away a folder of old files thinking it was a different folder, only to find later that you no longer had some important document? RAID won’t protect you from these kinds of mistakes. Backups can restore overwritten and deleted files; RAID cannot. So simply getting a RAID isn’t the full answer to the problem.
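The difference is that a backup keeps older versions around, while a mirror faithfully copies your mistake to every drive. As a minimal sketch of the idea (not how Time Machine or File History work internally), the hypothetical `keep_version` function below copies a file into a backup folder under a timestamped name, so an earlier version survives a later overwrite.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def keep_version(file_path: Path, backup_dir: Path,
                 when: Optional[datetime] = None) -> Path:
    """Copy file_path into backup_dir under a timestamped name.

    Each call creates a new file instead of replacing the previous backup,
    so older versions stay available. Returns the path of the new copy.
    """
    stamp = (when or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{file_path.stem}.{stamp}{file_path.suffix}"
    shutil.copy2(file_path, target)
    return target
```

If you back up a document in the morning, accidentally overwrite it in the afternoon, and back it up again, the morning copy is still sitting in the backup folder. A RAID 1 mirror in the same situation would hold two identical copies of the overwritten file.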
What about the cloud?
With increases in internet bandwidth, the idea of saving your files outside of your computer or your own hard drives is becoming an option. Initially too expensive to consider, it is dropping in price rapidly, and some cloud services offer native apps for your computer that make cloud storage as easy as having another drive connected. Dropbox is one of these. However, they offer 1TB at $10 per month (or $120 per year) or 5TB at $750 per year. Those prices are still high enough that I’m not ready to consider using those services as a full-time storage or backup option. Also, while we have relatively fast internet in most developed areas, uploading and retrieving large files will still be much slower than working within your own computer or network. I also have a monthly 250GB cap on my internet data usage, so backing up a 3TB drive to the cloud isn’t really an option for me through my current internet provider. There is also the issue of security. Once you put your files on someone else’s server you don’t have control over who might see them. This won’t be the last cloud hack we hear of.
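To put numbers on the bandwidth problem, the Python sketch below estimates how long a first full upload would take. The 3TB drive and the 250GB monthly cap come from this article; the 10 Mbps upload speed is purely an assumption for illustration, and both function names are made up. It uses decimal units (1TB = 1000GB) and treats the whole cap as available for the backup, which is a best case.

```python
import math


def months_under_cap(data_gb: float, monthly_cap_gb: float) -> int:
    """Whole months needed to upload data_gb without exceeding the cap."""
    return math.ceil(data_gb / monthly_cap_gb)


def upload_days(data_gb: float, upload_mbps: float) -> float:
    """Days of continuous uploading at a steady upload speed."""
    megabits = data_gb * 8000  # 1 GB = 8000 megabits in decimal units
    seconds = megabits / upload_mbps
    return seconds / 86400


drive_gb = 3000  # a full 3TB drive
print(months_under_cap(drive_gb, monthly_cap_gb=250), "months under a 250GB cap")
print(round(upload_days(drive_gb, upload_mbps=10), 1), "days at an assumed 10 Mbps")
```

Under the cap, a full 3TB drive takes 12 months to upload even if the connection is used for nothing else. Without a cap it would still take roughly four weeks of nonstop uploading at the assumed speed.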
An external USB 3.0 drive is a great home/low cost solution. If you want to protect your files and they are not stored anywhere else (like on your computer) you should get two drives and use one to back the other one up.
NAS is a more professional and expensive solution. If you are editing and saving video or using it for any kind of professional work, you probably want to go with a NAS solution that handles multiple hard drives in a RAID setup.
Cloud is always an option but can be slow and get expensive depending on how much data you need to store. You also might not be able to be 100% sure no one else can ever get to your files.
RAID is not a backup. Even if you have an expensive NAS with lots of drives in a RAID configuration, you aren’t protected against accidents like deleting the wrong files. Even with a NAS you should have a backup solution, which means either the NAS is backing up files you have on your computer or, if the NAS is the only place the files exist, something else is backing up the NAS. Even just cheap USB 3.0 drives can be used to back up the files on the NAS. If cost is no object, two NAS devices would be the most failsafe solution: one as the primary and one that automatically backs up both your computer and the primary NAS. Rule of thumb: Make sure you always have two copies of every important file stored on different physical hard drives.