This post deviates a little from the purpose of this blog to talk about backing up your data. RAID (Redundant Array of Inexpensive Disks) is a set of related storage strategies that can improve performance, redundancy, or both, depending on how you configure the array. These configurations are becoming more and more common in consumer PCs, and they lead people to think they don’t need to back up their critical data.
There is a misconception out there that a RAID 1 or RAID 5 configuration serves as a proper backup strategy. The thought is: since all my data is written to multiple drives, I’m safe and don’t have to worry about backups. This misconception is not only wrong, it can be downright painful.
First, some definitions:
In RAID 0, data is striped across multiple disks to improve performance. Reading and writing data on different disks simultaneously improves performance at the expense of reliability. There is a major danger in running a RAID 0 array: if any of the disks in the array fails, the entire array fails and you’ve lost your data. Suggestion: do not run RAID 0 unless you have a specific reason to do so (e.g. video editing), and do not leave critical data on a RAID 0 array once you’re done working with it.
A RAID 1 array configuration consists of two or more disks. Each write is mirrored to every disk in the array, so essentially you have n copies of each file, one on each disk. The downside to this configuration is that every write has to go to all of the disks, so writes are no faster than on a single disk and can be slightly slower. RAID 1 also costs more, because you have to buy n disks to get the usable capacity of one.
A RAID 5 array configuration consists of n disks, with a minimum of three. Data is striped across the disks, as in RAID 0. To avoid the reliability hit, the array also writes parity information for each stripe, rotating the parity block among the disks so that the array as a whole holds n-1 disks’ worth of data and one disk’s worth of parity. Parity lets the controller reconstruct the contents of a failed disk from what is on the surviving disks. This sounds great: much of the redundancy of RAID 1 with much of the speed of RAID 0 (well, not as fast as RAID 0, as the controller has to compute and write the parity information) and higher capacities to boot. The downsides? It costs more money to create a RAID 5 array (you need at least three disks), reconstructing the array after a drive failure can take a long time, and if more than one disk fails the entire array fails.
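To make the parity idea concrete, here is a minimal Python sketch. RAID 5 parity is a bytewise XOR of the data blocks in a stripe, which is why any one missing block can be recomputed from the others. The block contents and the helper name xor_blocks are made up for illustration; a real controller works on much larger blocks and rotates the parity block between disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks from one stripe, one per data disk.
data_blocks = [b"RAID", b"is n", b"ot a"]

# The parity block that would be stored on the remaining disk.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'is n'
```

Lose two blocks from the same stripe and there is no longer enough information to rebuild either one, which is why a second disk failure takes out the whole array.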
So if I have mirrored data, what’s the problem?
Each hard drive has a mean time between failures (MTBF) rating established by the manufacturer. The MTBF is an expectation of how long the drive will run without failing. Eventually every hard drive will fail, some catastrophically, taking some or all of your data with it. The MTBF gives you a sense of the drive’s reliability.
RAID subsystems provide a level of MTBF mitigation. (RAID 0 is the exception: it reduces the effective MTBF, meaning you’re more likely to lose data to a drive failure, which is not a good thing.) A RAID 1 or 5 system simply provides redundancy in case one of the hard drives fails. This is great if a drive fails: you can replace it, rebuild the volume, and you’re back in business.
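A rough back-of-the-envelope sketch shows why the RAID levels behave so differently here. It assumes each disk fails independently with the same probability p over some fixed period, and it ignores replacing a failed drive, rebuild time, and correlated failures, so treat the numbers as illustrative only. The function names and the 5% figure are hypothetical, not manufacturer data.

```python
def raid0_loss(p, n):
    """RAID 0 loses data if ANY of the n disks fails."""
    return 1 - (1 - p) ** n


def raid1_loss(p, n):
    """RAID 1 loses data only if ALL n mirrored disks fail."""
    return p ** n


def raid5_loss(p, n):
    """RAID 5 survives zero or one failure; two or more lose the array."""
    none_fail = (1 - p) ** n
    exactly_one_fails = n * p * (1 - p) ** (n - 1)
    return 1 - none_fail - exactly_one_fails


# Hypothetical per-disk failure probability over the period of interest.
p = 0.05

print(f"single disk:      {p:.4f}")
print(f"RAID 0 (4 disks): {raid0_loss(p, 4):.4f}")
print(f"RAID 1 (2 disks): {raid1_loss(p, 2):.4f}")
print(f"RAID 5 (4 disks): {raid5_loss(p, 4):.4f}")
```

Under these assumptions, RAID 0 comes out worse than a single disk and RAID 1 and RAID 5 come out better, but all of this is only about drive failure.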
What happens if you’re working on a critical project and you accidentally delete your entire project directory? What if your computer is infected with a virus that deletes all of your files? How will your RAID help you if your computer is stolen?
It won’t. A RAID subsystem will not save you in these situations — your files are gone. If you don’t have a backup of these critical files, you’re going to have to start from scratch. Ouch.
The takeaway
Don’t forget to back up your data.
You’ll want at least one copy off site just in case something really bad happens (e.g. your computer is stolen). Get in the habit of backing up your data on a regular schedule. Your schedule will depend on how often you create new information. If you’re creating critical information daily, then you should be backing up this information daily. A good incremental backup tool helps: you perform one full backup, then the software tracks which files have changed since the last backup and only has to back up those files.
Back up your data even if you’re running a RAID system. To do otherwise leaves you vulnerable to data loss.
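For a feel of what an incremental backup tool does under the hood, here is a minimal Python sketch. It is not how any particular product works. It assumes a file has changed if its modification time is newer than the time of the last backup, and the function name incremental_backup is made up for this example. It does not track deleted or renamed files, which real tools have to handle.

```python
import shutil
from pathlib import Path


def incremental_backup(source, dest, last_backup_time):
    """Copy files under source that were modified after last_backup_time.

    last_backup_time is a Unix timestamp; pass 0 to get a full backup.
    Returns the list of files written under dest.
    """
    source, dest = Path(source), Path(dest)
    copied = []
    for path in sorted(source.rglob("*")):
        # Only regular files whose modification time is newer than the
        # previous backup need to be copied again.
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            target = dest / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

You would call it once with a last_backup_time of 0 for the full backup, record the time that run started, and pass that timestamp on the next run so only newer files are copied. The destination should be a different physical device, ideally one you can take off site.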