Understanding RAID (Redundant Array of Independent Disks)

RAID (redundant array of independent disks) is a technique for protecting disk data, and often speeding up access to it, by spreading or duplicating it across several disks.

Suppose you are on an outage conference call with executives.  The business is completely shut down because the disk drives supporting the application have failed.  The SAN storage support engineer comes on the line and starts talking about RAID 4, RAID 5, and so forth.  You need to understand what this means if you are to oversee, or even follow, the data recovery operation.

Many disk arrays are configured as RAID 5 or RAID 6, which basically means fault tolerance: the system can keep on working if one disk (RAID 5) or two disks (RAID 6) fail, with some degradation in speed due to the overhead of delivering fault tolerance.

First of all, RAID describes the manner in which data is laid out across the disks.  All RAID systems have more than one disk.  So RAID is not a type of hardware; it is a configuration of disks within the disk array.

There are some basic terms to learn first:

Striping – this means splitting data that logically belongs together into pieces and writing those pieces to more than one physical disk.  The number of disks you can stripe across depends on how many disk drives are in your disk array.
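As a rough illustration of striping, the sketch below deals fixed-size blocks out to the disks in round-robin order and then reads them back.  The function names (stripe, unstripe), the block size, and the use of Python lists to stand in for disks are assumptions made for this example; a real array does this in a controller or driver.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one block from each disk in turn."""
    blocks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):  # the last row may not reach every disk
                blocks.append(disk[row])
    return b"".join(blocks)


disks = stripe(b"ABCDEFGH", num_disks=2, block_size=2)
print(disks)            # [[b'AB', b'EF'], [b'CD', b'GH']]
print(unstripe(disks))  # b'ABCDEFGH'
```

Because consecutive blocks land on different disks, both disks can be busy at the same time, which is where the speed gain comes from.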

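Parity – is most easily thought of as a check digit.  In the classic example, a parity bit is an extra bit added to a group of data bits, such as an 8th bit added to 7 data bits.  It is set to 0 or 1 depending on whether the number of 1s among the data bits is even or odd (which value means which depends on whether you choose even or odd parity, a completely arbitrary decision).  If any single bit in the group is lost, it can be calculated from the remaining bits and the parity bit.  RAID applies the same idea across disks: the parity is calculated from the corresponding data on each of the other disks, so the contents of one failed disk can be rebuilt.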
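The parity idea can be sketched in a few lines of Python.  This example assumes even parity (the parity bit is chosen so that the total number of 1s is even); the function names are made up for the illustration.

```python
def even_parity_bit(bits: list[int]) -> int:
    """Return the bit that makes the total count of 1s even."""
    return sum(bits) % 2


def recover_missing_bit(known_bits: list[int], parity: int) -> int:
    """Work out one lost data bit from the surviving bits and the parity bit."""
    return (sum(known_bits) + parity) % 2


data = [1, 0, 1, 1, 0, 0, 1]
parity = even_parity_bit(data)            # four 1s, so the parity bit is 0

# Pretend the third bit (index 2) was lost and rebuild it.
survivors = data[:2] + data[3:]
print(recover_missing_bit(survivors, parity))  # 1
```

Note that parity only tells you the value of a missing bit when you already know which bit is missing, which is the situation when a whole disk has failed.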

Mirror – this is exactly like it sounds.  When data is written onto one disk, it is also written to another at the same time, thus giving you two copies.  This makes restoring the data fairly easy to understand, and it also boosts read speed since data can be read from two locations.
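A mirror can be modeled very simply.  The class below is a toy sketch, not how any particular controller works: the name MirroredPair, the dictionaries standing in for disks, and the failed flags are all assumptions for the illustration.

```python
class MirroredPair:
    """Toy model of two disks that always hold the same blocks."""

    def __init__(self) -> None:
        self.disks = [{}, {}]          # each dict maps block number -> bytes
        self.failed = [False, False]   # set an entry to True to simulate a dead disk

    def write(self, block_no: int, data: bytes) -> None:
        # Every write goes to every working disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Any working disk can answer a read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both copies are unavailable")


pair = MirroredPair()
pair.write(0, b"account data")
pair.failed[0] = True      # the first disk dies
print(pair.read(0))        # b'account data', served from the second disk
```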

RAID 0 means striping and nothing else.  Why is this good?  Striping gives you improved performance since you can both read and write faster when you are writing to and reading from two locations at the same time.  Think of a person going to fetch buckets of water.  Carry two buckets to save one trip to the well.  With RAID 0, since there is no redundancy, if any disk fails all the data is lost.  Audio and video streaming are good applications for RAID 0 because the data needs to be read rapidly; it is not transactional data like a banking application, so there is not much writing.  In the case of a failure, you can recover the video or audio from another source (e.g. restore it from a backup or upload it again) and nothing is lost.

A step up from RAID 0 is RAID 1.  This means there is a mirror but no parity or striping.  The boost in performance comes from read operations, since data can be read from either of two disks.  If either drive fails, the disk array still works.  So it is fault tolerant, which is good for databases.

RAID 2, RAID 3, and RAID 4 correspond to bit-, byte-, and block-level striping, respectively.  The redundancy data is stored on dedicated drives: RAID 3 and RAID 4 use a single parity drive, while RAID 2 uses an error-correcting (Hamming) code that can take up several drives.  In the case of a failure, the lost data can be recalculated from the remaining intact blocks and the parity data.  So what happens if the parity data is lost?  No actual data is lost, since the parity can be recalculated from the actual data.  The system can run without it, although with no protection against a further failure, and then rebuild it once the drive is replaced.  Losing parity data is not the same as losing, say, the customer’s checking account data.
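The recovery described above can be sketched with XOR, the operation commonly used to calculate RAID parity.  The helper name xor_blocks and the tiny two-byte blocks are assumptions for the illustration; the point is that XOR-ing the surviving blocks with the parity block gives back the missing one.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one block of the same stripe.
data_blocks = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]

# The parity drive stores the XOR of the data blocks.
parity = xor_blocks(data_blocks)

# The second data disk fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(rebuilt == data_blocks[1])   # True

# If the parity drive fails instead, parity is simply recalculated from the data.
print(xor_blocks(data_blocks) == parity)   # True
```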

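RAID 5 disk arrays keep working when one drive goes bad, but only one.  RAID 5 uses block-level striping like RAID 4, but the parity blocks are distributed across all the drives alongside the data instead of being kept on a dedicated drive, and one parity block per stripe is enough to rebuild one missing drive.  RAID 6 gives you fault tolerance even when two disks go bad, because it stores two independently calculated parity blocks for each stripe.  The array will continue working while you replace the failed drive.  Needless to say, these disk configurations are for mission-critical applications.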
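The price of fault tolerance is disk space.  The sketch below summarizes, for the levels discussed here, how many drive failures an array survives and how much usable space is left.  It assumes identical drives and, for RAID 1, that every drive holds a full copy of the data; the function name raid_summary and its return format are made up for the illustration.

```python
def raid_summary(level: int, num_disks: int, disk_size_gb: float) -> dict:
    """Return usable capacity and tolerated drive failures for a RAID level."""
    if level == 0:      # striping only
        min_disks, data_disks, tolerated = 2, num_disks, 0
    elif level == 1:    # every disk is a full copy
        min_disks, data_disks, tolerated = 2, 1, num_disks - 1
    elif level == 5:    # one disk's worth of parity
        min_disks, data_disks, tolerated = 3, num_disks - 1, 1
    elif level == 6:    # two disks' worth of parity
        min_disks, data_disks, tolerated = 4, num_disks - 2, 2
    else:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if num_disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    return {
        "usable_gb": data_disks * disk_size_gb,
        "tolerated_failures": tolerated,
    }


print(raid_summary(5, num_disks=4, disk_size_gb=1000))
# {'usable_gb': 3000, 'tolerated_failures': 1}
print(raid_summary(6, num_disks=4, disk_size_gb=1000))
# {'usable_gb': 2000, 'tolerated_failures': 2}
```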

Now you should have a basic understanding of what RAID means when the discussion comes up.  But let’s hope it doesn’t come up in the scenario described above.