All about RAID rebuild!

A RAID rebuild is the data reconstruction process that occurs when a hard disk drive in an array has to be replaced. It usually happens when a disk fails unexpectedly and the RAID array rebuilds the lost data onto a spare drive while the failed one is replaced. The data is reassembled on the new drive from the surviving drives, using the RAID algorithm and parity data.
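
As a minimal illustration of how parity lets a rebuild recreate a lost drive, here is a sketch of single-parity (RAID 5-style) XOR reconstruction for one stripe. The four-byte blocks and the `xor_blocks` helper are hypothetical. Real arrays work on much larger stripe units and rotate parity across drives, and RAID 6's second parity uses different arithmetic that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b"RAID", b"5 is", b" XOR"]
parity = xor_blocks(data)

# The drive holding data[1] fails: XOR of the survivors recreates its block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'5 is'
```

Rebuilding even one lost block means reading the matching block from every surviving drive, which is why a rebuild generates so much I/O on the rest of the array.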

Note that during the rebuild, some applications or processes may suffer from increased latency.

RAID rebuild times are an important concern for storage administrators, because higher-capacity drives or multiple failed drives can stretch a rebuild to days.

But the problem isn’t just how long the RAID rebuild takes; it’s also the impact of the rebuild process on applications and users.

In most cases, storage performance drops significantly during a RAID rebuild, and applications can grind to a halt or come close to it. Many arrays let you throttle the resources allocated to the rebuild process so that regular storage performance is not hindered.

The trade-off is that the rebuild takes longer, so the array stays in its exposed, degraded state for a longer period.
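
To see how drive capacity and throttling combine, here is a back-of-the-envelope sketch. It assumes the rebuild is limited by one sustained rate (for example, the write speed of the replacement drive) and that throttling simply grants the rebuild a fixed share of that rate. The 150 MB/s figure and the `rebuild_share` parameter are illustrative assumptions, not measurements; real rebuilds competing with live workloads can be considerably slower.

```python
def rebuild_hours(capacity_tb: float, max_rate_mb_s: float, rebuild_share: float) -> float:
    """Idealized hours to rebuild one drive when the rebuild gets
    `rebuild_share` (0 < share <= 1) of `max_rate_mb_s`, the assumed
    sustained rate of the rebuild bottleneck."""
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB, as drive vendors count
    return capacity_mb / (max_rate_mb_s * rebuild_share) / 3600


# Assumed 150 MB/s bottleneck; throttling leaves the rebuild 100%, 50% or 20% of it.
for capacity in (2, 8, 16):
    for share in (1.0, 0.5, 0.2):
        hours = rebuild_hours(capacity, 150, share)
        print(f"{capacity:>2} TB at {share:.0%} of 150 MB/s: {hours:6.1f} h")
```

Under these assumptions, a 16 TB drive rebuilt with 20% of 150 MB/s takes roughly 148 hours, or about six days: halving the rebuild's share doubles the time the array spends exposed.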

So, here are some ways to control RAID rebuild time:

  1. Do not adopt RAID blindly: Understand exactly what the rebuild times are and plan the implementation accordingly. The best way to do so is to load data onto the system, simulate a drive failure, and measure the rebuild time. This gives you a realistic idea of the RAID rebuild time for your data environment.
  2. Virtualized storage: Virtualized storage spreads each volume across a larger number of drives in the volume group. As a result, a single drive failure can be recovered quickly, because the rebuild I/O load is distributed across many other drives; the improvement can be as much as six times. Virtualized systems can also often begin a volume rebuild sooner, because they handle data at a much finer level of granularity than a traditional array. They also impose a much lower performance impact on the overall system during the rebuild, typically less than 1.5% additional load.
  3. Mirroring: Mirroring is a reliable option if you are not willing to work with virtualized systems. Mirroring (RAID 1) requires more capacity than other RAID configurations, but capacity is now very cheap compared with the cost of recreating data. Therefore, instead of running all volumes as RAID 5 or RAID 6, consider putting critical data on a mirrored volume. Performance on the mirrored volume will generally be better, and no parity recalculation is needed to regain protection, since the rebuild is a straight copy from the surviving mirror. If capacity is an issue, this may be a good time to explore data archiving to clear off static (persistent) data, so the storage admin can improve protection of the primary data store and increase the efficiency of data backup and application performance.
  4. Dual-parity RAID 6: If carefully implemented, dual-parity RAID 6 eases the rebuild problem over the short term, because the array can survive a second drive failure during the rebuild. However, as drive capacities reach 2 TB and beyond, the problem returns. Moreover, even a 6 to 8 hour rebuild is not acceptable these days. Another consideration is that a RAID rebuild stresses the remaining drives in the array, which increases the chance of a second failure during the rebuild window. If that happens, be prepared for the risks that follow, such as further drive failures and threats to the integrity of the data you are trying to protect.
  5. Continuous Data Protection: Continuous Data Protection (CDP) keeps an active copy of critical data available on a separate physical array. This provides greater flexibility, as the second copy can be kept on a SATA array, which saves costs while keeping the data in an active state. Data in an active state can be accessed directly without going through a recovery process.
  6. Rely on a solid backup: Backing up to disk not only speeds up recovery but also improves the reliability of the backup. If nothing else works, you can fall back on this last option.
  7. Remember: RAID arrays are increasingly proactive, detecting in advance that a drive is about to fail and giving storage administrators time to act before it does.
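
Items 2 and 4 both hinge on how long the array stays exposed. The sketch below compares an idealized traditional rebuild, where all reconstructed data is written to one replacement drive, with an idealized distributed rebuild, where those writes are spread across the surviving drives. It then estimates the chance that another drive fails during each window. It assumes independent drives with a constant failure rate derived from an annualized failure rate (AFR); the twelve 8 TB drives, 100 MB/s per drive, 2% AFR and the `writers` parameter are all hypothetical. The perfectly linear scaling overstates real gains, and because the model ignores the extra stress a rebuild puts on the remaining drives, the real-world risk would be higher than these figures.

```python
import math

HOURS_PER_YEAR = 8760


def rebuild_window_hours(capacity_tb: float, per_drive_mb_s: float, writers: int = 1) -> float:
    """Idealized rebuild time when `writers` drives share the reconstruction writes.
    writers=1 models a traditional array writing to one replacement drive;
    a larger value models a hypothetical distributed (virtualized) layout."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / (per_drive_mb_s * writers) / 3600


def p_failure_during_rebuild(surviving: int, hours: float, afr: float) -> float:
    """Chance that at least one of `surviving` drives fails within `hours`,
    assuming independent drives with a constant (exponential) failure rate
    derived from the annualized failure rate `afr`."""
    per_hour = -math.log(1 - afr) / HOURS_PER_YEAR
    return 1 - math.exp(-surviving * per_hour * hours)


# Hypothetical group: 12 x 8 TB drives, 100 MB/s per drive, 2% AFR.
DRIVES, CAPACITY_TB, RATE_MB_S, AFR = 12, 8, 100, 0.02
for label, writers in (("traditional", 1), ("distributed", DRIVES - 1)):
    hours = rebuild_window_hours(CAPACITY_TB, RATE_MB_S, writers)
    risk = p_failure_during_rebuild(DRIVES - 1, hours, AFR)
    print(f"{label:<12} {hours:6.1f} h window, P(another failure) = {risk:.4%}")
```

For small probabilities the risk grows roughly linearly with the length of the window, so shortening the rebuild cuts the exposure by about the same factor. With RAID 6, a second failure inside that window is survivable; with RAID 5, it loses the array.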

Please feel free to share your experiences, knowledge, and understanding of RAID and rebuild times in the comments section below.
