UDSS DiskArray features - Smart Disk Cloning

Legacy Smart Disk Cloning

S.M.A.R.T. Implementation

S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology. It is a technology that allows disk drives to predict near-term failure. A disk drive equipped with S.M.A.R.T. reports its status to the host if it detects degradation in predetermined drive attributes. Not all failures are predictable: S.M.A.R.T. predictability is limited to the attributes that the drive can monitor. Typically, these include:

Head Flying Height
Data Throughput Performance
Spin-Up Time
Re-Allocated Sector Count
Seek Error Rate
Seek Time Performance
Spin Retry Count
Drive Calibration Retry Count

For SCSI drives, an industry standard specification is used as defined in the ANSI-SCSI Informational Exception Control (IEC) document X3T10/94-190. Normally, SCSI drives with S.M.A.R.T. capability communicate a reliability condition as either good or failing. The specification provides for a sense bit to be flagged if a reliability issue exists. The host may then alert the user.
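As a rough illustration of the good/failing report described above, the sketch below inspects a fixed-format SCSI sense buffer for additional sense code 0x5D (FAILURE PREDICTION THRESHOLD EXCEEDED). The function name smart_status and the "unknown" result are hypothetical, and this is not taken from the Legacy firmware. It assumes fixed-format sense data only.

```python
# Additional sense code (ASC) used by SCSI drives to report a predicted failure.
FAILURE_PREDICTION_ASC = 0x5D


def smart_status(sense: bytes) -> str:
    """Classify a fixed-format sense buffer as 'good', 'failing' or 'unknown'.

    Hypothetical helper for illustration; descriptor-format sense data
    is not handled and is reported as 'unknown'.
    """
    if len(sense) < 14:
        return "unknown"  # too short to contain the ASC byte
    response_code = sense[0] & 0x7F  # low 7 bits of byte 0
    if response_code not in (0x70, 0x71):
        return "unknown"  # not fixed-format sense data
    asc = sense[12]  # ASC is byte 12 in fixed format
    return "failing" if asc == FAILURE_PREDICTION_ASC else "good"


# Example: build an 18-byte fixed-format buffer that flags a predicted failure.
example = bytearray(18)
example[0] = 0x70
example[12] = FAILURE_PREDICTION_ASC
print(smart_status(bytes(example)))
```

A host that sees "failing" would then alert the user, as the specification intends.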

Legacy’s Implementation

Legacy also follows the ANSI-SCSI Informational Exception Control (IEC) document X3T10/94-190. The firmware setup offers four settings for the S.M.A.R.T. function:
1. Disable - S.M.A.R.T. function not activated.
2. Detect - S.M.A.R.T. function enabled. The RAID Controller(s) will send a command to enable the S.M.A.R.T. function on all drives. If a drive detects a problem, the RAID Controller(s) will log an event.
3. Perpetual Clone - S.M.A.R.T. function enabled. The RAID Controller(s) will send a command to enable the S.M.A.R.T. function on all drives. If a drive detects a problem, the RAID Controller(s) will log an event and, if a hot-spare drive is available, clone the faulty drive to it. The faulty drive is not taken off-line, and the clone continues to operate as a spare drive. If the faulty drive stops working, the spare drive takes over immediately. If the faulty drive continues to function and another drive fails instead, the spare drive becomes active and the failed drive's data is rebuilt onto it.
4. Clone + Replace - S.M.A.R.T. function enabled. The RAID Controller(s) will send a command to enable the S.M.A.R.T. function on all drives. If a drive detects a problem, the RAID Controller(s) will log an event, clone the drive to the spare drive, and then take the faulty drive off-line.
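The four settings can be summarised as a small decision function. This is only a sketch of the behaviour described above, not the actual firmware logic: the names SmartMode and on_smart_warning and the action strings are hypothetical. It also assumes that Clone + Replace only logs the event when no hot-spare drive is available, which the text does not state.

```python
from enum import Enum


class SmartMode(Enum):
    DISABLE = 1
    DETECT = 2
    PERPETUAL_CLONE = 3
    CLONE_AND_REPLACE = 4


def on_smart_warning(mode: SmartMode, hot_spare_available: bool) -> list[str]:
    """Return the controller actions taken when a drive reports a problem."""
    if mode is SmartMode.DISABLE:
        return []  # S.M.A.R.T. is not activated, so nothing is done
    actions = ["log_event"]  # every enabled mode records the event
    if mode is SmartMode.DETECT or not hot_spare_available:
        return actions  # assumption: no spare means the event is only logged
    actions.append("clone_to_spare")
    if mode is SmartMode.CLONE_AND_REPLACE:
        actions.append("take_drive_offline")
    return actions


for mode in SmartMode:
    print(mode.name, on_smart_warning(mode, hot_spare_available=True))
```

In Perpetual Clone mode the faulty drive stays on-line, so the action list stops after the clone. Only Clone + Replace adds the off-line step.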

Legacy Disk Cloning Benefits

RAID 5 Subsystems can protect against only a single disk drive failure. Recovery from a failed disk is accomplished with the help of the parity information stored on the remaining disks. During a single drive failure, not only is there a performance loss from having to recalculate data from parity for all read requests to the failed disk, but the entire system is also at risk of complete failure should a second disk fail before the rebuild is completed.

The more disks there are within a RAID Subsystem, the greater the likelihood of a second disk failure. The example below shows why a hot spare disk is commonly used as another level of protection. Rather than waiting for a failed disk to be replaced and then rebuilt, simply having a Hot Spare Disk within the RAID Subsystem can reduce the risk of a second disk failure by a factor of four.

1:10,000 Chance of a 2nd Disk Failure if no Hot Spare Disk is installed, assuming it takes 24 hours to replace and rebuild the failed disk.

1:40,000 Chance of a 2nd Disk Failure if a Hot Spare Disk is installed, assuming it takes 6 hours to rebuild a failed 36GB disk.
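The two figures above are consistent with a simple model in which the chance of a second failure is proportional to the time the array spends unprotected. The sketch below reproduces the arithmetic under that assumption. The function name second_failure_odds is hypothetical, and the linear scaling is an assumption rather than a stated part of the original calculation.

```python
def second_failure_odds(baseline_odds: float, baseline_hours: float,
                        exposure_hours: float) -> float:
    """Scale '1 in N' odds, assuming risk is proportional to exposure time."""
    return baseline_odds * baseline_hours / exposure_hours


# Baseline from the text: 1:10,000 over a 24-hour replace-and-rebuild window.
odds_with_spare = second_failure_odds(10_000, 24, 6)
print(f"1:{odds_with_spare:,.0f}")  # 1:40,000
```

Cutting the exposure window from 24 hours to 6 hours is what yields the fourfold improvement.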