

Sun Microsystems, Inc.
www.sun.com

Disk Scrubbing
Sun StorEdge Arrays

Guruprasad Sakaleshpura

 Part No. 819-0051-10

Version: 2.0

 Date: 29 June 04 

Thanks to:

Tejinder Singh

Craig Rolandelli 


Copyright 2004 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.

Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.

This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.

Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.

Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, StorEdge, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries.

All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.

U.S. Government Rights-Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.



Executive Summary

The Sun StorEdge Disk Scrubber checks the integrity of an array's media. For the storage pools configured on these arrays, it also checks the RAID parity data. The disk scrubber continuously scans disk drives for media errors. When an error is detected, it reports the problem and attempts to fix it.

The following Sun StorEdge arrays include the disk scrubbing feature. The array’s firmware version determines how the disk scrubber is run.

• T3+ and 3910/3960/6120/6320/6910/6920/6960 arrays with firmware 3.1.x: the disk scrubber runs continuously while the system is running.

• 6120/6320 arrays with firmware 3.0.x: the disk scrubber can be run manually or continuously via StorADE.

• T3+ and 3910/3960/6910/6960 arrays with firmware 2.1.3, 2.1.4, 2.1.5 or 2.1.6: the disk scrubber can be run manually or continuously via StorADE.

• T3 array with firmware 1.18.x: the disk scrubber can be run manually or continuously via StorADE.

Sun’s disk scrubber behaves differently on different versions of array firmware. Appendix A details those differences.

This paper provides an overview of disk scrubbing in general as well as a description of Sun’s implementation of disk scrubbing. In addition, this paper describes various types of disk drive failures and shows how the disk scrubber can reduce the likelihood of such failures. Finally, for those readers who need a quick refresher, Appendix B provides a high-level overview of RAID.

Disk Scrubbing Overview

Disk scrubbing is performed on virtual volumes or “storage pools.” For each storage pool, the disk scrubber reads each data stripe and compares data blocks to mirror/parity blocks as appropriate for each RAID level.

Disk Failure Types

For the purposes of this white paper, we classify disk failures into two categories:

• Failures that cause the loss of a single block of data within a data stripe

• Failures that cause the loss of the entire disk drive

Single Block Loss

One of the more common single block failures in disk drives is a media read error. This is a read failure condition returned from a disk when a read request is made to a logical block and the disk cannot return the data due to a media error. The signature for this error is “Sense key 0x03” and “ASC 0x11.” [1] These errors can occur for a variety of reasons, including physical damage to the disk, contamination of the media, or the age of the disk drive.

When the disk scrubber on a Sun StorEdge array encounters a media read error, if a redundant disk drive is available, the disk scrubber directs the disk drive to reallocate data to a good block. Doing so removes the media read error condition from the drive.

[1] For additional information, see the white paper SCSI Sense Key Error Guide (Part No: 817-5918-10) that describes disk drive failure codes, sense key information and the recommended action.

Drive Loss

A RAID system is designed to recover from a read failure condition provided it can recover data from a redundant disk drive. In the event of a complete disk failure condition, the availability of the RAID system is reduced because a full column is not available. In such cases, the RAID system tries to reconstruct the data from redundant drives and creates a new column on a standby disk. If the system encounters a media read error during the reconstruction phase, the data associated with the affected block is not available. This is referred to as a “double disk drive” failure condition.


Disk Scrubber Error Handling

Running the disk scrubber continuously reduces the occurrence of “double disk drive” failures and increases the reliability and availability of the RAID system.

RAID 0 Operation

On RAID 0 storage pools, the disk scrubber reads the data blocks in each data stripe from the beginning to the end of the storage pool. If a disk media read error is encountered, it reports the error to the system log. The volume is left in a mounted state and remains accessible to the user.

RAID 1 Operation

On RAID 1 storage pools, the disk scrubber reads the data block and mirror block in each stripe and compares the data block with the contents of the corresponding mirror block.

Single Disk Media Read Error

If the disk scrubber encounters a media read error on a data block (or a mirror block), it logs the error, reads the data from the corresponding mirror block (or data block), and sends a request to write that data back to the block that failed. The write-back causes the data to be reallocated to a new block.

Example:

If the disk scrubber encounters a media error in Data Blk1, it reads Mirror Blk1 and writes back to Data Blk1.
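The repair path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the array firmware: the `Disk` class, `MediaReadError`, and `scrub_raid1_block` are hypothetical names, and a "drive" here is just an in-memory list in which a write clears the error (standing in for the drive reallocating the block).

```python
class MediaReadError(Exception):
    """Hypothetical stand-in for a drive returning sense key 0x03, ASC 0x11."""


class Disk:
    """Toy in-memory drive: a list of blocks plus a set of unreadable blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()

    def fail(self, n):
        # Simulate a media defect: the contents of block n can no longer be read.
        self.blocks[n] = None
        self.bad.add(n)

    def read(self, n):
        if n in self.bad:
            raise MediaReadError(n)
        return self.blocks[n]

    def write(self, n, data):
        # Assumption: a write makes the drive reallocate the block,
        # which removes the media read error condition.
        self.blocks[n] = data
        self.bad.discard(n)


def scrub_raid1_block(data_disk, mirror_disk, n, log):
    """Scrub one data/mirror pair and repair a single media read error."""
    try:
        data_disk.read(n)
    except MediaReadError:
        log.append(f"media read error on data block {n}")
        # If the mirror is unreadable too, MediaReadError propagates:
        # that is the double failure case, which cannot be repaired.
        data_disk.write(n, mirror_disk.read(n))
        return
    try:
        mirror_disk.read(n)
    except MediaReadError:
        log.append(f"media read error on mirror block {n}")
        mirror_disk.write(n, data_disk.read(n))


drive1 = Disk([b"blk1", b"blk2"])
drive2 = Disk([b"blk1", b"blk2"])
drive1.fail(0)                       # Data Blk1 becomes unreadable
events = []
scrub_raid1_block(drive1, drive2, 0, events)
print(drive1.read(0), events)
```

After the call, Data Blk1 is readable again and one event has been logged, which mirrors the example above.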


Drive Failure and a Single Disk Media Read Error

If the disk scrubber encounters a media read error on a data block, and also finds the disk drive holding the corresponding mirror block removed or disabled, then the data corresponding to the affected block is permanently lost. This is a double disk drive failure condition. In this case the disk scrubber reports the error and stops scrubbing that volume. The volume stays mounted and accessible to the host.

Example:

If the disk scrubber encounters a media read error on Data Blk1 and Drive2 is disabled or removed, then it is a double disk drive failure condition. The data associated with Data Blk1 is lost since Mirror Blk1 is not available.


Multiple Disk Media Read Errors

If the disk scrubber encounters disk read errors on both the data block and the mirror block within the same data/mirror stripe, the data is lost. This is a double disk drive failure condition. In this case, the disk scrubber reports the error and terminates scrubbing. The volume stays mounted and accessible to the user. This is a very rare condition.

If the disk scrubber encounters more than one disk media read error across different data/mirror stripe pairs, it can still repair each block, provided the corresponding mirror or data block can be read.

Example:

If the disk scrubber encounters a media read error on Data Blk1 and the corresponding Mirror Blk1, this indicates a double disk drive failure condition. If Data Blk1, Data Blk2 and Mirror Blk3 have media read errors, the data is still available from Mirror Blk1, Mirror Blk2 and Data Blk3.
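The distinction drawn in this example comes down to set arithmetic: a stripe is lost only when both of its copies are unreadable. The following is a minimal sketch; the function names are hypothetical and stripes are identified simply by their block number.

```python
def lost_stripes(data_errors, mirror_errors):
    """Stripes where both the data block and the mirror block are unreadable."""
    return sorted(set(data_errors) & set(mirror_errors))


def repairable_stripes(data_errors, mirror_errors):
    """Stripes where exactly one of the two copies is unreadable."""
    return sorted(set(data_errors) ^ set(mirror_errors))


# Data Blk1, Data Blk2 and Mirror Blk3 have media read errors.
print(lost_stripes({1, 2}, {3}))         # []
print(repairable_stripes({1, 2}, {3}))   # [1, 2, 3]

# Data Blk1 and Mirror Blk1 both have media read errors.
print(lost_stripes({1}, {1}))            # [1]
```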


Mismatch of Data Block and Mirror Block

The probability of the disk scrubber finding differences between the contents of a data block and the corresponding mirror block is very low. In such cases, the system always trusts the validity of the data block and modifies the mirror block to match it. This case is logged to the syslog.

Example:

In this example, the disk scrubber reads Data Blk1 and compares it with Mirror Blk1. If they are different, the disk scrubber overwrites Mirror Blk1 with Data Blk1.


RAID 5 Operation

On RAID 5 storage pools, the disk scrubber reads the data blocks in each data stripe and computes the parity for that data stripe. It then compares the computed parity with the stored parity within the data stripe.

Single Disk Media Read Error

If the disk scrubber encounters a media read error on a data block in a data stripe, it logs the error in the syslog and reads the corresponding parity block. The disk scrubber then re-computes the data from the other data blocks and the parity block and sends a request to write the newly computed data block to the affected data block, causing the disk to reallocate the block.

Conversely, if the disk scrubber encounters a media read error on the parity block, it overwrites the parity block with a new parity block computed from the data blocks in that stripe.

Example:

If Data Blk1 has a media read error, then the data is regenerated from Data Blk2 and Parity Blk1 and the computed data block is written back to Data Blk1.
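Because the parity block is the XOR of the data blocks in a stripe (see Appendix B), the regeneration in this example is itself an XOR. The sketch below illustrates that arithmetic for a two-data-block stripe. The helper name `xor_blocks` and the byte values are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blk1 = bytes([0x12, 0x34, 0x56, 0x78])
data_blk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])

# Parity as it would have been stored when the stripe was written.
parity_blk1 = xor_blocks(data_blk1, data_blk2)

# Data Blk1 has a media read error: regenerate it from Data Blk2 and Parity Blk1.
recovered = xor_blocks(data_blk2, parity_blk1)
print(recovered == data_blk1)   # True
```

The same call with the surviving data blocks as arguments regenerates a lost parity block, which is the converse case described above.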


Drive Failure and Single Disk Media Read Error

If the disk scrubber encounters a media read error on a data block and also finds one of the other disk drives in the stripe removed or disabled, the data corresponding to the affected block is permanently lost. This is a double disk drive failure condition. In this case the disk scrubber reports the error and stops scrubbing that volume. The volume stays mounted and accessible to the host.

Example:

If the disk scrubber encounters a media read error on Data Blk1 and finds Drive2 or Drive3 either removed or disabled, then it is a double disk drive failure condition. It is not possible to re-compute Data Blk1 from Parity Blk1 without Data Blk2.


Multiple Disk Media Read Errors

If the disk scrubber encounters two or more media read errors within the same data stripe, whether on two data blocks or on a data block and the parity block, then the data corresponding to the affected block is permanently lost. This is a double disk drive failure condition. In this case, the disk scrubber reports the error and terminates the scrubbing action. The volume is left in a mounted state and its contents remain accessible to the user.

Example:

If the disk scrubber encounters a media read error on Data Blk1 and Parity Blk1, then it is a double disk drive failure condition. It is not possible to re-compute Data Blk1 without Parity Blk1.

If the disk scrubber encounters a media read error on Data Blk1 and Data Blk2, then it is a double disk drive failure condition. It is not possible to re-compute Data Blk1 from Parity Blk1 without Data Blk2.


Mismatch of Data Blocks and Parity Block

The probability of the computed parity differing from the stored parity block is very low. However, in such cases, the system trusts the validity of the data blocks, re-computes the parity from them, and writes the result to the parity block. The event is logged in syslog.

Example:

In this example, the disk scrubber reads Data Blk1 and Data Blk2, computes Computed Parity Blk1 and compares it with Parity Blk1. If it finds them different, it overwrites Parity Blk1 with Computed Parity Blk1.
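The check in this example can be sketched as follows. This is an illustrative model rather than the firmware's code: `verify_stripe` and its `fix` parameter are hypothetical names, with `fix` loosely modelling whether the mismatch is repaired or only reported.

```python
def verify_stripe(data_blocks, stored_parity, fix=False):
    """Check one RAID 5 stripe; return (matches, parity_to_store)."""
    computed = bytes(len(stored_parity))          # all-zero block of the same size
    for block in data_blocks:
        computed = bytes(a ^ b for a, b in zip(computed, block))
    if computed == stored_parity:
        return True, stored_parity
    # The data blocks are trusted: with fix enabled, the parity is overwritten.
    return False, (computed if fix else stored_parity)


data = [b"\x01\x02", b"\x10\x20"]
print(verify_stripe(data, b"\x11\x22"))             # (True, b'\x11"')
print(verify_stripe(data, b"\x00\x00", fix=True))   # (False, b'\x11"')
```

In both calls the parity returned is the bytes 0x11 0x22. Python prints the byte 0x22 as the character `"`.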

Sun StorEdge Disk Scrubber Overview

Sun’s storage arrays store data on disks in one of three formats:

• RAID 0

• RAID 1

• RAID 5

RAID systems work on small chunks of data referred to as blocks. Sun StorEdge arrays use block sizes of 4K, 8K, 16K, 32K or 64K. Blocks are then organized into data stripes.

Note: Not all block sizes are supported in all releases.

Within each RAID level, data stripes are managed differently to provide different levels of availability and reliability. Single block drive failures within each stripe will affect the system differently based on each RAID level.

The Sun StorEdge Disk Scrubber checks the integrity of an array's media. For RAID devices, it also checks parity data. When the disk scrubber detects an error, it reports the problem and attempts to fix it.

Array firmware 3.1.x automatically runs the disk scrubber continuously in the background. To minimize the impact on host I/O, it runs Vol Verify at the lowest rate setting.


Volume Verify (Vol Verify)

Sun’s StorEdge storage arrays use the Vol Verify functionality of the array firmware to implement disk scrubbing. Vol Verify runs with the -fix option turned ON to enable the disk scrubber to repair the system as it finds problems. On firmware versions 3.1.3 or later, the disk scrubber starts a new instance of Vol Verify as soon as the previous instance finishes running. On older versions of the array firmware, Vol Verify must be run manually or continuously via StorADE.

Vol Verify Operation

Vol Verify generates I/Os to disks, which compete with host-initiated I/Os. The Rate option to Vol Verify tells the system what priority to place on Vol Verify generated I/Os versus host-generated I/Os.

On firmware versions prior to 3.1.3, you must set the rate option. The rate option can be set from 1 to 8.

• Rate 1: Least impact to host I/Os. The system will favor host I/Os over Vol Verify I/Os. The effect on Vol Verify is that it will take longer to complete.

If few or no host I/Os are queued up on the system, Vol Verify running at Rate 1 will increase the number of I/Os it generates and back off when more host I/Os come in. This allows Vol Verify to complete as fast as possible without affecting host I/Os.

• Rate 8: Greatest impact to host I/Os. The system will treat Vol Verify and host-generated I/Os equally. The effect on Vol Verify is that it completes sooner.

The time for Vol Verify to complete depends on the rate, the size of the storage pool and the number of host I/Os. It could take from hours to days to complete one pass on a large and heavily loaded system.

-fix Option

The Vol Verify -fix option fixes the mirror block or parity block when there is a Data/Mirror or Data/Parity mismatch.

Unlike other disk scrubbers that either fix the data block or “do nothing,” Sun’s disk scrubber fixes the mirror or parity block. Sun’s disk scrubber uses this approach because:

• The goal of Vol Verify is to scrub the disks in the storage pool for errors and fix them. Vol Verify must follow the same rules as a host I/O request, so it uses the same approach as a normal host I/O request.

• When a host reads data, the data block is returned without checking the mirror block or parity block. This is an industry standard process for returning read requests for RAID 5/RAID 1. Like a normal read operation, Vol Verify assumes the data block is correct.

• When Vol Verify finds a Data/Mirror or Data/Parity block mismatch, it does one of two things:

1. Just reports the error. In this case, data must be manually restored to the particular Volume/LUN in that storage pool.

2. Fixes the mirror/parity and reports the error. In this case, the system fixes the storage pool and notifies the user that there may be corrupted data in a particular Volume/LUN. If the data is found to be bad, the user has to manually restore the data for that Volume/LUN.


Array Firmware 3.1.x Operation

Array firmware 3.1.x runs the disk scrubber continuously in the background. The disk scrubber runs Vol Verify at the lowest rate setting. Doing so has minimal impact on host I/O.

Array Firmware 1.18.x, 2.0.x, 2.1.x, and 3.0.x Operation

Pre-3.1 array firmware uses StorADE to set up and run the disk scrubber.

Starting the Disk Scrubber Via StorADE

To automatically run the pre-3.1 disk scrubber via StorADE:

1. Log on to the StorADE web interface.

2. Add the array devices to be monitored.

3. Select the Manage tab.

4. Select the Vol Verify to run continuously.

5. Select Activate vol-verify.

6. Select Run with fix option.

Select “Continuously” from the pull-down menu to run on the array at most every X days, where X is the maximum number of days the disk scrubber will run. Sun recommends using the “Continuously” option.

Note: You can also select the disk scrubber to run once from within a range of 1 to 90 days.

7. Select the Execution Rate from slow, medium and fast, depending on the speed and performance impact tolerated for your configuration. Sun recommends using the “Slow” option.

Upgrading from 2.1.x or 3.0.x to 3.1.x

The major difference in the 3.1.x release is that the disk scrubber runs by default.

Before upgrading, run your current version of the disk scrubber to ensure that volumes and disks are in optimal running condition. This ensures that the new version of the disk scrubber will encounter fewer errors.


Example Logs

The system log details the type of error encountered and what corrective action, if any, was taken. The following examples detail system responses and actions when a media read error is encountered on a disk drive at different RAID levels of storage pool.

RAID 1 volumes show similar logs, except that the parity block is referred to as the mirror block.

Note: System logs differ between firmware releases. The actual log you see in your system may be slightly different.

Single Disk Media Read Error

The following system log shows a media read error condition and the action taken. In this example, disk u1d7 reported a media read error at logical block 0x8B290, which belongs to stripe 1242. This is seen by the message “Unrecovered Read Error” in the system log. Upon encountering this condition, the disk scrubber re-computes the data for the block at disk LBA 0x8B290 from the other blocks in this stripe and writes it back to disk drive u1d7. This causes the reallocation of the LBA to a good region on the disk drive. The recovery is confirmed by the reallocation response from the drive and the “Write Error - Recovered With Auto Reallocation” message from the drive.

The message “u1ctr fixing parity on verify scb=471067c” is to be ignored because parity errors cannot be fixed. However, data belonging to 0x8B290 was re-computed and the error condition was removed from the disk drive. This message will be removed in future releases of array firmware.

Mar 30 16:20:29 sh01[1]: N: Vol verify (v1) started

Mar 30 16:20:44 ISR1[1]: W: u1d07 SCSI Disk Error Occurred (path = 0x0)

Mar 30 16:20:44 ISR1[1]: W: u1d07 Sense Key = 0x3, Asc = 0x11, Ascq = 0x0

Mar 30 16:20:44 ISR1[1]: W: u1d07 Sense Data Description = Unrecovered Read Error

Mar 30 16:20:44 ISR1[1]: W: u1d07 Valid Information = 0x8b290

Mar 30 16:20:44 ISR1[1]: N: u1d07 SVD_DONE: Command Error = 0x3

Mar 30 16:20:44 ISR1[1]: N: u1d07 sid 1242 stype 4003 disk error 3

Mar 30 16:20:44 SX11[1]: N: u1ctr fixing parity on verify scb=471067c

Mar 30 16:20:45 ISR1[1]: N: u1d07 SCSI Disk Error Occurred (path = 0x0)

Mar 30 16:20:45 ISR1[1]: N: u1d07 Sense Key = 0x1, Asc = 0xc, Ascq = 0x1

Mar 30 16:20:45 ISR1[1]: N: u1d07 Sense Data Description = Write Error - Recovered With Auto Reallocation

Mar 30 16:20:45 ISR1[1]: N: u1d07 Valid Information = 0x8b290

Mar 30 16:20:45 WXFT[1]: N: u1ctr Parity on stripe 4DA is fixed in vol (v1)

Mar 30 16:20:51 sh01[1]: N: Volume v1 verification ended.
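For readers who want to pick these events out of a syslog automatically, the following is a minimal sketch. It assumes the line layout shown in the example above, which may differ between firmware releases. The regular expression and the function name `media_read_errors` are illustrative, not part of any Sun tool.

```python
import re

# Matches lines such as: "... u1d07 Sense Key = 0x3, Asc = 0x11, Ascq = 0x0"
SENSE_RE = re.compile(
    r"(?P<disk>u\d+d\d+) Sense Key = (?P<key>0x[0-9a-fA-F]+), "
    r"Asc = (?P<asc>0x[0-9a-fA-F]+), Ascq = (?P<ascq>0x[0-9a-fA-F]+)"
)


def media_read_errors(lines):
    """Return the disks that reported sense key 0x3 with ASC 0x11."""
    disks = []
    for line in lines:
        m = SENSE_RE.search(line)
        if m and int(m["key"], 16) == 0x3 and int(m["asc"], 16) == 0x11:
            disks.append(m["disk"])
    return disks


log = [
    "Mar 30 16:20:44 ISR1[1]: W: u1d07 Sense Key = 0x3, Asc = 0x11, Ascq = 0x0",
    "Mar 30 16:20:45 ISR1[1]: N: u1d07 Sense Key = 0x1, Asc = 0xc, Ascq = 0x1",
]
print(media_read_errors(log))   # ['u1d07']
print(int("4DA", 16))           # 1242
```

The last line checks a detail of the log: the "stripe 4DA" in the final parity message appears to be the same stripe as "sid 1242", written in hexadecimal, since 0x4DA equals 1242.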


Multiple Disk Media Read Errors

The following system log shows the double disk drive failure scenario. In this example, there was a media read error on logical block 0x975E0 on disks u1d9 and u1d8 in the same volume. The two blocks belong either to two data blocks of the same stripe or to a data block and the parity block of the same stripe.

Mar 30 16:20:29 sh01[1]: N: Vol verify (v1) started

Mar 30 16:20:50 ISR1[1]: W: u1d09 SCSI Disk Error Occurred (path = 0x1)

Mar 30 16:20:50 ISR1[1]: W: u1d09 Sense Key = 0x3, Asc = 0x11, Ascq = 0x0

Mar 30 16:20:50 ISR1[1]: W: u1d09 Sense Data Description = Unrecovered Read Error

Mar 30 16:20:50 ISR1[1]: W: u1d09 Valid Information = 0x975e0

Mar 30 16:20:50 ISR1[1]: N: u1d09 SVD_DONE: Command Error = 0x3

Mar 30 16:20:50 ISR1[1]: N: u1d09 sid 1632 stype 4003 disk error 3

Mar 30 16:20:50 ISR1[1]: W: u1d08 SCSI Disk Error Occurred (path = 0x0)

Mar 30 16:20:50 ISR1[1]: W: u1d08 Sense Key = 0x3, Asc = 0x11, Ascq = 0x0

Mar 30 16:20:50 ISR1[1]: W: u1d08 Sense Data Description = Unrecovered Read Error

Mar 30 16:20:50 ISR1[1]: W: u1d08 Valid Information = 0x975e0

Mar 30 16:20:50 ISR1[1]: N: u1d08 SVD_DONE: Command Error = 0x3

Mar 30 16:20:50 ISR1[1]: N: u1d08 sid 1632 stype 4003 disk error 3

Mar 30 16:20:50 SX11[1]: N: u1ctr multiple read failure on verify scb=4870064

Mar 30 16:20:50 SX11[1]: N: u1ctr Internal Command error (Terminated by system)

Mar 30 16:20:50 LNXT[1]: N: u1ctr verify failed in vol (v1)

Mar 30 16:20:51 sh01[1]: N: Vol verify (v1) ended
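A double failure can be spotted in such a log by grouping the per-disk error lines by stripe. The sketch below assumes that the "sid" field is the stripe ID, as the single-error example suggests, and that the line layout matches the log above. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as: "... u1d09 sid 1632 stype 4003 disk error 3"
SID_RE = re.compile(r"(?P<disk>u\d+d\d+) sid (?P<sid>\d+) stype \d+ disk error \d+")


def stripes_with_multiple_errors(lines):
    """Map stripe ID -> disks, for stripes where more than one disk reported an error."""
    by_stripe = defaultdict(set)
    for line in lines:
        m = SID_RE.search(line)
        if m:
            by_stripe[int(m["sid"])].add(m["disk"])
    return {sid: sorted(disks) for sid, disks in by_stripe.items() if len(disks) > 1}


log = [
    "Mar 30 16:20:50 ISR1[1]: N: u1d09 sid 1632 stype 4003 disk error 3",
    "Mar 30 16:20:50 ISR1[1]: N: u1d08 sid 1632 stype 4003 disk error 3",
]
print(stripes_with_multiple_errors(log))   # {1632: ['u1d08', 'u1d09']}
```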

Mismatch of Data Block and Parity Block

The following example is a system log in which the disk scrubber encountered a mismatch between the parity block as originally written and the parity computed from the data blocks.

Mar 30 15:36:57 sh02[1]: N: Vol verify (v1) started

Mar 30 15:36:57 WXFT[1]: E: u1ctr: vol (v1), Slice Name:(vslun1) vol verify detected data parity mismatch on Stripe: 7, Lun:3

Mar 30 15:36:57 WXFT[1]: N: u1ctr Parity on stripe 7 is fixed in vol (v1)

Mar 30 15:36:57 WXFT[1]: E: u1ctr: vol (v1), Slice Name:(vslun1) vol verify detected data parity mismatch on Stripe: 8, Lun:3

Mar 30 15:36:57 WXFT[1]: N: u1ctr Parity on stripe 8 is fixed in vol (v1)

Mar 30 15:51:19 sh02[1]: N: Vol verify (v1) ended


Appendix A

The following table details how vol verify is implemented across different firmware releases. For each firmware release, the behavior of Vol Verify Fix is listed first, followed by the behavior of Vol Verify.

Firmware Release: 1.16.x

Vol Verify Fix:
Fixes parity by generating a new parity from data when a data/parity mismatch condition is encountered. Also reports the stripe ID of the mismatched stripe.
Media error condition: If a read error is encountered, the fix operation hangs. This was a bug and is fixed in later releases.

Vol Verify:
Scans the volume and reports the stripe ID where the data/parity mismatch condition was encountered.
Media error condition: If a read error is encountered, the vol verify operation terminates.

Firmware Release: 1.18.0, 2.0.1, and 2.1.0 ver.

Vol Verify Fix:
Fixes parity by generating a new parity from data when a data/parity mismatch condition is encountered. Also reports the stripe ID of the mismatched stripe.
Media error condition: If a read error is encountered, the vol verify operation terminates.

Vol Verify:
Scans the volume and reports the stripe ID where the data/parity mismatch condition was encountered.
Media error condition: If a read error is encountered, the vol verify operation terminates.

Firmware Release: 2.1.1

Vol Verify Fix:
Fixes parity by generating a new parity from data when a data/parity mismatch condition is encountered. Also reports the stripe ID of the mismatched stripe.
Media error condition: If a read error is encountered, the vol verify operation terminates and the drive reporting this error will be disabled.

Vol Verify:
Scans the volume and reports the stripe ID where the data/parity mismatch condition was encountered.
Media error condition: If a read error is encountered, the vol verify operation terminates and the drive reporting this error will be disabled.

Firmware Release: 1.18.1 and before 2.1.3 and after 3.0 ver

Vol Verify Fix:
Fixes parity by generating a new parity from data when a data/parity mismatch condition is encountered. Also reports the stripe ID of the mismatched stripe.
Media error condition: If a read error (3/11) is encountered, data OR parity will be generated from the remaining drives and written back to the drive and location reporting the unrecovered read error. After this, it scans through the whole volume. If a disabled drive is present, the operation will terminate.

Vol Verify:
Scans the volume and reports the stripe ID where the data/parity mismatch condition was encountered.
Media error condition: If a read error (3/11) is encountered, the drive, stripe ID and LBA for which the unrecovered read error was reported will be logged in syslog and the vol verify operation continues to scan the rest of the volume. If a disabled drive is present, the operation will terminate.

Firmware Release: 3.1.0 ver and after

Vol Verify Fix:
Fixes parity by generating a new parity from data when a data/parity mismatch condition is encountered. Also reports the stripe ID of the mismatched stripe.
Media error condition: If a read error (3/11) is encountered, data OR parity will be generated from the remaining drives and written back to the drive and location reporting the unrecovered read error. After this, it scans through the whole volume. If a disabled drive is present, the operation will terminate.
Disk Scrubber: By default, vol verify with the fix option runs automatically in the background at a very low rate (priority) level. It runs continuously, one volume after another, on all volumes in the system.

Vol Verify:
Scans the volume and reports the stripe ID where the data/parity mismatch condition was encountered.
Media error condition: If a read error (3/11) is encountered, the drive, stripe ID and LBA for which the unrecovered read error was reported will be logged in syslog and the vol verify operation continues to scan the rest of the volume. If a disabled drive is present, the operation will terminate.


Appendix B

RAID Overview

RAID 0

RAID 0 breaks data from the host into small chunks known as data blocks and stores them on disk. RAID 0 provides no protection against disk failures, as there are no mirror or parity bits stored with the data. The data blocks are stored in stripes. The following graphic details how the data is physically organized on the disk drives:

Any data stored in a disk block associated with a media error is not recoverable; the data stored within the rest of the data stripe and the rest of the disks is available. A write request from the host to the affected block will cause the drive to reallocate the block elsewhere on the disk.


RAID 1

RAID 1 breaks data into small chunks and stores them as data blocks. It also stores a copy called a mirror block on another physical disk. Reads are done from the data blocks. The following graphic details how the data is physically organized on disk drives:

If a media read error is encountered during a data block read operation, the RAID system reads the data from the mirror block and, if successful, writes the data back to the affected data block. When the RAID system sends a write request, the disk drive reallocates the affected block to a different block of the disk drive.

If a media read error is encountered on both the data block and the corresponding mirror block, the RAID system cannot recover. Data is lost.


RAID 5

RAID 5 breaks data into data blocks. At the same time, it computes parity from all the data blocks of the same stripe. The result is known as the parity block. The parity block is stored on a different disk drive. The parity block is the XOR of the contents of all data blocks in a particular data stripe. There is one parity block for every data stripe in the volume. The following graphic details how the data is physically organized on disk drives:

If a media read error is encountered during a stripe read, the RAID system reconstructs the data from the remaining data blocks and the parity block. The data is then written back to the failed disk block. As with RAID 0 and 1, when the system writes to the failed disk block, the disk drive reallocates the data to another section of the disk drive.


Glossary

Storage pool: Also referred to as a virtual volume. It is created from one or more disk drives in a selected RAID configuration of the storage array and made available to the user as a single large disk.

RAID: Redundant Array of Inexpensive Disks

RAID set: Group of disk drives in a given virtual volume under a specified RAID configuration.

Block size in array: Small chunks of data of size 4K, 8K, 16K, 32K or 64K, configurable on the array. This is the minimum logical size on which RAID management activity, such as reading from disk, writing to disk and generating parity information, is done.

Media read error: An error condition where a drive is not able to read the information present at the requested location on the drive. This could be because of physical damage to the disk, a defective disk, etc. This condition is reported as a SCSI error condition with sense key 0x03 and ASC 0x11.
