Tuesday, January 12, 2016

Solaris 11: Hands-on Lab – Replacing Failed Disks in ZFS Pools (Simple/Mirrored/RaidZ)

Quick Recap of ZFS Pool Types
  1. Simple and striped pools (equivalent to RAID-0; data is non-redundant)
  2. Mirrored pool (equivalent to RAID-1)
  3. Raidz pool (single parity, equivalent to RAID-5 – can withstand a single disk failure)
  4. Raidz-2 pool (dual parity, equivalent to RAID-6 – can withstand up to two disk failures)
  5. Raidz-3 pool (triple parity – can withstand up to three disk failures)
RAIDZ Configuration Requirements and Recommendations

A RAIDZ configuration with N disks of size X, P of which are parity disks, can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised.
  • Start a single-parity RAIDZ (raidz) configuration at 3 disks (2+1)
  • Start a double-parity RAIDZ (raidz2) configuration at 6 disks (4+2)
  • Start a triple-parity RAIDZ (raidz3) configuration at 9 disks (6+3)
  • (N+P) with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N equals 2, 4, or 6
  • The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups
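
As a quick sketch of these recommendations, the commands below create pools with the suggested starting geometries. The pool name tank and the c4tXd0 disk names are placeholders, not disks from this lab:

# Single parity, 3 disks (2+1)
zpool create tank raidz c4t0d0 c4t1d0 c4t2d0

# Double parity, 6 disks (4+2)
zpool create tank raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0

# Triple parity, 9 disks (6+3)
zpool create tank raidz3 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c4t8d0

Applying the capacity formula to the raidz2 example: with six disks of 2 TB each and P = 2, the pool holds approximately (6-2)*2 TB = 8 TB and survives any two disk failures.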
General consideration: is your goal maximum disk space or maximum performance?
  • A RAIDZ configuration maximizes disk space and generally performs well when data is written and read in large chunks (128K or more).
  • A RAIDZ-2 configuration offers better data availability, and performs similarly to RAIDZ. RAIDZ-2 has significantly better mean time to data loss (MTTDL) than either RAIDZ or 2-way mirrors.
  • A RAIDZ-3 configuration maximizes disk space and offers excellent availability because it can withstand 3 disk failures.
  • A mirrored configuration consumes more disk space but generally performs better with small random reads. 
Disk Failure Scenario for a Simple/Striped (Non-Redundant) ZFS Pool
Disk Configuration:
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       3. c3t5d0
          /pci@0,0/pci8086,2829@d/disk@5,0
       4. c3t6d0
          /pci@0,0/pci8086,2829@d/disk@6,0

Creating a Simple ZFS Storage Pool
root@solarisbox:/dev/chassis# zpool create poolnr c3t2d0 c3t3d0
'poolnr' successfully created, but with no redundancy; failure of one device will cause loss of the pool
root@solarisbox:/dev/chassis# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
poolnr  3.97G  92.5K  3.97G   0%  1.00x  ONLINE  -
rpool   63.5G  5.21G  58.3G   8%  1.00x  ONLINE  -

Creating a Sample Filesystem on the New Pool
root@solarisbox:/dev/chassis# zfs create poolnr/testfs
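
To give the later failure scenarios some data to work with, you can drop a test file into the new file system; mkfile is one convenient option on Solaris (the file name below is just an example):

# Create a 100 MB test file and confirm the space usage
mkfile 100m /poolnr/testfs/testfile
df -h /poolnr/testfs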

root@solarisbox:/downloads# zpool status poolnr
  pool: poolnr
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        poolnr    ONLINE       0     0     0
          c3t2d0  ONLINE       0     0     0
          c3t3d0  ONLINE       0     0     0

errors: No known data errors


After Manually Simulating a Failure of Disk c3t2d0:
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       3. c3t5d0
          /pci@0,0/pci8086,2829@d/disk@5,0
       4. c3t6d0
          /pci@0,0/pci8086,2829@d/disk@6,0

root@solarisbox:~# zpool status poolnr
pool: poolnr
state: UNAVAIL
status: One or more devices are faulted in response to persistent errors.  There are insufficient replicas for the pool to
        continue functioning.
action: Destroy and re-create the pool from a backup source.  Manually marking the device repaired using 'zpool clear' may allow some data to be recovered.
  scan: none requested
config:
        NAME      STATE     READ WRITE CKSUM
        poolnr    UNAVAIL      0     0     0  insufficient replicas
          c3t2d0  FAULTED      1     0     0  too many errors
          c3t6d0  ONLINE       0     0     0


From the above scenario, we can see that a simple (non-redundant) ZFS pool cannot withstand any disk failure.
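
If data already lives on a non-redundant pool, one way to retrofit protection while the pool is still healthy is to attach a mirror to each top-level disk with zpool attach. A hedged sketch – c3t8d0 and c3t9d0 here are hypothetical free disks, not part of this lab:

# Turn each striped disk into a two-way mirror by attaching a new disk to it
zpool attach poolnr c3t2d0 c3t8d0
zpool attach poolnr c3t3d0 c3t9d0
# Watch the resilver complete
zpool status poolnr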


Disk Failure Scenario for Mirror Pool  

Initial Disk Configuration
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0               
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):


Create Mirror Pool
root@solarisbox:~# zpool create mpool mirror c3t4d0 c3t7d0

root@solarisbox:~# zfs create mpool/mtestfs

               >>> Copy some sample data to the new file system

root@solarisbox:~# df -h|grep  /mpool/mtestfs
mpool                  2.0G    32K       2.0G     1%    /mpool
mpool/mtestfs          2.0G    31K       2.0G     1%    /mpool/mtestfs

 
root@solarisbox:~# zpool status mpool
  pool: mpool
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        mpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
errors: No known data errors
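
Optionally, a hot spare can be added so ZFS can start rebuilding automatically when one side of the mirror fails. A minimal sketch, assuming a free disk (c3t9d0 here is hypothetical):

# Add a hot spare to the mirrored pool
zpool add mpool spare c3t9d0
# The spare shows up under a 'spares' section in zpool status
zpool status mpool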

After Manually Simulating the Disk Failure
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0               
Specify disk (enter its number): Specify disk (enter its number):
               <== we lost the disk c3t7d0
           

Checking the Pool Status After the Disk Failure
root@solarisbox:~# zpool status mpool
  pool: mpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mpool       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  UNAVAIL      0     0     0  cannot open

errors: No known data errors
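
As the action line suggests, if the disk only went offline temporarily (a loose cable or a transient transport error) rather than failing outright, it can be brought back without a replacement. A sketch of that alternative path:

# If the original c3t7d0 reappears, bring it online and clear the old error counters
zpool online mpool c3t7d0
zpool clear mpool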

After Physically Replacing the Failed Disk (Placing a New Disk in the Same Location)
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0  << New Disk
>>> Label the new disk with an SMI (VTOC) label (a requirement before attaching it to the ZFS pool)

root@solarisbox:~# format -L vtoc -d c3t7d0
Searching for disks…done
selecting c3t7d0
[disk formatted]
c3t7d0 is labeled with VTOC successfully.


Replacing the Failed Disk in the ZFS Pool
root@solarisbox:~# zpool replace  mpool c3t7d0

root@solarisbox:~# zpool status -x mpool
pool 'mpool' is healthy

root@solarisbox:~# zpool status  mpool
  pool: mpool
 state: ONLINE
  scan: resilvered 210M in 0h0m with 0 errors on Sun Sep 16 10:41:21 2012
config:

        NAME        STATE     READ WRITE CKSUM
        mpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0    <<< Disk Online

errors: No known data errors
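
Since the new disk was inserted in the same physical location as the failed one, the manual zpool replace step can be avoided next time by enabling the pool's autoreplace property (it is off by default):

# Automatically format and replace a new device found in the same physical location
zpool set autoreplace=on mpool
zpool get autoreplace mpool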

root@solarisbox:~#


Single and Double Disk Failure Scenarios for a ZFS Raid-Z Pool

Disk Configuration Available for Creating the New Raid-Z Pool
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):

Creating the New RaidZ Pool
root@solarisbox:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t2d0s0 is part of exported or potentially active ZFS pool poolnr. Please see zpool(1M).
        ==> Here we hit an issue with one of the disks selected for the pool: it was previously used by another zpool. That old zpool is no longer available, but its label is still on the disk, and we want to reuse the disk for the new zpool.
        ==> We can solve this in one of two ways:
1. Use the -f option to override the error.
2. Reinitialize the partition table on the disk (Solaris x86 only).
        ==> In this example, the whole disk was reinitialized as a single Solaris partition with the command below:

root@solarisbox:~# fdisk -B /dev/rdsk/c3t3d0p0
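
Before wiping a disk this way, it can be worth checking whether the old pool is merely exported and still importable; zpool import with no arguments lists any pools whose labels are still visible on unused disks:

# List pools that could be imported from leftover labels (read-only check, nothing is changed)
zpool import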

root@solarisbox:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0

root@solarisbox:~# zpool status rzpool
  pool: rzpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
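
Note that for raidz pools, zpool list reports raw capacity including the parity disk's space, while zfs list reports usable space after the parity overhead, so the two figures will differ:

# Raw pool size (parity included)
zpool list rzpool
# Usable space as seen by file systems (parity excluded)
zfs list rzpool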


Create a File System and Copy Some Test Data to rzpool/r5testfs
root@solarisbox:~# zfs create rzpool/r5testfs
root@solarisbox:/downloads# df -h|grep test
rzpool/r5testfs        5.8G   575M       5.3G    10%    /rzpool/r5testfs

root@solarisbox:/downloads# cd /rzpool/r5testfs/

root@solarisbox:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r--   1 root     root     602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip

root@solarisbox:/rzpool/r5testfs#


After Manually Simulating the Disk Failure (i.e., c3t7d0)
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0             <<== c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):


Checking the Zpool Status – It Is in a Degraded State
root@solarisbox:~# zpool status -x rzpool
  pool: rzpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  UNAVAIL      0     0     0  cannot open
errors: No known data errors


Checking Whether the File System Is Still Accessible
root@solarisbox:~# df -h |grep testfs
rzpool/r5testfs        5.8G   575M       5.3G    10%    /rzpool/r5testfs
root@solarisbox:~# cd /rzpool/r5testfs
root@solarisbox:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r--   1 root     root     602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip

root@solarisbox:/rzpool/r5testfs#


After Replacing the Failed Disk with a New Disk in the Same Location
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
root@solarisbox:~# zpool status -x
  pool: rzpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  FAULTED      0     0     0  corrupted data     <<== the state changed to FAULTED because the pool can now see the new disk, but the disk carries no valid pool data
errors: No known data errors


Replacing the Failed Disk Component in the Zpool
root@solarisbox:~# zpool replace rzpool c3t7d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t7d0s0 is part of exported or potentially active ZFS pool mpool. Please see zpool(1M).

root@solarisbox:~# zpool replace -f rzpool c3t7d0   <<== using the -f option to override the above error

root@solarisbox:~# zpool status -x
all pools are healthy

root@solarisbox:~# zpool status rzpool
  pool: rzpool
 state: ONLINE
  scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
errors: No known data errors
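
After the resilver completes, it is good practice to scrub the pool so every block is read back and verified against its checksum:

# Verify all data in the pool; progress is reported in 'zpool status'
zpool scrub rzpool
zpool status rzpool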


Two-Disk Failure Scenario for a RaidZ Pool – And It Fails

Zpool Status Before Disk Failure
root@solarisbox:~# zpool status rzpool
  pool: rzpool
 state: ONLINE
  scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

Disk Configuration After Simulating a Double Disk Failure
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0   <== c3t4d0 & c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):

Zpool Status after the Double Disk Failure 
root@solarisbox:~# zpool status -x
  pool: rzpool
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      UNAVAIL      0     0     0  insufficient replicas
          raidz1-0  UNAVAIL      0     0     0  insufficient replicas
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  UNAVAIL      0     0     0  cannot open
            c3t7d0  UNAVAIL      0     0     0  cannot open

 Conclusion: the /rzpool/r5testfs file system is no longer available, and the zpool cannot be recovered from its current state. A single-parity raidz pool can survive only one disk failure; with two disks gone, the data must be restored from backup, as sketched below.
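
There is no in-place repair from here; the pool has to be rebuilt and the data restored. A hedged sketch of the recovery path, assuming a backup exists (the path /backup/r5testfs is hypothetical) and taking the opportunity to move to double parity so a two-disk failure is survivable next time (c3t8d0 is a hypothetical fifth disk):

# Destroy the dead pool and release its disks
zpool destroy -f rzpool
# Re-create with double parity this time
zpool create rzpool raidz2 c3t2d0 c3t3d0 c3t4d0 c3t7d0 c3t8d0
zfs create rzpool/r5testfs
# Restore data from backup (illustrative path)
cp -rp /backup/r5testfs/* /rzpool/r5testfs/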
