A Quick Recap of ZFS Pools
- Simple and striped pool (equivalent to RAID-0; data is not redundant)
- Mirrored pool (equivalent to RAID-1)
- Raidz pool (single parity, comparable to RAID-5; can withstand a single disk failure)
- Raidz-2 pool (dual parity, comparable to RAID-6; can withstand up to two disk failures)
- Raidz-3 pool (triple parity; can withstand up to three disk failures)
RAIDZ Configuration Requirements and Recommendations
A RAIDZ configuration with N disks of size X, of which P are parity disks, can hold approximately (N-P)*X bytes and can withstand P disk(s) failing before data integrity is compromised.
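For example, a raidz2 group built from six 2 TB disks (N = 6, P = 2) holds approximately (6-2)*2 TB = 8 TB of usable space and keeps functioning through any two simultaneous disk failures.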
- Start a single-parity RAIDZ (raidz) configuration at 3 disks (2+1)
- Start a double-parity RAIDZ (raidz2) configuration at 6 disks (4+2)
- Start a triple-parity RAIDZ (raidz3) configuration at 9 disks (6+3)
- Use (N+P) disks, with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N = 2, 4, or 6
- The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups, as sketched below
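==> As a sketch of these recommendations (the device names here are illustrative, not the disks used in the scenarios below), twelve disks are better arranged as two 4+2 raidz2 groups in a single pool than as one wide group:

# two raidz2 groups of 6 disks each (4 data + 2 parity per group)
zpool create tank \
    raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 \
    raidz2 c4t6d0 c4t7d0 c4t8d0 c4t9d0 c4t10d0 c4t11d0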
General consideration: is your goal maximum disk space or maximum performance?
- A RAIDZ configuration maximizes disk space and generally performs well when data is written and read in large chunks (128K or more).
- A RAIDZ-2 configuration offers better data availability, and performs similarly to RAIDZ. RAIDZ-2 has significantly better mean time to data loss (MTTDL) than either RAIDZ or 2-way mirrors.
- A RAIDZ-3 configuration maximizes disk space and offers excellent availability because it can withstand 3 disk failures.
- A mirrored configuration consumes more disk space but generally performs better with small random reads.
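==> If small random reads dominate the workload, a striped-mirror layout (the ZFS analogue of RAID-10) is the usual choice; a minimal sketch, again with illustrative device names:

# two 2-way mirrors striped together: half the raw capacity,
# but reads can be served by either side of each mirror
zpool create fastpool mirror c4t0d0 c4t1d0 mirror c4t2d0 c4t3d0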
Disk Failure Scenario for a Simple/Striped (Non-Redundant) ZFS Pool
Disk Configuration:
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
3. c3t5d0
/pci@0,0/pci8086,2829@d/disk@5,0
4. c3t6d0
/pci@0,0/pci8086,2829@d/disk@6,0
Creating a Simple ZFS Storage Pool
root@solarisbox:/dev/chassis# zpool create poolnr c3t2d0 c3t3d0
'poolnr' successfully created, but with no redundancy; failure of one device will cause loss of the pool
root@solarisbox:/dev/chassis# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
poolnr 3.97G 92.5K 3.97G 0% 1.00x ONLINE -
rpool 63.5G 5.21G 58.3G 8% 1.00x ONLINE -
Creating a Sample File System in the New Pool
root@solarisbox:/dev/chassis# zfs create poolnr/testfs
root@solarisbox:/downloads# zpool status poolnr
pool: poolnr
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
poolnr ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
errors: No known data errors
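==> Optionally, copy some sample data into the new file system before simulating the failure so the loss is visible afterwards (the source path is illustrative):

cp /downloads/sample-data.zip /poolnr/testfs/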
After Manually Simulating the Failure of Disk c3t2d0:
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
3. c3t5d0
/pci@0,0/pci8086,2829@d/disk@5,0
4. c3t6d0
/pci@0,0/pci8086,2829@d/disk@6,0
root@solarisbox:~# zpool status poolnr
pool: poolnr
state: UNAVAIL
status: One or more devices are faulted in response to persistent errors. There are insufficient replicas for the pool to
continue functioning.
action: Destroy and re-create the pool from a backup source. Manually marking the device repaired using 'zpool clear' may allow some data to be recovered.
scan: none requested
config:
NAME STATE READ WRITE CKSUM
poolnr UNAVAIL 0 0 0 insufficient replicas
c3t2d0 FAULTED 1 0 0 too many errors
c3t6d0 ONLINE 0 0 0
From the above scenario we can see that a simple/striped ZFS pool cannot withstand any disk failure.
Disk Failure Scenario for Mirror Pool
Initial Disk Configuration
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
Creating a Mirror Pool
root@solarisbox:~# zpool create mpool mirror c3t4d0 c3t7d0
root@solarisbox:~# zfs create mpool/mtestfs
>>> Copy some sample data to the new file system
root@solarisbox:~# df -h|grep /mpool/mtestfs
mpool 2.0G 32K 2.0G 1% /mpool
mpool/mtestfs 2.0G 31K 2.0G 1% /mpool/mtestfs
root@solarisbox:~# zpool status mpool
pool: mpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
After Manually Simulating the Disk Failure
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
Specify disk (enter its number): Specify disk (enter its number):
<== we lost the disk c3t7d0
Checking pool Status after Disk Failure
root@solarisbox:~# zpool status mpool
pool: mpool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 UNAVAIL 0 0 0 cannot open
errors: No known data errors
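==> Note the suggested action: if the same disk had simply reappeared (for example after a cabling or controller glitch), it could be brought back without a replacement, roughly:

# reintroduce the original device, then let ZFS resilver the mirror
zpool online mpool c3t7d0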
After Physically Replacing the Failed Disk (placing the new disk in the same location)
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0
/pci@0,0/pci8086,2829@d/disk@7,0 << New Disk
>>> Label the new disk with an SMI (VTOC) label (a requirement before attaching it to the ZFS pool)
root@solarisbox:~# format -L vtoc -d c3t7d0
Searching for disks…done
selecting c3t7d0
[disk formatted]
c3t7d0 is labeled with VTOC successfully.
Replacing the Failed Disk in the ZFS Pool
root@solarisbox:~# zpool replace mpool c3t7d0
root@solarisbox:~# zpool status -x mpool
pool 'mpool' is healthy
root@solarisbox:~# zpool status mpool
pool: mpool
state: ONLINE
scan: resilvered 210M in 0h0m with 0 errors on Sun Sep 16 10:41:21 2012
config:
NAME STATE READ WRITE CKSUM
mpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0 <<< Disk Online
errors: No known data errors
root@solarisbox:~#
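==> As a side note, a mirror's redundancy can be deepened at any time by attaching another disk to the existing vdev; a sketch with an illustrative device name:

# turn mirror-0 into a 3-way mirror: c3t4d0 is the existing member,
# c3t9d0 stands in for a spare disk
zpool attach mpool c3t4d0 c3t9d0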
Single and Double Disk Failure Scenarios for a ZFS RAID-Z Pool
Disk Configuration Available for the New RAID-Z Pool
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
Creating the New RAID-Z Pool
root@solarisbox:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t2d0s0 is part of exported or potentially active ZFS pool poolnr. Please see zpool(1M).
==> Here we hit an issue with one of the disks selected for the pool: it was previously used by another zpool. That old zpool is no longer available, and we want to reuse the disk for the new one.
==> We can solve the problem in two ways:
1. Use the -f option to override the configuration check
2. Reinitialize the disk's partition table (Solaris x86 only)
==> In this example I reinitialized the whole disk as a Solaris partition with the command below
root@solarisbox:~# fdisk -B /dev/rdsk/c3t3d0p0
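==> Depending on the ZFS release, 'zpool labelclear' may exist as a third option to wipe the stale pool label from the device (shown here as an assumption; verify the subcommand is available on your version before relying on it):

zpool labelclear -f /dev/dsk/c3t2d0s0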
root@solarisbox:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0
root@solarisbox:~# zpool status rzpool
pool: rzpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
Creating a File System and Copying Some Test Data to rzpool/r5testfs
root@solarisbox:~# zfs create rzpool/r5testfs
root@solarisbox:/downloads# df -h|grep test
rzpool/r5testfs 5.8G 575M 5.3G 10% /rzpool/r5testfs
root@solarisbox:/downloads# cd /rzpool/r5testfs/
root@solarisbox:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r-- 1 root root 602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip
root@solarisbox:/rzpool/r5testfs#
After Manually Simulating the Disk Failure (i.e., c3t7d0)
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0 <<== c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):
Checking the zpool Status – It Is in a Degraded State
root@solarisbox:~# zpool status -x rzpool
pool: rzpool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 UNAVAIL 0 0 0 cannot open
errors: No known data errors
Checking if the File system is Still Accessible
root@solarisbox:~# df -h |grep testfs
rzpool/r5testfs 5.8G 575M 5.3G 10% /rzpool/r5testfs
root@solarisbox:~# cd /rzpool/r5testfs
root@solarisbox:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r-- 1 root root 602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip
root@solarisbox:/rzpool/r5testfs#
After Replacing the Failed Disk with a New Disk in the Same Location
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
root@solarisbox:~# zpool status -x
pool: rzpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 FAULTED 0 0 0 corrupted data <<== the state changed to FAULTED because the pool can now see the new disk, but its label/data is missing or corrupted
errors: No known data errors
Replacing the Failed Disk Component in the Zpool
root@solarisbox:~# zpool replace rzpool c3t7d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t7d0s0 is part of exported or potentially active ZFS pool mpool. Please see zpool(1M).
root@solarisbox:~# zpool replace -f rzpool c3t7d0 <<== using the -f option to override the above error
root@solarisbox:~# zpool status -x
all pools are healthy
root@solarisbox:~# zpool status rzpool
pool: rzpool
state: ONLINE
scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
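==> After a resilver completes, it is good practice to scrub the pool, which re-reads and verifies the checksum of every block:

zpool scrub rzpool
zpool status rzpool    # the 'scan:' line reports scrub progress and results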
Two-Disk Failure Scenario for the RAID-Z Pool – and It Fails
Zpool Status Before Disk Failure
root@solarisbox:~# zpool status rzpool
pool: rzpool
state: ONLINE
scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
Disk Configuration After Simulating a Double Disk Failure
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0
/pci@0,0/pci8086,2829@d/disk@3,0 <== c3t4d0 & c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):
Zpool Status after the Double Disk Failure
root@solarisbox:~# zpool status -x
pool: rzpool
state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-3C
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool UNAVAIL 0 0 0 insufficient replicas
raidz1-0 UNAVAIL 0 0 0 insufficient replicas
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 UNAVAIL 0 0 0 cannot open
c3t7d0 UNAVAIL 0 0 0 cannot open
Conclusion: the /rzpool/r5testfs file system is no longer available, and the zpool cannot be recovered from this state.
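==> The takeaway: a single-parity raidz group survives exactly one disk failure. Surviving this double failure would have required dual parity from the start, e.g. (illustrative device names, following the 4+2 recommendation above):

zpool create rz2pool raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0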