Tuesday, January 27, 2015

Solaris SVM: Recovery procedures when BOTH sides of the mirror indicate a "Last Erred" state


As a result of unusual failures, such as multiple power failures, both submirrors of a Solstice DiskSuite/Solaris Volume Manager mirror may be left in an unusual "Last Erred / Last Erred" state. This makes it impossible to determine which submirror must be fixed or replaced first in order to protect the data stored on the mirror.

Instructions:
The following are two examples of DiskSuite/Solaris Volume Manager mirrors in which both submirrors have components in the "Last Erred" state.

Normal DiskSuite/Solaris Volume Manager recovery procedures state that the submirror in the "Maintenance" state must be fixed BEFORE the submirror in the "Last Erred" state. In these examples, no component is in the "Maintenance" state, so it is impossible to determine which submirror must be fixed first to protect the data.
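The replacement-order rule above can be written as a small decision function. This is only a sketch: recovery_order is a hypothetical helper, not an SVM command, and its two arguments are assumed to be the worst component state of each submirror, read by hand from metastat output. It assumes a POSIX shell such as ksh or bash.

```sh
#!/bin/sh
# recovery_order (hypothetical helper): given the worst component state
# of submirror 0 and of submirror 1, as read by hand from metastat,
# print which side the normal procedure says to repair first.
# When neither side is in Maintenance, the rule gives no order at all;
# that is the "Last Erred / Last Erred" situation this post deals with.
recovery_order() {
    case "$1/$2" in
        "Maintenance/Last Erred") echo "fix submirror 0 first" ;;
        "Last Erred/Maintenance") echo "fix submirror 1 first" ;;
        "Last Erred/Last Erred")  echo "ambiguous: both sides Last Erred" ;;
        *)                        echo "no ordering conflict" ;;
    esac
}
```

For Example 1 below, recovery_order "Last Erred" "Last Erred" prints "ambiguous: both sides Last Erred", which is exactly why the normal procedure cannot be followed.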

The first example below is the metastat output from a simple mirror made up of two submirrors, each built on a single slice of a physical disk. The second example is the metastat output from a mirror made up of two submirrors, each a concatenation of three single-slice stripes.
Although the metadevices differ slightly, the attempted recovery procedure is exactly the same in both cases. Details of the procedure follow the examples below.

EXAMPLE 1 – metastat of a simple mirrored metadevice:
Note that in this example both submirrors are in the "Last Erred" state, making normal recovery procedures impossible.

d14: Mirror
    Submirror 0: d15
      State: Needs maintenance
    Submirror 1: d16
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 46137984 blocks (22 GB)

d15: Submirror of d14
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d14 c3t32d0s0
    Size: 46137984 blocks (22 GB)
    Stripe 0:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t32d0s0              0  No     Last Erred  Yes

d16: Submirror of d14
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d14 c3t33d0s0
    Size: 46137984 blocks (22 GB)
    Stripe 0:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t33d0s0              0  No     Last Erred  Yes

EXAMPLE 2 – metastat of a striped mirrored metadevice:
Note that in this example all three stripes of one submirror and one stripe of the second submirror are in the "Last Erred" state, making normal recovery procedures impossible.

d6: Mirror
    Submirror 0: d41
      State: Needs maintenance
    Submirror 1: d42
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 49132024 blocks (23 GB)

d41: Submirror of d6
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d6 c3t50d0s1
    Size: 49132024 blocks (23 GB)
    Stripe 0:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t50d0s1              0  No     Last Erred  Yes
    Stripe 1:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t35d0s0          10176  No     Last Erred  Yes
    Stripe 2:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t54d0s4              0  No     Last Erred  Yes

d42: Submirror of d6
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d6 c3t57d0s4
    Size: 49132024 blocks (23 GB)
    Stripe 0:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t51d0s1              0  No     Okay        Yes
    Stripe 1:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t36d0s0          10176  No     Okay        Yes
    Stripe 2:
        Device       Start Block  Dbase  State       Reloc  Hot Spare
        c3t57d0s4              0  No     Last Erred  Yes
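On a system with many metadevices, the condition shown in both examples (every submirror of a mirror holding at least one "Last Erred" component) can be spotted by filtering metastat output. The following is a minimal sketch, not a tool shipped with SVM: last_erred_mirrors is a hypothetical helper name, and it assumes the default (non -p) metastat layout shown above and a POSIX awk.

```sh
#!/bin/sh
# last_erred_mirrors (hypothetical helper): reads default metastat output
# on stdin and prints each mirror in which every submirror (at least two)
# has one or more components in the "Last Erred" state.
last_erred_mirrors() {
    awk '
    # "d41: Submirror of d6" opens a submirror block; remember both names
    $2 == "Submirror" && $3 == "of" {
        cur = $1
        sub(/:$/, "", cur)
        mirror = $4
        if (!((mirror, cur) in seen)) {
            seen[mirror, cur] = 1
            nsub[mirror]++
        }
        next
    }
    # any other top-level "dN:" line, or the relocation table, closes it
    /^d[0-9]+:/ || /^Device Relocation Information/ {
        cur = ""
        next
    }
    # count a submirror once if one of its component lines is Last Erred;
    # label lines such as "Invoke:" or "State:" are skipped by the $1 test
    cur != "" && $1 !~ /:$/ && /Last Erred/ && !((mirror, cur) in erred) {
        erred[mirror, cur] = 1
        nerred[mirror]++
    }
    END {
        for (m in nsub)
            if (nsub[m] >= 2 && nerred[m] == nsub[m])
                print m
    }' | sort
}
```

Run as metastat | last_erred_mirrors, it would be expected to list d14 and d6 for the two examples above, while a mirror whose only problem is a "Maintenance" component is not reported, because the normal replacement order still applies to it.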

STEPS FOR ATTEMPTED RECOVERY
LOCATE BACKUP TAPES and have them available in the event that the metadevices cannot be recovered. There is no guarantee that the procedures presented below will result in data recovery.
In this case, the *safest* way to attempt recovery is to unmount each mirror and clear it with metaclear, leaving only the submirrors.

Run fsck on each submirror and mount it to validate the data.

Determine which submirrors, *if any*, are valid. The goal is to locate one good submirror for each mirror. Once the good submirror has been located, recreate the mirror as a one-way mirror from it, attach the remaining submirror (metattach resynchronizes it from the good one), and mount the mirror on its original mount point.
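The decision in the paragraph above comes down to this: rebuild from the first submirror that checks out, otherwise restore from backup. A minimal sketch with a hypothetical choose_source helper; its yes/no arguments are assumed to be the operator's own verdicts after fsck and inspection, since a script cannot judge whether the data is valid.

```sh
#!/bin/sh
# choose_source SUB0 OK0 SUB1 OK1 (hypothetical helper): OK0 and OK1 are
# "yes" or "no", the operator's verdict on each submirror. Prints the
# submirror to rebuild the mirror from, checking SUB0 first as this post
# does, or "restore" when neither submirror holds valid data.
choose_source() {
    if [ "$2" = yes ]; then
        echo "$1"
    elif [ "$4" = yes ]; then
        echo "$3"
    else
        echo restore
    fi
}
```

For metadevice d6, choose_source d41 no d42 yes prints d42, the case handled in the second half of the procedure.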

Below is an example of the procedure used to recover metadevice d6 (the procedure is identical for metadevice d14):

# umount /dev/md/dsk/d6
# metaclear -f d6
# fsck /dev/md/rdsk/d41
# fsck /dev/md/rdsk/d42
# mount /dev/md/dsk/d41 /mnt
** verify data at this point **
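The first stage (split the mirror, fsck both submirrors, mount one for inspection) can be wrapped in a script so the exact sequence can be reviewed before it touches a damaged mirror. This is a minimal sketch under stated assumptions: run, split_mirror and try_submirror are hypothetical helper names, DRY_RUN is a hypothetical variable (set to 1 by default here, so commands are only printed), and /mnt is assumed to be a free temporary mount point. It assumes a POSIX shell such as ksh or bash.

```sh
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command;
# set DRY_RUN=0 to execute them for real.
DRY_RUN=${DRY_RUN:-1}

# run: print or execute one command (hypothetical helper)
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "# $*"
    else
        "$@"
    fi
}

# split_mirror MIRROR SUB0 SUB1: unmount the mirror, force-clear it
# (metaclear -f, because its submirrors are in an error state) so only
# the submirrors remain, then fsck both of them.
split_mirror() {
    run umount "/dev/md/dsk/$1" || return 1
    run metaclear -f "$1" || return 1
    # fsck may prompt for repairs; its exit status is left for the
    # operator to judge rather than aborting here.
    run fsck "/dev/md/rdsk/$2"
    run fsck "/dev/md/rdsk/$3"
}

# try_submirror SUB [MOUNTPOINT]: mount one submirror for inspection,
# on /mnt unless another temporary mount point is given.
try_submirror() {
    run mount "/dev/md/dsk/$1" "${2:-/mnt}"
}
```

With the default dry run, split_mirror d6 d41 d42 followed by try_submirror d41 only prints the umount, metaclear, fsck and mount commands for d6 and its submirrors.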

If the data is determined to be valid, recreate the mirror metadevice using this submirror:

# umount /dev/md/dsk/d41
# metainit d6 -m d41

Attach the second submirror (metattach resynchronizes it from d41) and mount the metadevice:

# metattach d6 d42
# mount /dev/md/dsk/d6 /
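Rebuilding the mirror from the submirror that passed inspection follows the same dry-run pattern. Again this is a sketch: rebuild_mirror, its argument order and the DRY_RUN variable are hypothetical. Because metattach resynchronizes the newly attached submirror from the existing one, swapping the GOOD and OTHER arguments would overwrite the valid data, so the order matters.

```sh
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN=${DRY_RUN:-1}

# run: print or execute one command (hypothetical helper)
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "# $*"
    else
        "$@"
    fi
}

# rebuild_mirror MIRROR GOOD OTHER MOUNTPOINT (hypothetical helper):
# unmount the validated submirror, recreate MIRROR as a one-way mirror
# on it, attach OTHER (which is then resynced FROM GOOD), and mount the
# mirror on its original mount point.
rebuild_mirror() {
    run umount "/dev/md/dsk/$2" || return 1
    run metainit "$1" -m "$2" || return 1
    run metattach "$1" "$3" || return 1
    run mount "/dev/md/dsk/$1" "$4"
}
```

If the second submirror turns out to be the valid one, the same helper covers that case with the roles swapped: rebuild_mirror d6 d42 d41 /.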

If the data is determined to be INVALID, unmount the first submirror, then mount the second submirror and attempt to validate its data:

# umount /dev/md/dsk/d41
# mount /dev/md/dsk/d42 /mnt

** verify data at this point **

If the data is determined to be valid, recreate the mirror metadevice using this submirror:

# umount /dev/md/dsk/d42
# metainit d6 -m d42

Attach the remaining submirror (metattach resynchronizes it from d42) and mount the metadevice:

# metattach d6 d41
# mount /dev/md/dsk/d6 /

NOTE: *** IF BOTH submirrors are invalid, there is nothing more that can be done; the data will have to be restored from backup. ***
