Saturday, January 16, 2016

Solaris/VxVM–Recover Failed Disk when vxreattach fails

If the paths to a device suffer a (transient) SAN or array failure, vxdisk list reports the VxVM disk with the status "failed was:<disk-access-name>".

Solaris example

Unless the device is accessible via the OS device handle, VxVM will not be in a position to reattach the disk.

 

# prtvtoc /dev/vx/rdmp/emc0_0281
prtvtoc: /dev/vx/rdmp/emc0_0281: No such device or address



NOTE: At this time the disk access name (emc0_0281) is not accessible via the O/S and the corresponding paths have been disabled by DMP:

 

# vxdmpadm getsubpaths dmpnodename=emc0_0281
NAME         STATE[A]   PATH-TYPE[M] CTLR-NAME  ENCLR-TYPE   ENCLR-NAME    ATTRS
================================================================================
c1t5006048C5368E580d334s2  DISABLED(M)    -          c1         EMC          emc0             -
c1t5006048C5368E5A0d325s2  DISABLED(M)    -          c1         EMC          emc0             -

 


When the impacted device is online once again and the O/S can communicate (without issue) with the underlying paths, the disk can normally be recovered back into the impacted diskgroup using the Veritas Volume Manager (VxVM) command "vxreattach".

 

# vxdmpadm getsubpaths dmpnodename=emc0_0281
NAME         STATE[A]   PATH-TYPE[M] CTLR-NAME  ENCLR-TYPE   ENCLR-NAME    ATTRS
================================================================================
c1t5006048C5368E580d334s2  ENABLED(A)    -          c1         EMC          emc0             -
c1t5006048C5368E5A0d325s2  ENABLED(A)    -          c1         EMC          emc0             -

# prtvtoc -s /dev/vx/rdmp/emc0_0281
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       2      5    01          0   4103040   4103039
       7     15    01          0   4103040   4103039
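The path check above can be scripted before attempting a reattach. The following is a minimal sketch, not part of the original procedure: the function name all_paths_enabled is hypothetical, it assumes the getsubpaths column layout shown above (path state in the second column), and it assumes a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk).

```shell
# Reads "vxdmpadm getsubpaths dmpnodename=<da-name>" output on stdin.
# Succeeds only if at least one path is listed and every path is ENABLED.
# Hypothetical helper; assumes the column layout shown in this post.
all_paths_enabled() {
    awk '
        /^NAME/ || /^=+$/ || NF == 0 { next }   # skip header, rule and blank lines
        { total++ }                             # every remaining line is one path
        $2 ~ /^ENABLED/ { enabled++ }           # state is the second column
        END {
            if (total > 0 && enabled == total) exit 0
            exit 1
        }
    '
}
```

Possible usage: vxdmpadm getsubpaths dmpnodename=emc0_0281 | all_paths_enabled && echo "all paths enabled"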



vxreattach

In the event that the "vxreattach" command is unable to associate the disk access (da) name back into the impacted diskgroup, another approach is required.

 

# vxreattach -c <Veritas-Disk-Access-Name>
VxVM vxdisk ERROR V-5-1-558 Disk <Veritas-Disk-Access-Name>: Disk not in the configuration
 

 


The purpose of the "vxreattach" command is to reattach disk drives that have once again become accessible.

The vxreattach utility reattaches (recovers) disks back into the impacted diskgroup they were associated with, retaining the same disk media name.
The utility attempts to locate a disk in the same diskgroup with the same Veritas disk ID for the disk to be reattached.
The reattach operation may fail even after locating the disk with the corresponding disk ID if the original cause (or some other cause) of the disk failure still exists.

 

Scenario


Diskgroup "testdg" initially fails to import because of a failed disk.

 

# vxdg import testdg
VxVM vxdg ERROR V-5-1-10978 Disk group testdg: import failed:
Disk for disk group not found



# vxdg -f import testdg
VxVM vxdg WARNING V-5-1-560 Disk emc0_0281: Not found, last known location: emc0_0281

# vxdisk -eg testdg list
DEVICE       TYPE           DISK        GROUP        STATUS               OS_NATIVE_NAME   ATTR
emc0_0280    auto:sliced    emc1_0280    testdg      online               c1t5006048C5368E5A0d324s2 std
-            -         emc0_0281    testdg       failed was:emc0_0281
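Spotting the "failed was" entries can be automated when a diskgroup holds many disks. This is a sketch under assumptions: the function name failed_disks is hypothetical, and the parsing relies on the "vxdisk -eg <dg> list" layout shown above, where the disk media name is the third column and the last known disk access name follows "failed was:".

```shell
# Reads "vxdisk -eg <dg> list" output on stdin and prints, for each failed
# disk, the disk media (dm) name followed by the last known da name.
# Hypothetical helper; assumes the column layout shown in this post.
failed_disks() {
    awk '
        /failed was:/ {
            dm = $3                         # disk media name column
            sub(/^.*failed was:/, "")       # keep only the text after the marker
            print dm, $1                    # $1 is now the last known da name
        }
    '
}
```

Possible usage: vxdisk -eg testdg list | failed_disks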


 

In this instance it is not possible to use "vxreattach" to recover the failed disk back into the diskgroup "testdg".

 

# vxreattach -c emc0_0281
VxVM vxdisk ERROR V-5-1-558 Disk emc0_0281: Disk not in the configuration



Recovery procedure


To obtain the Veritas diskgroup (dg) id from the impacted diskgroup, type:
 

# vxdg -q list | grep testdg
testdg       enabled              1311240633.41.dopey

 


To obtain the disk attributes (last known da name, dm name and disk ID) from the diskgroup configuration database, type:

# vxprint -g testdg -dF'%last_da_name %name %diskid'
emc0_0281 emc0_0281 1312382118.46.dopey
emc0_0280 emc1_0280 1311240596.39.dopey
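To pick out the disk ID of a single disk from this output, a small filter is enough. This is a sketch only: the function name diskid_for_da is hypothetical, and it assumes the three-column format produced by the -F string used above.

```shell
# Reads "vxprint -g <dg> -dF'%last_da_name %name %diskid'" output on stdin
# and prints the disk ID recorded for the given last known da name.
# Exits non-zero if the da name is not present. Hypothetical helper.
diskid_for_da() {
    awk -v da="$1" '
        $1 == da { print $3; found = 1 }
        END { exit !found }
    '
}
```

Possible usage: vxprint -g testdg -dF'%last_da_name %name %diskid' | diskid_for_da emc0_0281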

 


To cross-match the diskid "1312382118.46.dopey" recorded in the diskgroup configuration database for the impacted Veritas disk access name "emc0_0281" against the disk ID that the VxVM kernel reports from the disk itself, type:

Syntax:


# vxdisk -x DISKID -x DGID -p list | grep <dgid>



In this instance emc0_0281 has the same diskid in both outputs, which confirms that it is the same disk.
 

# vxdisk -x DISKID -x DGID -p list | grep 1311240633.41.dopey
emc0_0280    1311240596.39.dopey 1311240633.41.dopey
emc0_0281    1312382118.46.dopey 1311240633.41.dopey   <<<<< this is the failed disk
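The cross-match can also be checked mechanically rather than by eye. This is a minimal sketch: the function name disk_matches is hypothetical, and it assumes the three-column output (da name, disk ID, diskgroup ID) of the "vxdisk -x DISKID -x DGID -p list" command shown above.

```shell
# Reads "vxdisk -x DISKID -x DGID -p list" output on stdin.
# Succeeds if the given da name carries the expected disk ID and dg ID.
# Arguments: <da-name> <diskid> <dgid>. Hypothetical helper.
disk_matches() {
    awk -v da="$1" -v diskid="$2" -v dgid="$3" '
        $1 == da && $2 == diskid && $3 == dgid { ok = 1 }
        END { exit !ok }
    '
}
```

Possible usage: vxdisk -x DISKID -x DGID -p list | disk_matches emc0_0281 1312382118.46.dopey 1311240633.41.dopey && echo "same disk"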



By clearing the diskgroup name the disk can be associated back into the diskgroup.
 

# vxdisk list emc0_0281
Device:    emc0_0281
devicetag: emc0_0281
type:      auto
hostid:    dopey
disk:      name= id=1312382118.46.dopey
group:     name=testdg id=1311240633.41.dopey   <<<<<<<<<< clear "diskgroup name"
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig autoimport
pubpaths:  block=/dev/vx/dmp/emc0_0281s2 char=/dev/vx/rdmp/emc0_0281s2
guid:      {cf7741f4-bddd-11e0-bccf-0003baa707e3}
udid:      EMC%5FSYMMETRIX%5F000290300822%5F2200281000
site:      -
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=65792 len=4037248 disk_offset=0
private:   slice=2 offset=256 len=65536 disk_offset=0  <<<<<<<<<<<
update:    time=1312385647 seqno=0.13
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=48144
logs:      count=1 len=7296
Defined regions:
config   priv 000048-000239[000192]: copy=01 offset=000000 enabled
config   priv 000256-048207[047952]: copy=01 offset=000192 enabled
log      priv 048208-055503[007296]: copy=01 offset=000000 enabled
lockrgn  priv 055504-055647[000144]: part=00 offset=000000
Multipathing information:
numpaths:   2
c1t5006048C5368E580d334s2       state=enabled
c1t5006048C5368E5A0d325s2       state=enabled

 


To clear the diskgroup name from the VxVM disk headers, type:

 

# /etc/vx/diag.d/vxprivutil set /dev/vx/rdmp/emc0_0281s2 dgname=""

 

 

NOTE: The "s2" suffix following "emc0_0281" is the private region slice (privslice=2 in the vxdisk list output above) for the Veritas disk access (da) name "emc0_0281".

 

# vxdisk list emc0_0281 | grep group
group:     name= id=1311240633.41.dopey
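Before moving on to the adddisk step, the cleared group name can be confirmed in a script. This is a sketch under assumptions: the function name dgname_cleared is hypothetical, and it assumes the "group:" line format shown above, where an empty name appears as a bare "name=" field.

```shell
# Reads "vxdisk list <da-name>" output on stdin.
# Succeeds if a "group:" line exists and its name field is empty ("name=").
# Hypothetical helper; assumes the output format shown in this post.
dgname_cleared() {
    awk '
        $1 == "group:" {
            seen = 1
            if ($2 == "name=") cleared = 1   # nothing after the equals sign
        }
        END { exit !(seen && cleared) }
    '
}
```

Possible usage: vxdisk list emc0_0281 | dgname_cleared && echo "diskgroup name cleared"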

 


Using the "-k" option with vxdg adddisk, it is possible to associate the failed disk back into the impacted diskgroup using the existing disk media (dm) name and corresponding da name:

 

# vxdg -g testdg -k adddisk emc0_0281=emc0_0281

# vxdisk -eg testdg list
DEVICE       TYPE           DISK        GROUP        STATUS               OS_NATIVE_NAME   ATTR
emc0_0280    auto:sliced    emc1_0280    testdg      online               c1t5006048C5368E5A0d324s2 std
emc0_0281    auto:cdsdisk   emc0_0281    testdg      online               c1t5006048C5368E580d334s2 std


Now that the disk has been recovered back into the diskgroup, the volumes can then be recovered.
