Wednesday, March 16, 2016

Solaris / SF12K-15K: Domain Stop analysis using Redx tools

This one is pretty old. The SF12K/15K required special tools to analyze a domain crash.

A software error on one domain (such as a heartbeat failure, panic timeout, or error-reset) can cause another domain to Dstop on Sun Fire 15K systems running SMS 1.1. When the issue shows up, the POST running on one domain may Dstop all other running domains.
While the occurrence is rare, the impact is platform wide. Depending upon domain configuration and applications, downtime can be several hours. This problem is intermittent and may be related to a domain sync operation on the centerplane (reset of unused ports).

"Running POST on one domain" means that the power-on self-tests are executed on any one domain in the system. This happens when a domain is first brought online, during a DR attach of a board (not currently supported), or as a recovery action performed by the SMS software to get a domain back up and running after a reboot, panic, or Dstop.

A message in the platform message log (/var/opt/SUNWSMS/adm/platform/messages) would report:

    Jan 17 20:25:55 2002 swmtft901 hwad[22514]: [1156 1693005732870614 ERR
    InterruptHandler.cc 2127] Domain Stop interrupt detected, domain XXX
              
SMS then creates a Dstop dump file in /var/opt/SUNWSMS/adm/[XXX]/dump.
The file name has the form dsmd.dstop.YYMMDD.hhmm.ss (dsmd.dstop.020117.2025.55 in this example). If this dump file is opened with "redx" and the "wfail" command is issued, the output below is reported. For example:

        sc% redx -cl
        redx> dumpf load dsmd.dstop.020117.2025.55
        redx> wfail
        ...output below...
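The dump file name encodes the time of the Dstop, which makes it easy to match a dump to the entry in the platform message log. The following is a minimal Python sketch of that decoding. The function name parse_dstop_dump_name is made up for illustration, and the sketch assumes the two-digit year belongs to the 2000s, as in the example above.

```python
from datetime import datetime

DUMP_PREFIX = "dsmd.dstop."  # prefix taken from the file name shown above


def parse_dstop_dump_name(name: str) -> datetime:
    """Return the timestamp encoded in a dsmd.dstop.YYMMDD.hhmm.ss file name.

    Hypothetical helper: only the naming pattern comes from the post.
    """
    if not name.startswith(DUMP_PREFIX):
        raise ValueError(f"not a Dstop dump file name: {name!r}")
    # The remainder looks like 020117.2025.55 (YYMMDD.hhmm.ss).
    return datetime.strptime(name[len(DUMP_PREFIX):], "%y%m%d.%H%M.%S")


if __name__ == "__main__":
    print(parse_dstop_dump_name("dsmd.dstop.020117.2025.55"))
```

For the example file, the decoded time agrees with the "Jan 17 20:25:55 2002" timestamp of the hwad message.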

The Dstop signature of this issue is as follows:

        SDI EX03/S0  Master_Stop_Status0[31:0] = 7004004F
              MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
        SDI EX03/S0  Dstop0[31:0] = 12018200
              Dstop0[16]: D    DARB texp requests all Dstop (M)  
              Dstop0[25]: D 1E AXQ requests all Dstop (M)
              Dstop0[28]: D    Slot0 asserted Error, enabled to cause Dstop (M)
        AXQ EX03 ( 3) Error_Flag_02[31:0] = 04008400  Mask = 0000FFFF
              Err2[26]: D 1E AMX 0-3 hs flow control didn't arrive simultaneously 
        FAIL EXB EX3:  Dstop/Rstop detected by AXQ.
        Primary service FRU is EXB EX3.
        SDI EX04/S0  Master_Stop_Status0[31:0] = 0004000F
              MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
        SDI EX04/S0  Dstop0[31:0] = 02018200
              Dstop0[16]: D    DARB texp requests all Dstop (M)
              Dstop0[25]: D 1E AXQ requests all Dstop (M)
        AXQ EX04 ( 4) Error_Flag_03[31:0] = 30009000  Mask = 21005EFF
              Err3[28]: D 1E AMX data ECC uncorrectable error           
              Err3[29]: R    AMX data ECC correctable error     
        FAIL EXB EX4:  Dstop/Rstop detected by AXQ.
        Primary service FRU is EXB EX4.
       
The AMX flow control error shown above is the key message.  The system will recover automatically via ASR (automatic system recovery).  After recording the Dstop information, SMS restarts the domain(s).       
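If the wfail output has been saved as text, the check for this signature can be automated. The sketch below is only an illustration: the function name summarize_wfail is invented, and it assumes the message wording matches the sample above exactly. It reports whether the AMX flow control error is present and which FRUs wfail named as the primary service FRU.

```python
import re

# Wording copied from the sample wfail output above; other SMS releases
# may phrase these lines differently (assumption).
AMX_FLOW_CONTROL = "AMX 0-3 hs flow control didn't arrive simultaneously"
PRIMARY_FRU = re.compile(r"^\s*Primary service FRU is (.+?)\.\s*$")


def summarize_wfail(text):
    """Return (signature_found, list_of_primary_frus) for saved wfail output."""
    signature_found = False
    frus = []
    for line in text.splitlines():
        if AMX_FLOW_CONTROL in line:
            signature_found = True
        match = PRIMARY_FRU.match(line)
        if match:
            frus.append(match.group(1))
    return signature_found, frus


SAMPLE = """\
AXQ EX03 ( 3) Error_Flag_02[31:0] = 04008400  Mask = 0000FFFF
      Err2[26]: D 1E AMX 0-3 hs flow control didn't arrive simultaneously
FAIL EXB EX3:  Dstop/Rstop detected by AXQ.
Primary service FRU is EXB EX3.
FAIL EXB EX4:  Dstop/Rstop detected by AXQ.
Primary service FRU is EXB EX4.
"""

if __name__ == "__main__":
    print(summarize_wfail(SAMPLE))
```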
       
Any SMS 1.1 installations without patch 112080 or later installed are susceptible to this problem.  SMS 1.2 and higher are not affected by this issue.

The true cause of the problem is the AMX ASIC, which doesn't handle port resets correctly. The bug fix changes how POST performs the reset to ensure it's done safely.

A Dstop, or Domain Stop, occurs when the hardware detects an unrecoverable error. The ASICs in the system cease processing transactions as quickly as possible to prevent further corruption of data and to facilitate debugging. In this case, the Dstop occurs during the centerplane reset of ports: the AMX mishandles a port reset that is not done under domain sync. Performing the reset under domain sync makes the problem go away.

Solaris: locks & lockstat

 

Locking

Locks protect critical data structures from simultaneous update access. A lock lets you update one variable or a group of variables as a unit. Locks can also provide a synchronization mechanism for threads.

Locks are needed when threads share “writable data”, and a program that writes such data often must lock frequently. Several locking mechanisms are available to manage different locking situations.

1. Mutex: Ensures that only one thread can lock a resource at a time. No other thread can lock the resource until the holder releases it. A thread that reads the data sees what the locking thread last stored. The state of the lock must be visible to all processors on a multiprocessor system.

2. Condition variable: Associated with a condition. A thread waits on the condition variable until another thread signals that the condition has changed.

3. Counting semaphore: Controls access by multiple threads to a resource. When a thread acquires the resource, the value of the semaphore is decremented; when the thread releases the resource, the value is incremented.

4. Multiple-reader, single-writer lock: Enables several threads to read the data at the same time. Either the group of readers or the single thread that wants to write the data owns the lock.
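To make two of these mechanisms concrete, here is a small sketch that uses Python's threading module as a stand-in. It is not the Solaris kernel or libthread API, and the function name run_workers and all of its parameters are invented for the illustration. A mutex protects a shared counter, and a counting semaphore limits how many threads may work at the same time.

```python
import threading


def run_workers(num_threads=4, iterations=10_000, max_concurrent=2):
    """Increment a shared counter from several threads.

    Returns (final counter value, highest number of threads seen inside
    the semaphore-guarded section at once).
    """
    counter = 0
    active = 0
    peak_active = 0
    counter_lock = threading.Lock()             # mutex for the counter
    state_lock = threading.Lock()               # mutex for the bookkeeping
    slots = threading.Semaphore(max_concurrent) # counting semaphore

    def worker():
        nonlocal counter, active, peak_active
        with slots:  # decrements the semaphore; blocks when it is zero
            with state_lock:
                active += 1
                peak_active = max(peak_active, active)
            for _ in range(iterations):
                with counter_lock:  # only one thread updates at a time
                    counter += 1
            with state_lock:
                active -= 1
        # leaving the "with slots" block increments the semaphore again

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter, peak_active


if __name__ == "__main__":
    print(run_workers())
```

With the default arguments the counter always ends at 40000, because every increment happens under the mutex, and the peak never exceeds the semaphore's initial value of 2.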

 

A poor locking design that protects too much data by using one lock can create performance problems by serializing the threads for a long time. To solve locking problems, you must significantly “re-design” the program.
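One common redesign is to split a single coarse lock into several finer ones, so that threads working on unrelated data no longer serialize behind each other. The sketch below shows the idea in Python under invented names (StripedCounters, stripes); it illustrates the technique only and is not taken from any Solaris code.

```python
import threading


class StripedCounters:
    """Per-key counters protected by several locks instead of one.

    Each key maps to one "stripe"; threads that touch keys in different
    stripes do not contend for the same lock.
    """

    def __init__(self, stripes=8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._tables = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # hash() is stable within one process, which is all we need here.
        return hash(key) % len(self._locks)

    def increment(self, key):
        index = self._stripe(key)
        with self._locks[index]:
            table = self._tables[index]
            table[key] = table.get(key, 0) + 1

    def get(self, key):
        index = self._stripe(key)
        with self._locks[index]:
            return self._tables[index].get(key, 0)


if __name__ == "__main__":
    counters = StripedCounters()
    counters.increment("putnext")
    print(counters.get("putnext"))
```

The trade-off is extra memory and complexity: an operation that must see all counters consistently would have to take every stripe lock.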

Locking problems:

Performance problems that are related to locking are difficult to identify and usually appear as programs running slower than expected for no obvious reason.

The following programming errors cause problems with locking:

a. Contention caused by inappropriate use of locking mechanisms

b. A deadlock caused by two competing operations

c. A lock control that is “lost” to the process

d. A race for resources, which is similar to a deadlock situation

e. An incomplete implementation of the lock control within the program
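The deadlock case (b) typically arises when two threads take the same two locks in opposite order. A standard remedy is to always acquire locks in one fixed order. The sketch below shows that remedy in Python; the Account class and transfer function are invented for the example and do not come from the post.

```python
import threading


class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move amount from src to dst without risking a lock-order deadlock."""
    if src is dst:
        return  # nothing to do, and locking the same mutex twice would hang
    # Always lock the account with the smaller ident first, regardless of
    # the direction of the transfer.
    first, second = sorted((src, dst), key=lambda account: account.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


if __name__ == "__main__":
    a, b = Account(1, 100), Account(2, 100)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(1000)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(1000)])
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(a.balance, b.balance)
```

If each thread instead locked its own source account first, the two threads could each hold one lock and wait forever for the other.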

 

“lockstat” is the tool used to identify locking problems. It reports kernel lock statistics, including the locks taken on behalf of an application during system calls.

 

Example:

# lockstat -H -D 10 sleep 1

 

Adaptive mutex spin: 513 events

Count indv cuml rcnt     nsec Lock                Caller
-------------------------------------------------------------------------
  480   5%   5% 1.00     1136 0x300007718e8       putnext+0x40
  286   3%   9% 1.00      666 0x3000077b430       getf+0xd8
  271   3%  12% 1.00      537 0x3000077b430       msgio32+0x2fc
  270   3%  15% 1.00     3670 0x300007718e8       strgetmsg+0x3d4
  270   3%  18% 1.00     1016 0x300007c38b0       getq_noenab+0x200
  264   3%  20% 1.00     1649 0x300007718e8       strgetmsg+0xa70
  216   2%  23% 1.00     6251 tcp_mi_lock         tcp_snmp_get+0xfc
  206   2%  25% 1.00      602 thread_free_lock    clock+0x250
  138   2%  27% 1.00      485 0x300007c3998       putnext+0xb8
  138   2%  28% 1.00     3706 0x300007718e8       strrput+0x5b8
-------------------------------------------------------------------------

[...]

 

Explanation:

In the above example, lockstat records kernel lock events for the one second during which the “sleep” command runs. The events come from the whole system, not only from sleep itself: they include the locks that the kernel acquires on behalf of other processes, for example during system call processing. The -D 10 option limits each report to its top 10 entries.

A large “indv” value, whether the locking time is long or short, helps to identify the problem lock. The lock name, which is usually prefixed with the module name, identifies the module that is causing the problem.

 

Note: the “indv” column shows the event as a percentage of all events.
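Because the same lock often appears under several callers, it can help to total the Count column per lock before deciding where to look. The following Python sketch does that for saved lockstat output. The function name events_per_lock is invented, the regular expression assumes the seven-column layout shown above, and the sample rows are a subset of the output in this post.

```python
import re
from collections import Counter

# Count, indv, cuml, rcnt, nsec, Lock, Caller: layout assumed from the
# sample report above.
ROW = re.compile(r"^\s*(\d+)\s+\d+%\s+\d+%\s+[\d.]+\s+\d+\s+(\S+)\s+(\S+)\s*$")

SAMPLE = """\
Count indv cuml rcnt     nsec Lock                Caller
-------------------------------------------------------------------------
  480   5%   5% 1.00     1136 0x300007718e8       putnext+0x40
  286   3%   9% 1.00      666 0x3000077b430       getf+0xd8
  271   3%  12% 1.00      537 0x3000077b430       msgio32+0x2fc
  270   3%  15% 1.00     3670 0x300007718e8       strgetmsg+0x3d4
  216   2%  23% 1.00     6251 tcp_mi_lock         tcp_snmp_get+0xfc
"""


def events_per_lock(report):
    """Sum the Count column for each lock; header and rule lines are skipped."""
    totals = Counter()
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            count, lock = int(match.group(1)), match.group(2)
            totals[lock] += count
    return totals


if __name__ == "__main__":
    for lock, count in events_per_lock(SAMPLE).most_common():
        print(f"{count:6d}  {lock}")
```

For these five rows, lock 0x300007718e8 comes out on top with 750 events, followed by 0x3000077b430 with 557.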

Solaris/Zone: Virtualized Clocks/TimeZones

Oracle Solaris Native Zones now have virtualized clocks to support applications that need to run at different times or to test specific time-related scenarios, for example, how an environment might respond to a leap second.

You can set time values in non-global zones that differ from the value in the global zone. A non-global zone's time is still tied to the global zone's clock: if you change the time in the global zone, the time in the non-global zone shifts by the same amount.
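One way to picture this behaviour is to treat the zone's clock as the global clock plus a fixed offset. The Python sketch below is only a conceptual model under that assumption; the ZoneClock class is invented here and says nothing about how Solaris implements the feature.

```python
from datetime import datetime, timedelta


class ZoneClock:
    """Conceptual model: zone time = global time + a fixed offset."""

    def __init__(self):
        self.offset = timedelta(0)

    def set_time(self, zone_time, global_now):
        # Setting the zone's time only records how far it is from global time.
        self.offset = zone_time - global_now

    def now(self, global_now):
        return global_now + self.offset


if __name__ == "__main__":
    global_now = datetime(2016, 3, 16, 12, 0, 0)
    clock = ZoneClock()
    # Put the zone ten seconds before the new year.
    clock.set_time(datetime(2016, 12, 31, 23, 59, 50), global_now)
    # The global zone's clock is then moved forward by one hour.
    global_now += timedelta(hours=1)
    print(clock.now(global_now))
```

In this model the zone ends up at 2017-01-01 00:59:50: it has moved forward by the same hour as the global zone.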

For example, to set the time value in a non-global zone:

# zonecfg -z myzone

zonecfg:myzone> set limitpriv=default,sys_time

zonecfg:myzone> set global-time=false

zonecfg:myzone> exit

Solaris: Adding different types of raw volumes and disk devices to non-global zones (NGZ)

Adding a file system or disk device to a non-global zone is an integral part of creating a zone. We can add different types of file systems, raw devices and disk devices to a non-global zone. This post describes the most common ways of doing so.

Adding a Raw Disk device

We can add either a single slice or a complete raw disk to the non-global zone. For a full disk, use slice s2; otherwise, use the slice you want to add.

global # zonecfg -z zone01

zonecfg:zone01> add device

zonecfg:zone01:device> set match=/dev/rdsk/c0t0d0s6

zonecfg:zone01:device> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

Adding a VxFS file system

1. Adding a VxFS file system on a VxVM volume

global # zonecfg -z zone01

zonecfg:zone01> add fs

zonecfg:zone01:fs> set type=vxfs

zonecfg:zone01:fs> set special=/dev/vx/dsk/datadg/datavol

zonecfg:zone01:fs> set raw=/dev/vx/rdsk/datadg/datavol

zonecfg:zone01:fs> set dir=/data

zonecfg:zone01:fs> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

2. Adding a VxVM raw volume

global # zonecfg -z zone01

zonecfg:zone01> add device

zonecfg:zone01:device> set match=/dev/vx/rdsk/dg1/vol1

zonecfg:zone01:device> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

Adding UFS file system

1. Adding UFS under SVM

global # zonecfg -z zone01

zonecfg:zone01> add fs

zonecfg:zone01:fs> set dir=/u01

zonecfg:zone01:fs> set special=/dev/md/dsk/d100

zonecfg:zone01:fs> set raw=/dev/md/rdsk/d100

zonecfg:zone01:fs> set type=ufs

zonecfg:zone01:fs> add options [nodevices,logging]

zonecfg:zone01:fs> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

2. Adding UFS under VxVM volume
We can also create a UFS file system on a VxVM volume as follows.

global # vxassist -g datadg make datavol 1g

global # mkfs -F ufs /dev/vx/rdsk/datadg/datavol

global # mount -F ufs /dev/vx/dsk/datadg/datavol /zones/zone01/root/data

To add the UFS file system on the VxVM volume to the zone:

global # zonecfg -z zone01

zonecfg:zone01> add fs

zonecfg:zone01:fs> set type=ufs

zonecfg:zone01:fs> set special=/dev/vx/dsk/datadg/datavol

zonecfg:zone01:fs> set raw=/dev/vx/rdsk/datadg/datavol

zonecfg:zone01:fs> set dir=/data

zonecfg:zone01:fs> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

Adding ZFS

1. Adding a ZFS file system to a non-global zone: make sure the mountpoint property of the ZFS file system being added to the zone is set to legacy; otherwise it may get assigned to multiple non-global zones simultaneously.

global # zonecfg -z zone01

zonecfg:zone01> add fs

zonecfg:zone01:fs> set type=zfs

zonecfg:zone01:fs> set special=rpool/data

zonecfg:zone01:fs> set dir=/data

zonecfg:zone01:fs> end

zonecfg:zone01> verify

zonecfg:zone01> commit

zonecfg:zone01> exit

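2. Adding a ZFS file system as a loopback file system (lofs) to a non-global zone. For lofs, special is the directory where the file system is mounted in the global zone, assumed here to be /rpool/data:

global # zonecfg -z zone01

zonecfg:zone01> add fs

zonecfg:zone01:fs> set special=/rpool/data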

zonecfg:zone01:fs> set dir=/data

zonecfg:zone01:fs> set type=lofs

zonecfg:zone01:fs> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

global # mkdir -p /zoneroot/zone01/root/data

global # mount -F lofs /rpool/data /zoneroot/zone01/root/data

3. Delegating a dataset to a non-global zone: here you have complete control of the dataset you delegate to the non-global zone. For example, you can create your own child datasets under the delegated dataset and set its properties. The ZFS file system data will be available as a pool in the non-global zone.

global # zonecfg -z zone01

zonecfg:zone01> add dataset

zonecfg:zone01:dataset> set name=rpool/data

zonecfg:zone01:dataset> end

zonecfg:zone01> commit

zonecfg:zone01> verify

zonecfg:zone01> exit

4. Adding ZFS volumes to non-global zones

global # zonecfg -z zone01

zonecfg:zone01> add device

zonecfg:zone01:device> set match=/dev/zvol/dsk/rpool/datavol

zonecfg:zone01:device> end

zonecfg:zone01> verify

zonecfg:zone01> commit

zonecfg:zone01> exit

Adding a CD-ROM to non-global zone

To add a CD-ROM to the non-global zone:

global # zonecfg -z zone01

zonecfg:zone01> add fs

zonecfg:zone01:fs> set dir=/cdrom

zonecfg:zone01:fs> set special=/cdrom

zonecfg:zone01:fs> set type=lofs

zonecfg:zone01:fs> end

zonecfg:zone01> verify

zonecfg:zone01> commit

zonecfg:zone01> exit
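The interactive sessions above all follow the same add / set / end pattern, so they are easy to generate when many zones need the same resources. The Python sketch below renders such a command list as text that could be saved to a file and passed to zonecfg with its -f command-file option (zonecfg -z zone01 -f commands.txt). The helper names render_resource and render_command_file, and the file name commands.txt, are made up for this illustration; the property values are the ones used earlier in this post.

```python
def render_resource(kind, properties):
    """Render one zonecfg resource block (add ... set ... end) as lines."""
    lines = [f"add {kind}"]
    lines += [f"set {name}={value}" for name, value in properties.items()]
    lines.append("end")
    return lines


def render_command_file(resources):
    """Render several resources followed by verify and commit."""
    lines = []
    for kind, properties in resources:
        lines += render_resource(kind, properties)
    lines += ["verify", "commit"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    resources = [
        ("fs", {
            "dir": "/data",
            "special": "/dev/vx/dsk/datadg/datavol",
            "raw": "/dev/vx/rdsk/datadg/datavol",
            "type": "vxfs",
        }),
        ("device", {"match": "/dev/rdsk/c0t0d0s6"}),
    ]
    print(render_command_file(resources), end="")
```

The sketch only builds text; it does not run zonecfg or validate property names, so the generated file should still be reviewed before it is applied to a zone.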