Tuesday, March 17, 2015

Solaris: How to Clone LDOM

Solaris ZFS-based LDOMs can be cloned either by command or by script, which greatly speeds up deployments in an environment where Solaris VMs need to be built quickly. Oracle's (formerly Sun Microsystems) LDOM virtualization technology can be automated and greatly simplified when combined with the ZFS filesystem. LDOMs are Sun's spin on VMware, or at least can be roughly compared to it. One difference is that in typical configurations LDOMs assign physical hardware to guests, whereas VMware time-slices resources. LDOMs are based on the CMT processor architectures found in the T2 and T4 class SPARC systems. I won't get into a big description of the technology; suffice it to say, it is highly configurable. In this blog, I show how, from a simple script, guest LDOMs can be created or cloned very quickly into a fully bootable, running system with an independent kernel and hardware.

Setting up primary control and service domains
In any LDOM configuration, at a minimum a control domain and a service domain are required. The configuration used in this example does not use dual service domains for virtual I/O, which would provide the ultimate in availability.

Create virtual disk service

# ldm add-vds primary-vds0 primary

Create virtual console

# ldm add-vcc port-range=5000-5100 primary-vcc0 primary

Add virtual switch

# ldm add-vsw net-dev=e1000g0 primary-vsw0 primary
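With the virtual disk, console, and switch services now in place on the primary domain, a quick sanity check before moving on (the exact output columns vary between LDoms releases):

# ldm list-services primary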

Create the control domain and give it its services. T4 systems don't need the SSL chip (MAU) settings :) so if you aren't using a T4 class system, then set-mau is needed.
# ldm set-mau 0 primary
# ldm set-vcpu 4 primary
# ldm start-reconf primary
# ldm set-memory 4G primary

Add an spconfig named "initial", set it, and reboot
# ldm add-spconfig initial
# shutdown -i6 -g0 now
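You can verify at any time that the configuration was saved to the service processor, and which stored configuration is active, with:

# ldm list-spconfig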

Configuring the network so the control and service domain can communicate through the virtual switch needs to be done from the console. Do it over the network and the interface drop will blow your knees out from under you. Then it gets awkward.
# ifconfig vsw0 plumb
# ifconfig e1000g0 down unplumb
# ifconfig vsw0 147.28.18.28 netmask 0xffffff00 broadcast + up
# mv /etc/hostname.e1000g0 /etc/hostname.vsw0
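If you have no choice but to do this remotely, one way to shrink the risk window is to run the whole cutover as one script, backgrounded so it survives a dropped session. A minimal sketch, assuming e1000g0 currently carries the 147.28.18.28 address used above:

# cat > /var/tmp/vsw-cutover.sh <<'EOF'
#!/bin/sh
# Move the primary domain's address from the physical NIC to the virtual switch
ifconfig vsw0 plumb
ifconfig e1000g0 down unplumb
ifconfig vsw0 147.28.18.28 netmask 0xffffff00 broadcast + up
mv /etc/hostname.e1000g0 /etc/hostname.vsw0
EOF
# nohup sh /var/tmp/vsw-cutover.sh &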

Enable vntsd
# svcadm enable vntsd
# reboot

Setup ZFS based golden image LDOM
First, set up the storage for the LDOM. As an example, let's create a 1 TB zpool (waaaay better than VxVM & LVM). I run the zpool create command and feed it the device nodes for the 2 multipathed HDS LUNs.
# zpool create LDOM_disk c0t60060E8006FF03000000FF0300001500d0 c0t60060E8006FF03000000FF0300001501d0
# zpool status LDOM_disk
  pool: LDOM_disk
 state: ONLINE
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        LDOM_disk                                ONLINE       0     0     0
          c0t60060E8006FF03000000FF0300001500d0  ONLINE       0     0     0
          c0t60060E8006FF03000000FF0300001501d0  ONLINE       0     0     0

errors: No known data errors
# zfs create -V 15g LDOM_disk/golden
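Everything we clone later will live under this pool, and the new zvol shows up as a device node that the virtual disk service can export. A quick sanity check (the /dev/zvol/dsk path is the standard location for ZFS volume device nodes):

# zfs list -r LDOM_disk
# ls -l /dev/zvol/dsk/LDOM_disk/golden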

Ok, we've created a golden ZFS volume under the LDOM_disk pool. Damn, that was easy. SVM or VxVM/LVM, as sweet as they used to be, can't compare to ZFS for configuration simplicity. Ok, moving on. Let's build an LDOM with our basic configuration. I have a fully loaded T4 with 256 CPU threads and oceans of silicon for memory, so I could be generous here. But really, this is just my golden image, so the resources don't matter much, as they won't be used once it's cloned. I'll give it just enough to install and configure a healthy Solaris 10 kernel.
# ldm add-domain golden
# ldm add-vcpu 2 golden
# ldm add-memory 8G golden
# ldm add-vnet vnet1 primary-vsw0 golden

Add disk to virtual disk server

# ldm add-vdsdev /dev/zvol/dsk/LDOM_disk/golden vol0@primary-vds0

Add disk to golden LDOM

# ldm add-vdisk vdisk0 vol0@primary-vds0 golden

Set autoboot var

# ldm set-var auto-boot?=false golden
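With auto-boot? set to false, the guest stops at the OBP ok prompt instead of trying to boot, which is exactly what we want for a fresh install. To confirm the variable took, ldm can list a domain's OBP variables:

# ldm list-variable auto-boot? golden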

Map ISO for boot and installation
# ldm add-vdsdev /root/sol-10-u10-ga2-sparc-dvd.iso iso@primary-vds0
# ldm add-vdisk vdisk_iso iso@primary-vds0 golden
# ldm bind-domain golden
# ldm start-domain golden
LDOM golden started

# ldm list-bindings golden
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
golden           active     -t----  5000    24    32G      4.2%  20s
UUID
8939999e-0d72-e842-a93e-ae86b57c5dc6
MAC
00:14:4f:fb:c5:4f
HOSTID
0x84fbc54f
CONTROL
failure-policy=ignore
extended-mapin-space=off
cpu-arch=native
DEPENDENCY
master=
CORE
CID    CPUSET
1      (8, 9, 10, 11, 12, 13, 14, 15)
2      (16, 17, 18, 19, 20, 21, 22, 23)
3      (24, 25, 26, 27, 28, 29, 30, 31)
VCPU
VID    PID    CID    UTIL STRAND
0      8      1      100%   100%
1      9      1      0.0%   100%
2      10     1      0.0%   100%
3      11     1      0.0%   100%
4      12     1      0.0%   100%
5      13     1      0.0%   100%
6      14     1      0.0%   100%
7      15     1      0.0%   100%
8      16     2      0.0%   100%
9      17     2      0.0%   100%
10     18     2      0.0%   100%
11     19     2      0.0%   100%
12     20     2      0.0%   100%
13     21     2      0.0%   100%
14     22     2      0.0%   100%
15     23     2      0.0%   100%
16     24     3      0.0%   100%
17     25     3      0.0%   100%
18     26     3      0.0%   100%
19     27     3      0.0%   100%
20     28     3      0.0%   100%
21     29     3      0.0%   100%
22     30     3      0.0%   100%
23     31     3      0.0%   100%
MEMORY
RA               PA               SIZE
0x40000000       0x140000000      32G
CONSTRAINT
threading=max-throughput
VARIABLES
auto-boot?=false
NETWORK
NAME             SERVICE                     ID   DEVICE     MAC               MODE   PVID VID   MTU   LINKPROP
vnet1            primary-vsw0@primary        0    network@0  00:14:4f:f9:44:b9        1          1500
    PEER                        MAC               MODE   PVID VID                  MTU   LINKPROP
    primary-vsw0@primary        00:14:4f:fb:ed:02        1                         1500
DISK
NAME             VOLUME                      TOUT ID   DEVICE  SERVER         MPGROUP
vdisk0           vol0@primary-vds0                0    disk@0  primary
vdisk_iso        iso@primary-vds0                 1    disk@1  primary
VCONS
NAME             SERVICE                     PORT
golden          primary-vcc0@primary        5000

Get yourself onto the console, boot off of the ISO, and install the operating system.
# telnet localhost 5000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connecting to console "golden" in group "golden" ....
Press ~? for control options ..
~
{0} ok


{0} ok devalias
vdisk_iso                /virtual-devices@100/channel-devices@200/disk@1
vdisk0                   /virtual-devices@100/channel-devices@200/disk@0
vnet1                    /virtual-devices@100/channel-devices@200/network@0
net                      /virtual-devices@100/channel-devices@200/network@0
disk                     /virtual-devices@100/channel-devices@200/disk@0
virtual-console          /virtual-devices/console@1
name                     aliases

The vdisk_iso device alias was created in the step above, so you can boot and install off of this device. Use the OBP boot command as shown below.

{0} ok boot vdisk_iso:f

NOTE: JumpStart works great here too, but in this case I had an ISO handy.
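If you'd rather JumpStart, the same OBP prompt works; the guest's virtual network device can net boot against your install server. A sketch, assuming a JumpStart server is already set up on the subnet:

{0} ok boot vnet1 - install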
Go through the install process. Add whatever software you need, DNS settings, whatever you need for your base configs. I typically just give the LDOM a bogus IP address because I will be changing it once I boot the new guest LDOM created from this golden image. You can also, in some cases, run sys-unconfig and start with a clean slate. It's your choice.
Stop and unbind the guest golden image domain.

# ldm stop golden
# ldm unbind golden

Snapshot the disk image.

# zfs snapshot LDOM_disk/golden@golden-image
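The snapshot is the read-only source every future clone hangs off of, and it costs next to nothing in space until the clones start diverging from it. A quick check that it exists:

# zfs list -t snapshot -r LDOM_disk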

We now have a fully bootable LDOM image with whatever version of the operating system we installed. This is perfect. We use this image to create the future guest LDOMs by cloning it.

Script to create LDOMs from the ZFS based golden image
Use the following script to build LDOMs from the golden image. It can of course be tweaked to change settings such as vcpus and memory. The script is basic, with no error checking. It takes only 2 positional parameters, as in 2 arguments: the first, $1, is the LDOM name; the second, $2, is the volume name for the storage.
# cat ./clone-LDOM.sh
#---------------------- START OF SCRIPT -----------------------
#!/bin/sh
# Usage: clone-LDOM.sh <ldom-name> <volume-name>
echo Setting up clone for $1 with $2 storage
sleep 2
# Clone the golden snapshot as the new guest's boot disk
zfs clone LDOM_disk/golden@golden-image LDOM_disk/$1
echo creating domain
ldm add-domain $1
ldm add-vcpu 2 $1
ldm add-memory 4G $1
echo network
ldm add-vnet vnet1 primary-vsw0 $1
echo storage
# Export the cloned zvol through the virtual disk service
ldm add-vdsdev /dev/zvol/dsk/LDOM_disk/$1 ${2}@primary-vds0
echo adding disk to $1 LDOM
ldm add-vdisk vdisk0 ${2}@primary-vds0 $1
echo set autoboot var
ldm set-var auto-boot?=false $1
echo binding
ldm bind-domain $1
ldm start-domain $1
ldm list-domain $1
#---------------------- END OF SCRIPT -----------------------
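Since the script has no error checking, a guard at the top is cheap insurance. A minimal sketch, assuming the same two arguments and the golden snapshot from above:

# Bail out early if arguments are missing or the clone target already exists
if [ $# -ne 2 ]; then
        echo "Usage: $0 <ldom-name> <volume-name>" >&2
        exit 1
fi
if zfs list LDOM_disk/$1 > /dev/null 2>&1; then
        echo "LDOM_disk/$1 already exists, refusing to clobber it" >&2
        exit 1
fi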

The assumption is you have many vcpus and tons of disk. Here's the output of running the script, which I called clone-LDOM.sh. Let's run it and create an LDOM, associating a ZFS volume called test-ldm-vol with it:
# ./clone-LDOM.sh test-ldom test-ldm-vol
Setting up clone for test-ldom with test-ldm-vol storage
creating domain
network
storage
adding disk to test-ldom LDOM
set autoboot var
binding
LDOM test-ldom started
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
test-ldom        active     ------  5013    2     4G       0.0%  0s

That was fast. Let's take a look at how it came out:
# ldm list -e test-ldom
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
test-ldom        active     -t----  5013    2     4G        50%  8m
SOFTSTATE
OpenBoot Running
UUID
70ba2009-37a1-ec70-a62e-f084b30057ca
MAC
00:14:4f:fb:7d:e7
HOSTID
0x84fb7de7
CONTROL
failure-policy=ignore
extended-mapin-space=off
cpu-arch=native
DEPENDENCY
master=
CORE
CID    CPUSET
14     (112, 113)
VCPU
VID    PID    CID    UTIL STRAND
0      112    14     100%   100%
1      113    14     0.0%   100%
MEMORY
RA               PA               SIZE
0x5f800000       0xe5f800000      4G
CONSTRAINT
threading=max-throughput
VARIABLES
auto-boot?=false
NETWORK
NAME             SERVICE                     ID   DEVICE     MAC               MODE   PVID VID   MTU   LINKPROP
vnet1            primary-vsw0@primary        0    network@0  00:14:4f:fb:f0:89        1          1500
DISK
NAME             VOLUME                      TOUT ID   DEVICE  SERVER         MPGROUP
vdisk0           test-ldm-vol@primary-vds0          0    disk@0  primary
VLDCC
NAME             SERVICE                     DESC
ds               primary-vldc0@primary       domain-services
VCONS
NAME             SERVICE                     PORT
test-ldom          primary-vcc0@primary        5013

Looks great. The virtual console primary-vcc0@primary is on port 5013, meaning this is the 14th console allocated on this system (the range started at 5000). Using this basic script you can create dozens of LDOMs from the golden image created in the earlier steps. I would boot each newly created LDOM to single user mode and change the hostname and IP address before going multiuser. Of course the script can be expanded to take memory and CPU values as arguments for more flexibility, and nothing is stopping you from adding more memory and CPU later. Depending on your version of the Domain Manager, you can use dynamic reconfiguration to change resources on the fly.
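For example, on a Domain Manager release with CPU and memory DR support, resizing the running guest is as simple as the commands below (memory DR arrived later than CPU DR; older releases need a stop and rebind to change memory):

# ldm set-vcpu 8 test-ldom
# ldm set-memory 8G test-ldom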

Let's boot the LDOM to single user mode. I need to get to the console, which should give me the OBP ok prompt. From there I will boot to single user mode and change the hostname and IP address, and possibly routing information if need be.
# telnet localhost 5013
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connecting to console "test-ldom" in group "test-ldom" ....
Press ~? for control options ..
{0} ok

{0} ok boot -vs
Boot device: /virtual-devices@100/channel-devices@200/disk@0  File and args: -vs
module /platform/sun4v/kernel/sparcv9/unix: text at [0x1000000, 0x10c1c1d] data at 0x1800000
module /platform/sun4v/kernel/sparcv9/genunix: text at [0x10c1c20, 0x12a6b77] data at 0x1935f40
module /platform/sun4v/kernel/misc/sparcv9/platmod: text at [0x12a6b78, 0x12a6b8f] data at 0x198d598
module /platform/sun4v/kernel/cpu/sparcv9/SPARC-T4: text at [0x12a6b90, 0x12ad04f] data at 0x198dcc0
SunOS Release 5.10 Version Generic_147440-01 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Ethernet address = 0:14:4f:fb:7d:e7
mem = 4194304K (0x100000000)
avail mem = 3821568000
root nexus = SPARC T4-4
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
virtual-device: cnex0
cnex0 is /virtual-devices@100/channel-devices@200
vdisk@0 is online using ldc@14,0
channel-device: vdc0
vdc0 is /virtual-devices@100/channel-devices@200/disk@0
root on rpool/ROOT/s10s_u10wos_17b fstype zfs
pseudo-device: dld0
dld0 is /pseudo/dld@0
cpu0: SPARC-T4 (chipid 0, clock 2998 MHz)
cpu1: SPARC-T4 (chipid 0, clock 2998 MHz)
iscsi0 at root
iscsi0 is /iscsi
Booting to milestone "milestone/single-user:default".
pseudo-device: zfs0
zfs0 is /pseudo/zfs@0
WARNING: vnet0 has duplicate address 010.238.198.220 (in use by 00:14:4f:fa:68:fa); disabled
Nov 30 08:17:45 svc.startd[10]: svc:/network/physical:default: Method "/lib/svc/method/net-physical" failed with exit status 96.
Nov 30 08:17:45 svc.startd[10]: network/physical:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
Hostname: rocker
pseudo-device: devinfo0
devinfo0 is /pseudo/devinfo@0
pseudo-device: pseudo1
pseudo1 is /pseudo/zconsnex@1
pseudo-device: lockstat0
lockstat0 is /pseudo/lockstat@0
pseudo-device: fcode0
fcode0 is /pseudo/fcode@0
pseudo-device: llc10
llc10 is /pseudo/llc1@0
pseudo-device: lofi0
lofi0 is /pseudo/lofi@0
pseudo-device: trapstat0
trapstat0 is /pseudo/trapstat@0
pseudo-device: fbt0
fbt0 is /pseudo/fbt@0
pseudo-device: profile0
profile0 is /pseudo/profile@0
pseudo-device: systrace0
systrace0 is /pseudo/systrace@0
pseudo-device: sdt0
sdt0 is /pseudo/sdt@0
pseudo-device: fasttrap0
fasttrap0 is /pseudo/fasttrap@0
pseudo-device: ntwdt0
ntwdt0 is /pseudo/ntwdt@0
pseudo-device: mdesc0
mdesc0 is /pseudo/mdesc@0
pseudo-device: ds_snmp0
ds_snmp0 is /pseudo/ds_snmp@0
pseudo-device: ds_pri0
ds_pri0 is /pseudo/ds_pri@0
pseudo-device: bmc0
bmc0 is /pseudo/bmc@0
pseudo-device: fcsm0
fcsm0 is /pseudo/fcsm@0
pseudo-device: fssnap0
fssnap0 is /pseudo/fssnap@0
pseudo-device: winlock0
winlock0 is /pseudo/winlock@0
pseudo-device: vol0
vol0 is /pseudo/vol@0
pseudo-device: pm0
pm0 is /pseudo/pm@0
pseudo-device: pool0
pool0 is /pseudo/pool@0
dump on /dev/zvol/dsk/rpool/dump size 1536 MB
Requesting System Maintenance Mode
SINGLE USER MODE
Root password for system maintenance (control-d to bypass):

Ok, we have logged in to single user mode using the password we configured in the golden image. Let's make some minor changes. They're basic, but they let the system boot cleanly.
# echo "23.45.66.111   test-ldom test-ldom.mydomain.com" > /etc/hosts
# echo test-ldom > /etc/hostname.vnet0
# echo test-ldom > /etc/nodename
# hostname test-ldom
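If the clone lands on a different subnet than the golden image, set the default route as well. A one-liner sketch, with 23.45.66.1 as a hypothetical gateway for the address above:

# echo 23.45.66.1 > /etc/defaultrouter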

Ok, that's enough for a basic setup. It should boot multi-user happily and let us install... whatever you want to run.
^D
# svc.startd: Returning to milestone all.
pseudo-device: drctl0
drctl0 is /pseudo/drctl@0
pseudo-device: ramdisk1024
ramdisk1024 is /pseudo/ramdisk@1024
pseudo-device: dtrace0
dtrace0 is /pseudo/dtrace@0
pseudo-device: fcp0
fcp0 is /pseudo/fcp@0
Nov 30 08:22:23 ldmad: agent agent-device registered
Nov 30 08:22:23 ldmad: agent agent-system registered
Nov 30 08:22:23 ldmad: agent agent-dio registered
syslogd: line 24: WARNING: loghost could not be resolved
test-ldom console login: root
Password:
Nov 30 08:22:36 test-ldom pseudo: pseudo-device: devinfo0
Nov 30 08:22:36 test-ldom genunix: devinfo0 is /pseudo/devinfo@0
Nov 30 08:22:36 test-ldom login: ROOT LOGIN /dev/console
Last login: Thu Nov 29 13:46:20 on console
Oracle Corporation      SunOS 5.10      Generic Patch   January 2005
#

From here, you can add more disk, 300 IP addresses, even build branded zones if you need to. God forbid. And if you are really keen, as I said, this script can easily be modified to set the number of CPUs or memory settings as well. Through dynamic reconfiguration, this can be done on the fly later on too.

2 comments:

  1. How do I clone an LDOM along with its non-root pools from one control domain to another? I can use flarcreate for the ZFS rpool, and zfs send | ssh zfs recv... I guess?

    Replies
    1. My suggestion:
      for rpool: use flarcreate.
      for non-root zpools (or data pools), if both control domains are connected to the same SAN storage, simply clone/mirror the disks and assign the cloned/mirrored disks to the new LDOM.

      zfs send/receive is the only way if the two SAN storages are different and cannot be made available to both control domains at the same time.

      Hope that clarifies things,
      Cheers,
