Tuesday, March 17, 2015

Solaris/LDOMs: using PCIe direct IO with LDOMs


Recent versions of Logical Domains (now Oracle VM Server for SPARC) allow you to assign individual PCIe devices to a guest LDOM so that IO from that LDOM does not have to go through the primary domain. I am setting this up for two FC HBAs on a T4-2 system with two domains, one for prod and one for test. Assigning direct IO (DIO) devices to a guest domain (which then becomes an IO domain) prevents live migration of that domain. It also adds a dependency on the primary domain: if the primary goes down or reboots, so does the PCIe bus, and with it the access to the HBA. But since we also boot from ZFS storage provided by the primary domain, this dependency was already there. Another option would be to assign a whole PCIe bus to a guest domain (making it a so-called root domain), but extra caution is needed if the primary domain boots from a disk controller attached to the PCIe bus that is given away. Some more thought needs to be put into your networking configuration as well.

The whole process is well documented. This post basically repeats the steps that I have taken and adds the multipath configuration in the guest domain.
The first step is to identify the device names of the FC adapters using ldm list-io from the primary domain (abbreviated output below).

root@primary:~# ldm list-io -l
NAME TYPE BUS DOMAIN STATUS
---- ---- --- ------ ------
pci_0 BUS pci_0 primary
[pci@400]
niu_0 NIU niu_0 primary
[niu@480]
pci_1 BUS pci_1 primary
[pci@500]
niu_1 NIU niu_1 primary
[niu@580]
/SYS/MB/PCIE0 PCIE pci_0 primary OCC
[pci@400/pci@2/pci@0/pci@8]
SUNW,qlc@0/fp/disk
SUNW,qlc@0/fp@0,0
SUNW,qlc@0,1/fp/disk
SUNW,qlc@0,1/fp@0,0
/SYS/MB/PCIE1 PCIE pci_1 primary OCC
[pci@500/pci@2/pci@0/pci@a]
SUNW,qlc@0/fp/disk
SUNW,qlc@0/fp@0,0
SUNW,qlc@0,1/fp/disk
SUNW,qlc@0,1/fp@0,0
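
On a box with more slots, the full listing gets long. The following is a minimal sketch (not part of the documented procedure) that filters the listing down to the slots holding a QLogic FC adapter. The function name list_fc_slots is made up for this post, and it assumes the line layout shown above: a slot line with PCIE in the second column, followed by device lines containing SUNW,qlc@. It needs a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk should do).

```shell
#!/bin/sh
# Hypothetical helper: read `ldm list-io -l` output on stdin and print
# "SLOT BUS" for every PCIE slot that lists at least one qlc device.
list_fc_slots() {
    awk '
        # A slot line looks like: /SYS/MB/PCIE0 PCIE pci_0 primary OCC
        $2 == "PCIE" { slot = $1; bus = $3; seen = 0; next }
        # A BUS or NIU entry ends the previous slot block
        $2 == "BUS" || $2 == "NIU" { slot = ""; next }
        # Device lines below a slot carry the driver name; report a slot once
        slot != "" && /SUNW,qlc@/ && !seen { print slot, bus; seen = 1 }
    '
}

# Usage on the primary domain:
#   ldm list-io -l | list_fc_slots
```

The function only reads text, so it can be tried on a saved copy of the listing before touching any configuration.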

So in my case, these are /SYS/MB/PCIE0 and /SYS/MB/PCIE1, one on each of the two PCIe buses (pci_0 and pci_1). Next we'll enable IO virtualization on both buses and remove the devices from the primary LDOM. The primary LDOM will need to be rebooted after this.


root@primary:~# ldm start-reconf primary
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.

 
root@primary:~# ldm set-io iov=on pci_0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

 
root@primary:~# ldm set-io iov=on pci_1
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

 
root@primary:~# ldm remove-io /SYS/MB/PCIE0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

 
root@primary:~# ldm remove-io /SYS/MB/PCIE1 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

 
root@primary:~# reboot -- -r

After the reboot, the devices show up as unassigned: the DOMAIN column is empty.

 
root@primary:~# ldm list-io -l /SYS/MB/PCIE0
NAME TYPE BUS DOMAIN STATUS
---- ---- --- ------ ------
/SYS/MB/PCIE0 PCIE pci_0 OCC
[pci@400/pci@2/pci@0/pci@8]
SUNW,assigned-device@0
SUNW,assigned-device@0,1
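
If you want to check this from a script instead of by eye, here is a small sketch. The function name slot_owner is made up, and it rests on an assumption taken from the listings above: a PCIE slot line has five whitespace-separated fields when a domain owns it (NAME TYPE BUS DOMAIN STATUS) and four when the DOMAIN column is empty. If your ldm version prints other columns, this needs adjusting.

```shell
#!/bin/sh
# Hypothetical helper: read `ldm list-io` output on stdin and print the
# owning domain of the PCIE slot named in $1, or "unassigned".
# Returns non-zero if the slot does not appear in the input at all.
slot_owner() {
    awk -v slot="$1" '
        $1 == slot && $2 == "PCIE" {
            # 5 fields: DOMAIN is present; 4 fields: DOMAIN is empty
            if (NF >= 5) print $4; else print "unassigned"
            found = 1
        }
        END { if (!found) exit 1 }
    '
}

# Usage on the primary domain:
#   ldm list-io | slot_owner /SYS/MB/PCIE0
```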

And we can now assign these devices to the guest domains. The guests need to be stopped first (test was not installed at this point). The last step sets up the dependency relationship to the primary LDOM so that the guests are also reset if the primary reboots.

root@primary:~# ldm stop-domain LDOM-prod
LDOM LDOM-prod stopped

 
root@primary:~# ldm stop-domain LDOM-test
Remote graceful shutdown or reboot capability is not available on LDOM-test
LDOM LDOM-test stopped

 
root@primary:~# ldm add-io /SYS/MB/PCIE0 LDOM-prod
root@primary:~# ldm add-io /SYS/MB/PCIE1 LDOM-test
root@primary:~# ldm set-domain failure-policy=reset primary
root@primary:~# ldm set-domain master=primary LDOM-prod
root@primary:~# ldm set-domain master=primary LDOM-test
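
With more than a couple of guests, it can help to print the command sequence for review before running anything. A minimal sketch, assuming the same order of steps as in this post; print_dio_plan is a made-up name, and the function only prints the ldm commands, it does not execute them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the ldm commands used in this post
# to hand one PCIe slot ($1) to one guest domain ($2).
print_dio_plan() {
    slot=$1
    guest=$2
    cat <<EOF
ldm stop-domain $guest
ldm add-io $slot $guest
ldm set-domain failure-policy=reset primary
ldm set-domain master=primary $guest
ldm start-domain $guest
EOF
}

# Usage:
#   print_dio_plan /SYS/MB/PCIE0 LDOM-prod
```

Printing the plan instead of running it is deliberate: stopping a guest is disruptive, so I would rather read the commands once more and paste them by hand.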

The last step is to boot the guest back up, verify that the device is available there, and set up multipathing.


root@primary:~# ldm start-domain LDOM-prod
LDOM LDOM-prod started

 
root@primary:~# telnet localhost 5001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

 
Connecting to console "LDOM-prod" in group "LDOM-prod" ....
Press ~? for control options ..
[... console login ...]

 
root@LDOM-prod:~# prtdiag -v
System Configuration: Oracle Corporation sun4v SPARC T4-2
Memory size: 204800 Megabytes
================================ Virtual CPUs ================================
CPU ID Frequency Implementation Status
------ --------- ---------------------- -------
0 2848 MHz SPARC-T4 on-line
1 2848 MHz SPARC-T4 on-line
2 2848 MHz SPARC-T4 on-line
3 2848 MHz SPARC-T4 on-line
================================ IO Devices ================================
Slot + Bus Name + Model Speed
Status Type Path
----------------------------------------------------------------------------
PCIE0 PCIE SUNW,qlc-pciex1077,2532 QLE2562 5.0GTx4
/pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0
PCIE0 PCIE SUNW,qlc-pciex1077,2532 QLE2562 5.0GTx4
/pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0,1

 
root@LDOM-prod:~# stmsboot -e
WARNING: stmsboot operates on each supported multipath-capable controller
detected in a host. In your system, these controllers are

 
/pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0/fp@0,0
/pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0

 
If you do NOT wish to operate on these controllers, please quit stmsboot
and re-invoke with -D { fp | mpt | mpt_sas | pmcs} to specify which controllers you wish
to modify your multipathing configuration for.

 
Do you wish to continue? [y/n] (default: y) y
WARNING: This operation will require a reboot.
Do you want to continue ? [y/n] (default: y) y
The changes will come into effect after rebooting the system.
Reboot the system now ? [y/n] (default: y) y

And after that we can use the FC HBA directly from our LDOM with multipathing.
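
After the guest has rebooted, mpathadm list lu shows the path counts per LUN. The sketch below flags LUNs with fewer operational paths than expected. It is not from the documentation: degraded_lus is a made-up name, the default minimum of 2 is my assumption for a dual-port HBA, and the parsing assumes the usual mpathadm layout (a /dev/rdsk/... line followed by "Total Path Count:" and "Operational Path Count:" lines).

```shell
#!/bin/sh
# Hypothetical helper: read `mpathadm list lu` output on stdin and print
# "DEVICE COUNT" for every logical unit whose operational path count is
# below the minimum given in $1 (default 2). Returns 1 if any were found.
degraded_lus() {
    awk -v min="${1:-2}" '
        # Remember the most recent logical unit name
        $1 ~ /^\/dev\// { lu = $1 }
        # The count is the last field on the line
        /Operational Path Count:/ {
            if ($NF + 0 < min + 0) { print lu, $NF; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Usage in the guest domain:
#   mpathadm list lu | degraded_lus 2
```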
