The Oracle documentation on configuring virtual disk devices describes several ways to configure vdisks for your guest LDoms, but unfortunately it doesn't warn you about the severe performance issues you can encounter by following some of those instructions.

My strong recommendation is: do not use files or volumes as the backend devices for your virtual disks. You will get much better performance by passing whole disk devices through from the service domain into the guest domains.
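As a rough sketch of what that looks like (the disk device, volume and domain names below are placeholders, and a virtual disk server such as primary-vds0 is assumed to already exist in the service domain):

    # Export a whole physical disk from the service domain as a vdisk backend
    # (slice 2 on an SMI-labelled disk represents the whole disk), then attach
    # it to the guest domain. Device, volume and domain names are hypothetical.
    ldm add-vdsdev /dev/dsk/c0t5000C500A1B2C3D4d0s2 ldom1-data@primary-vds0
    ldm add-vdisk data ldom1-data@primary-vds0 ldom1

The guest then sees a plain disk device and its I/O goes straight down to the LUN, rather than through a second storage stack in the service domain.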

The only time you should use a volume as your virtual disk backend is in a "play" environment: for example, here are some instructions on using ZFS volumes to clone LDOM disk images (strangely enough, the author and I are both called Tom Shaw -- no relation!). This could be useful if you want to practice certain procedures and iterate quickly. However, if you ever plan to move the environment to production, you should migrate onto whole-disk vdisks.
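For reference, that cloning approach is roughly the following (pool, volume and domain names are made up for illustration, and the ZVOL is assumed to hold a "golden" boot image ready to be cloned):

    # Snapshot a golden-image ZFS volume and clone it for a new test guest,
    # then export the clone as that guest's boot disk.
    zfs snapshot rpool/ldoms/golden@baseline
    zfs clone rpool/ldoms/golden@baseline rpool/ldoms/testldom
    ldm add-vdsdev /dev/zvol/dsk/rpool/ldoms/testldom testldom-boot@primary-vds0
    ldm add-vdisk boot testldom-boot@primary-vds0 testldom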

Here's why you shouldn't use ZFS volumes as virtual disk backends:

  • Forced synchronous writes. In the initial LDoms 1.0 release, the performance of file-backed and volume-backed vdisks was adequate, but caching meant that a sudden failure of the service domain could occasionally cause data loss and pool corruption. This prompted the following bug report: "Bug 6684721 file backed virtual I/O should be synchronous". The solution was to make all file-backed virtual I/O (including ZFS volumes) synchronous. "Synchronous" means that the underlying application must wait for a successful response to every write before continuing. This kills performance.
  • ZIL write inflation. With synchronous writes in ZFS, data is written twice: first to the ZIL (ZFS Intent Log), so the write can be acknowledged as quickly as possible, and then to its final resting place on disk. With two layers of ZFS there are two layers of ZIL, so a single guest write gets converted into 4 physical writes to disk.
  • Block-size read/write inflation. The default block size for ZVOLs is 8kB. When the guest's ZFS filesystem uses the default 128kB record size, a single read or write at the guest level gets converted into sixteen 8kB reads or writes at the ZVOL layer. This inflation adds overhead throughout the storage stack (see the sketch after this list).
  • Block misalignment read/write inflation. Depending on how the guest vdisks are sliced, the existing guest pools may not align correctly with the 8kB boundaries of the ZVOLs, so a single guest block can straddle two ZVOL blocks. With an 8kB filesystem block size (which is recommended for Oracle data), misaligned reads or writes are therefore inflated 2x at the primary domain.
  • IO queue bottlenecks. There are a number of different IO queues in the stack: within ZFS, within the server's HBA driver (per HBA and per LUN), within the array controller (per controller, per port and per LUN), and within the individual disks. The rule for IO queues is that you can increase performance by having more LUNs, and you can get more predictable performance by separating workloads onto different LUNs, as long as you don't blow past the per-HBA or per-controller limits. The recommendation for ZFS in particular is to have one LUN per underlying physical disk. With ZFS volumes as virtual disk backends, there are probably too few LUNs for maximum performance.
  • Fragmentation. ZFS is a copy-on-write filesystem. Any writes at the guest level, even temporary writes such as the ZIL, get "baked in" alongside regular writes at the ZVOL level. The resulting fragmentation can cause severe performance problems for sequential reads (e.g. backups).
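If you are stuck with a ZVOL backend for the time being, it is at least worth checking the two block sizes involved in the inflation described above. A minimal sketch, with hypothetical pool and volume names; note that volblocksize is fixed at volume creation time, so a better-matched volume has to be created up front:

    # In the primary domain: check the block size of the backend ZVOL (8K by default).
    zfs get volblocksize rpool/ldoms/testldom
    # Inside the guest: check the record size of the filesystems on that vdisk.
    zfs get recordsize datapool
    # Creating a replacement volume with a larger block size (cannot be changed later).
    zfs create -V 50g -o volblocksize=128k rpool/ldoms/testldom2

This only softens the block-size mismatch; it does nothing about the forced synchronous writes, double ZIL or fragmentation issues above.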

If you want the flexibility of sharing a ZFS pool across multiple virtual machines, you should consider using zones instead of LDoms.
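As a rough illustration of that alternative (zone and dataset names are hypothetical), a ZFS dataset can be delegated to a non-global zone, which can then create and manage its own filesystems within it without any virtual disk layer in between:

    # Delegate an existing dataset to the zone; takes effect on the next zone boot.
    zonecfg -z webzone "add dataset; set name=tank/zones/web; end"
    zoneadm -z webzone reboot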