
The ZFS filesystem built into Solaris is an amazing piece of technology: combining the functionality of a volume manager and a filesystem, it delivers a pooled storage model with built-in snapshots, clones, block-level compression, and clever use of SSDs to accelerate read and write performance. When configured correctly, it complements a zones environment very well.

There are several well-known methods for managing filesystems with Solaris zones: direct mounts, lofs mounts, and (rarely) raw device delegation. These did the job in a UFS and VxFS world, but with ZFS now widely supported and increasingly the filesystem/volume manager of choice, a great option is to delegate ZFS datasets into the zones.

This integration of ZFS and zones allows an administrator to pool storage at the global level and still benefit from flexible filesystem management throughout all of the non-global zones.

Designing ZFS pools

The major design decision is how many different ZFS pools to have on the system (besides the default rpool). In my experience, there are three dimensions along which you may choose to slice and dice your ZFS pools: performance, data protection, and workload isolation. In practice, some of these factors may overlap, so your design will end up with fewer pools than it otherwise would.

Separating pools by performance

If you have certain datasets that require different levels of performance, you may choose to separate them into different storage pools. For example, database archive logs and executables are well suited to low-performance disk in a RAID-5 or RAID-6 configuration, while database datafiles and redo logs will do better with high-performance disk in a RAID-10 configuration.
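As a rough sketch, the two pools might be created along these lines (pool and device names here are placeholders, not taken from any particular system):

    # Low-performance pool for archive logs and executables (RAID-Z2, similar to RAID-6)
    zpool create archpool raidz2 c0t10d0 c0t11d0 c0t12d0 c0t13d0 c0t14d0 c0t15d0

    # High-performance pool for datafiles and redo logs (mirrored pairs, similar to RAID-10)
    zpool create datapool mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0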

Note that you may not need separate pools to obtain these benefits. The ZFS hybrid storage pool feature allows you to add different types of device to a single pool, especially SSDs, to improve read and write performance.
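For instance, SSDs can be added to an existing pool as a read cache and as a separate log device (device names are again placeholders):

    # SSD as an L2ARC read cache
    zpool add datapool cache c1t0d0

    # Mirrored SSDs as a separate log (ZIL) device to accelerate synchronous writes
    zpool add datapool log mirror c1t1d0 c1t2d0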

Separating pools for data protection

As much as we want to rely on RAID for data protection, there are certain classes of failure that make it worthwhile to have data stored in multiple locations. For example, backups should be stored in a separate location from the production data in order to survive filesystem corruption, volume manager corruption, RAID set failures and data centre failures. Similarly, Oracle redo logs and control files should have multiple copies on physically separate disks.

These requirements may be met by having site-to-site replication or remote tape backups. However, if you're relying on your storage array to protect you against data loss, the least you can do is create a separate ZFS pool on physically separate disk.
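As a minimal sketch, a dedicated backup pool might be built from disks in a different tray or array (pool name and devices are illustrative):

    # Pool for backups on physically separate disk
    zpool create backuppool raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
    zfs create backuppool/rman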

Separating pools for workload isolation

It is a good idea to keep sequential workloads separate from random workloads. A small amount of random IO can significantly disrupt sequential performance, so it may make good technical sense to keep those workloads on physically separate disk.

In a business context, you may want to keep your Dev/Test workload separate from your Prod workload. There may also be a requirement to keep different applications physically separate to avoid the finger-pointing that can often occur in the event of performance issues. There are two sides to such isolation though. Too little isolation, and workloads can interfere with each other; too much isolation, and each application can only access a fraction of the potential performance of the underlying storage. The ideal solution would be a QOS mechanism within ZFS, but there is no guaranteed approach to this at the moment.

Note: The Oracle ZFS Storage Array family supports the Oracle Intelligent Storage Protocol, which seems to be some kind of QOS or IO preference mechanism. I'm not sure how this translates to standard ZFS in Solaris.

Bringing it together

My current preferred design for a Solaris zones environment with a large number of Oracle databases looks like this:

Dev/Test workloads are on a separate server and separate disk from Prod, so that Prod and Dev/Test workloads cannot interfere with each other.

Integrating with zones

The main drawback of using delegated ZFS datasets is that they cannot be added to or removed from a zone dynamically (i.e. while the zone is running). It is therefore important to configure the ZFS delegation up-front and in a flexible manner.

The easiest way I've found to do this is to delegate one dataset per pool into each zone. For example, if there are three pools (red, blue, and yellow) and two zones (zone1 and zone2):
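One way this could look, assuming a dataset named after each zone is created in every pool (the dataset names are illustrative):

    # In the global zone: one dataset per pool, per zone
    zfs create red/zone1
    zfs create blue/zone1
    zfs create yellow/zone1
    zfs create red/zone2
    zfs create blue/zone2
    zfs create yellow/zone2

    # Delegate zone1's datasets (repeat the pattern for zone2)
    zonecfg -z zone1
    zonecfg:zone1> add dataset
    zonecfg:zone1:dataset> set name=red/zone1
    zonecfg:zone1:dataset> end
    zonecfg:zone1> add dataset
    zonecfg:zone1:dataset> set name=blue/zone1
    zonecfg:zone1:dataset> end
    zonecfg:zone1> add dataset
    zonecfg:zone1:dataset> set name=yellow/zone1
    zonecfg:zone1:dataset> end
    zonecfg:zone1> commit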

In Solaris 11, each delegated dataset can be given an alias, so for example zone1 and zone2 could each be configured to see their datasets as red, blue, and yellow. This can make configuration more consistent across different environments.
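For example, the dataset resource for zone1 might set an alias like this (Solaris 11 syntax; names as above):

    zonecfg:zone1> add dataset
    zonecfg:zone1:dataset> set name=red/zone1
    zonecfg:zone1:dataset> set alias=red
    zonecfg:zone1:dataset> end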

Once the zone is started up, with a dataset from each pool visible, the administrator within the zone can create ZFS filesystems as required by the zone's application. One great thing about ZFS delegation is that even though a system may have thousands of ZFS filesystems, that complexity is hidden because each zone can only see its own filesystems.
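For example, from within zone1, and assuming the Solaris 11 alias configured above (filesystem names are illustrative):

    # Run inside zone1: the delegated dataset appears under its alias "red"
    zfs create red/oradata
    zfs set mountpoint=/u01/oradata red/oradata
    zfs list    # shows only zone1's own datasets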

Shared filesystems

There may occasionally be a requirement to share the same filesystem between multiple zones. Unfortunately this can't be done with ZFS delegation. Instead, you'll need to configure the filesystem in the global zone and use an old-fashioned lofs mount to present it to the zones that need to access it.
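A sketch of the lofs approach, with hypothetical pool names and paths:

    # In the global zone: create the shared filesystem
    zfs create -o mountpoint=/export/shared red/shared

    # Loopback-mount it into each zone that needs it
    zonecfg -z zone1
    zonecfg:zone1> add fs
    zonecfg:zone1:fs> set dir=/shared
    zonecfg:zone1:fs> set special=/export/shared
    zonecfg:zone1:fs> set type=lofs
    zonecfg:zone1:fs> end
    zonecfg:zone1> commit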