In my previous post, I discussed getting ZFS up and running. I'll assume you've followed those instructions and have a pool called dyskWorld available. (If you've rebooted since you originally created it, you might need to zpool destroy dyskWorld and start again, since the /tmp file system will have been cleaned.)
Now that we've got a pool, what can we do? Well, a pool contains one or more file systems. Each file system can be given its own set of properties (quotas and so on), and can be mounted, unmounted and so forth independently. In fact, a pool is merely a container for file systems; all the interesting stuff happens at the file system level.
At this point, it's worth noting that other uses of the term 'file system' more commonly imply an on-disk structure that can't be nested. That's not quite what it means in ZFS. A better analogy might be directories, in that directories can be nested and can have different permissions. Furthermore, because ZFS is fairly fluid, you can create file systems on the fly and decommission them on the fly, quickly and with minimal overhead. So where a traditional Unix install might have a single file system for everything (or, in a more structured install, a separate partition for home, another for var, and so on), ZFS lets you go to town and create a new file system per user if you want. Think of a ZFS file system as a 'subset of data' and a ZFS pool as where the ZFS file systems are stored, and you're getting the right feel for what it is.
Back to our pool. At the moment, we've only got the root file system, which we've been creating data on. We can do much better than that. Let's say we want to create spaces for different places on our dyskWorld pool:
```
apple[~] zfs create dyskWorld/AnkhMorpork
apple[~] zfs create dyskWorld/Pseudopolis
apple[~] zfs create dyskWorld/Quirm
apple[~] zfs create dyskWorld/StoLat
```
Note that each of these commands is pretty quick, returning sub-second. Creating a ZFS file system is a cheap operation, and is expected to be done pretty regularly. You'll notice in the Mac's Finder that the dyskWorld mount now looks like a regular disk, and that inside, each of these shows up as a shared disk.
Now that we've got these separate file systems, what can we do with them? Well, time to introduce ZFS properties. These are values that can be set on a file system as a whole, and are inherited by its children. Let's look at the quota property to get a feel for how it's used:
```
apple[~] zfs set quota=10m dyskWorld/AnkhMorpork
apple[~] zfs set quota=5m dyskWorld/Pseudopolis
apple[~] zfs set quota=5m dyskWorld/Quirm
apple[~] zfs set quota=5m dyskWorld/StoLat
apple[~] zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
dyskWorld                279K   123M    71K  /Volumes/dyskWorld
dyskWorld/AnkhMorpork     22K  9.98M    22K  /Volumes/dyskWorld/AnkhMorpork
dyskWorld/Pseudopolis     22K  4.98M    22K  /Volumes/dyskWorld/Pseudopolis
dyskWorld/Quirm           22K  4.98M    22K  /Volumes/dyskWorld/Quirm
dyskWorld/StoLat          22K  4.98M    22K  /Volumes/dyskWorld/StoLat
```
Even though the dyskWorld pool has plenty of space available, the file systems can be restricted and given quotas on a case-by-case basis. Let's put one to the test:
```
apple[~] mkfile 10m /Volumes/dyskWorld/Quirm/cheese
mkfile: (/Volumes/dyskWorld/Quirm/cheese removed)
Write Error: Disc quota exceeded
```
So far so good. We've got any number of file systems, and we can quota them all we like. If you've ever managed a Linux system and wondered about putting /var and /usr onto different partitions, then ZFS's answer is to create a separate file system for each and manage each one independently. But the quota isn't the only thing we can control. We can also control whether compression is enabled for a file system, which means we can do the impossible:
```
apple[~] zfs set compression=on dyskWorld/Quirm
apple[~] mkfile 10m /Volumes/dyskWorld/Quirm/cheese
apple[~] ls -lh /Volumes/dyskWorld/Quirm/
total 1
-rw-------  1 me  me    10M  2 Apr 01:43 cheese
apple[Volumes] zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
dyskWorld                310K   123M    71K  /Volumes/dyskWorld
dyskWorld/AnkhMorpork     22K  9.98M    22K  /Volumes/dyskWorld/AnkhMorpork
dyskWorld/Pseudopolis     22K  4.98M    22K  /Volumes/dyskWorld/Pseudopolis
dyskWorld/Quirm           22K  4.98M    22K  /Volumes/dyskWorld/Quirm
dyskWorld/StoLat          22K  4.98M    22K  /Volumes/dyskWorld/StoLat
```
Yes, we now have a 10M file in a file system that has a maximum quota of 5M and, on top of that, the file isn't taking up any space. Of course, that's not really happening: mkfile creates a file full of zeros, which is pretty easy to compress (try mkfile 10m /tmp/foo; gzip -9 /tmp/foo; ls -lh /tmp/foo.gz).
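If you're following along on a system without mkfile (it's a Solaris/macOS thing), a rough equivalent sketch using dd, with paths under /tmp purely for illustration:

```shell
# mkfile is Solaris/macOS-specific; dd produces the same zero-filled file anywhere.
rm -f /tmp/foo /tmp/foo.gz
dd if=/dev/zero of=/tmp/foo bs=1024 count=10240 2>/dev/null  # 10 MB of zeros
gzip -9 /tmp/foo                                             # replaces it with /tmp/foo.gz
wc -c < /tmp/foo.gz                                          # a handful of KB, not 10 MB
```

Ten megabytes of zeros compress down to roughly ten kilobytes, which is exactly why the quota above was never troubled.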
The point is that compression can be enabled for the file system as a whole. Unlike Windows' implementation (right-click a folder and select “Compress contents of this folder”), setting the compression=on property doesn't actually compress everything that was there beforehand; it simply applies to newly written files. In fact, that's generally true of ZFS properties on the whole: they don't change what's already on disk, but affect subsequent operations (just as reducing the quota isn't going to get rid of any files that are already there).
Now, some of what you have on disk falls into the 'compressible' category, and some is much less so. Photos and music are generally not well suited to compression, but text-based documentation (including web pages) generally is. In fact, on a Mac, /Documentation and /Developer/Documentation are pretty easily compressible, and are good candidates for hosting on a compressed ZFS file system.
The compression algorithm is lzjb by default, but you can use gzip instead if you'd prefer. For the smallest possible size, zfs set compression=gzip-9 will give you the biggest compression benefit for the data at the (potential) expense of the time it takes to compress it.
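You can get a feel for that trade-off with gzip itself, outside ZFS entirely. A small sketch (the /tmp/sample.* names are just for illustration) comparing the fastest and smallest levels on repetitive text:

```shell
# Not ZFS itself -- just gzip at its two extremes, to show the
# size trade-off that compression=gzip-1 .. gzip-9 exposes.
seq 1 50000 > /tmp/sample.txt                     # ~290 KB of repetitive text
gzip -1 -c /tmp/sample.txt > /tmp/sample.fast.gz  # fastest, larger output
gzip -9 -c /tmp/sample.txt > /tmp/sample.small.gz # slowest, smallest output
wc -c /tmp/sample.txt /tmp/sample.fast.gz /tmp/sample.small.gz
```

Both levels shrink the file dramatically; -9 squeezes out a bit more at the cost of CPU time, which is the same bargain you're striking when you pick a gzip level for a ZFS file system.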
Now for something completely different. Let's say we've put together our dyskWorld structure (including the cheese) and we want to take a backup. No problem; in ZFS terms, these are called snapshots:
```
apple[~] zfs snapshot dyskWorld/Quirm@initial
apple[~] zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
dyskWorld                314K   123M    71K  /Volumes/dyskWorld
dyskWorld/AnkhMorpork     22K  9.98M    22K  /Volumes/dyskWorld/AnkhMorpork
dyskWorld/Pseudopolis     22K  4.98M    22K  /Volumes/dyskWorld/Pseudopolis
dyskWorld/Quirm         23.5K  4.98M  23.5K  /Volumes/dyskWorld/Quirm
dyskWorld/Quirm@initial     0      -  23.5K  -
dyskWorld/StoLat          22K  4.98M    22K  /Volumes/dyskWorld/StoLat
```
We've got our named snapshot (“initial” is the name, under the dyskWorld/Quirm file system), and it's currently taking up zero space. The reason is that it's sharing the same data as the live file system; there haven't been any changes yet. We can simulate some changes to see what happens:
```
apple[Quirm] rm cheese
apple[Quirm] mkfile 10k blue
apple[Quirm] zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
dyskWorld                346K   123M    71K  /Volumes/dyskWorld
dyskWorld/AnkhMorpork     22K  9.98M    22K  /Volumes/dyskWorld/AnkhMorpork
dyskWorld/Pseudopolis     22K  4.98M    22K  /Volumes/dyskWorld/Pseudopolis
dyskWorld/Quirm         41.5K  4.96M  23.5K  /Volumes/dyskWorld/Quirm
dyskWorld/Quirm@initial   18K      -  23.5K  -
dyskWorld/StoLat          22K  4.98M    22K  /Volumes/dyskWorld/StoLat
```
Our initial snapshot now takes up more space than it did before, even though we haven't touched the snapshot itself, because the file we removed (cheese) has gone from being owned by the dyskWorld/Quirm parent to being owned by dyskWorld/Quirm@initial instead. It also means the data is still there should you want to see it; you should be able to cd into .zfs/snapshot/initial/ under the Quirm mount point and browse the contents as they were at that time. At present, though, this is one of the known issues that's being worked on.
What can we do with a snapshot? Well, in the meantime, we can clone it, which gives us a read-write copy of the snapshot (without changing the snapshot's contents):
```
apple[~] zfs clone dyskWorld/Quirm@initial dyskWorld/QuirmInitial
apple[~] ls -lh /Volumes/dyskWorld/Quirm
total 1
-rw-------  1 me  me    10K  2 Apr 02:12 blue
apple[~] ls -lh /Volumes/dyskWorld/QuirmInitial/
total 1
-rw-------  1 me  me    10M  2 Apr 01:43 cheese
```
When we've finished recovering (or diffing) whatever files we wanted, we can get rid of the newly cloned data:
```
apple[~] zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
dyskWorld                354K   123M    71K  /Volumes/dyskWorld
dyskWorld/AnkhMorpork     22K  9.98M    22K  /Volumes/dyskWorld/AnkhMorpork
dyskWorld/Pseudopolis     22K  4.98M    22K  /Volumes/dyskWorld/Pseudopolis
dyskWorld/Quirm         41.5K  4.96M  23.5K  /Volumes/dyskWorld/Quirm
dyskWorld/Quirm@initial   18K      -  23.5K  -
dyskWorld/QuirmInitial      0   123M  23.5K  /Volumes/dyskWorld/QuirmInitial
dyskWorld/StoLat          22K  4.98M    22K  /Volumes/dyskWorld/StoLat
apple[~] zfs destroy dyskWorld/QuirmInitial
apple[~] zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
dyskWorld                348K   123M    71K  /Volumes/dyskWorld
dyskWorld/AnkhMorpork     22K  9.98M    22K  /Volumes/dyskWorld/AnkhMorpork
dyskWorld/Pseudopolis     22K  4.98M    22K  /Volumes/dyskWorld/Pseudopolis
dyskWorld/Quirm         41.5K  4.96M  23.5K  /Volumes/dyskWorld/Quirm
dyskWorld/Quirm@initial   18K      -  23.5K  -
dyskWorld/StoLat          22K  4.98M    22K  /Volumes/dyskWorld/StoLat
```
Note that the snapshot is still there, ready to go back to if we need it. Also, when we created the clone, it showed zero usage, because it was sharing its data with the Quirm@initial snapshot. Much like the way Time Machine doesn't back up unchanged files, ZFS doesn't needlessly copy data, yet it manages ownership so that if data goes away from one place, it's retained by the other.
Lastly, we can roll back to a snapshot:
```
apple[~] zfs rollback dyskWorld/Quirm@initial
```
That's about it for this instalment. Next time, I'll look at the RAID characteristics of ZFS and how you might use a ZFS system yourself.