
AlBlue’s Blog

Macs, Modularity and More

ZFS can't come soon enough

2007, mac, zfs

One of the new features in Mac OS X Leopard, when it comes out, will be a filing system called ZFS. If you've no idea what that means, there's a good presentation at OpenSolaris on ZFS that may well answer your questions, or there's the ever (un)reliable Wikipedia entry on ZFS instead.

Unfortunately, Leopard has been delayed due to the iPhone (which I will be getting to replace my Pos990 as soon as I can in the UK). There are likely to be quite a few new features (such as Leopard Theatre for apps), and a new Finder wouldn't go amiss, either.

But the biggest thing I'm looking forward to is support for ZFS. This is basically a RAID-like system (or full RAID, if that's your thing) which allows either the metadata alone, or the metadata and the data, to be duplicated on a single disk or across several. Not only that, but because data is written copy-on-write, the old data stays on disk until its space needs to be reclaimed. That allows multiple virtual file systems (snapshots) to share the same unchanged blocks, much as hard links share data in an ordinary Linux file system, and to be kept available over time. It's like the new Time Machine, but without as many stars, and it can be used on servers, not just clients. There's been some speculation that Time Machine may actually use ZFS's features, though more likely a similar concept has been grafted onto the now ageing HFS+ that's the default on Mac systems.
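To make the copy-on-write idea a bit more concrete, here's a toy sketch in Python. It is not how ZFS is implemented; the class `CowStore` and all of its method names are made up for illustration. The assumption is simply that a write always goes to a fresh block, and that a snapshot is nothing more than a saved copy of the name-to-block map.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}      # block id -> immutable bytes
        self.next_id = 0
        self.live = {}        # file name -> block id (the current view)
        self.snapshots = {}   # snapshot label -> frozen name -> block id map

    def write(self, name, data):
        # Always allocate a fresh block; the old block stays until reclaimed.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = bytes(data)
        self.live[name] = block_id

    def read(self, name, snapshot=None):
        view = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[view[name]]

    def snapshot(self, label):
        # Copies only the small name -> block map, never the data itself.
        self.snapshots[label] = dict(self.live)

    def reclaim(self):
        # Free blocks that neither the live view nor any snapshot references.
        in_use = set(self.live.values())
        for view in self.snapshots.values():
            in_use.update(view.values())
        freed = [b for b in self.blocks if b not in in_use]
        for b in freed:
            del self.blocks[b]
        return len(freed)


store = CowStore()
store.write("notes.txt", b"first draft")
store.snapshot("monday")
store.write("notes.txt", b"second draft")

current = store.read("notes.txt")
as_of_monday = store.read("notes.txt", snapshot="monday")
```

Taking the snapshot costs almost nothing, because both views point at the same blocks until one of them changes. The old block is only freed once no snapshot refers to it any more.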

Why is ZFS any better than normal RAID, or combinations of hardware and software RAID? Well, for one, the ZFS pool can be grown at any time. Need some more space? Stick on an extra disk. New writes are then spread over the new disk as well, to take advantage of it, without anything needing to be rebuilt or replaced. Worried about data corruption? Every disk block that gets written out is checksummed, so silent corruption is detected rather than passed on. Better, if there's another good copy of the data elsewhere, the damaged copy is automatically repaired on disk. Plus, like NTFS, different parts of the file system can have different properties, so one section can be compressed (e.g. /usr/share/doc) and another can be duplicated (e.g. /var/spool/mail). Either way, you get benefits; but unlike current systems (both Mac and NTFS will stop while they compress the entire thing), changing a property only affects newly written or saved files. (You can, of course, apply it to existing files too if you want, by copying them within the file system afterwards.)
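The checksum-and-repair behaviour can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: `MirroredPool` is a made-up name, SHA-256 stands in for whatever checksum the real file system uses, and this toy version checks every copy on every read, which a real implementation need not do.

```python
import hashlib


class MirroredPool:
    """Toy mirrored store that checksums each block and heals bad copies."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]  # each: block id -> bytes
        self.checksums = {}  # block id -> digest, stored apart from the data

    def write(self, block_id, data):
        data = bytes(data)
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.disks:
            disk[block_id] = data

    def _is_good(self, block_id, data):
        return hashlib.sha256(data).hexdigest() == self.checksums[block_id]

    def read(self, block_id):
        good = None
        bad_disks = []
        for disk in self.disks:
            data = disk[block_id]
            if self._is_good(block_id, data):
                if good is None:
                    good = data
            else:
                bad_disks.append(disk)
        if good is None:
            raise IOError("all copies of block %r are corrupt" % (block_id,))
        # Self-heal: overwrite every corrupt copy with the verified one.
        for disk in bad_disks:
            disk[block_id] = good
        return good


pool = MirroredPool(copies=2)
pool.write(7, b"mail spool")
pool.disks[0][7] = b"mail sp00l"   # simulate silent corruption on one disk
recovered = pool.read(7)
```

After the read, both disks hold the good data again. With only one copy, the corruption would still be detected, but there would be nothing to repair it from.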

I'll definitely be moving my server's drives over to ZFS when it arrives. There have been some concerns about performance with many writes over an NFS+ZFS connection, but that's only because ZFS will honour NFS's requests to flush; in general, this is unlikely to be an issue for me, and if I do need to do that regularly, I'll be able to do so on the server itself.

The only question is RAID – do we even need that any more, now that the data can be duplicated? Maybe; there's RAID-Z which I need to investigate more for this area.

Now, if only there were a great way of synchronising changes between my PowerBook and home server when I'm out on the road, I'd be really happy :-)