I recently upgraded my server to 10.4.9. It had been out for a few weeks, so I thought any problems would have been ironed out, especially given how long it had been since 10.4.8 was released. After the install, everything seemed to go smoothly and my internal network was functioning normally, so I considered it a success and forgot about it.
My wife subsequently remarked that she couldn't see our external website from inside our network, which was odd, since I'd been able to see it from outside before. In fact, checking from a few other servers, the website (which is hosted externally to my network) was visible, but internally it couldn't be found.
It transpired that the client wasn't able to resolve any of the DNS names in my external domain. My internal domain (which is of course separate, both in DNS namespace and also in IP address range) continued to work fine. This was especially odd, since my DNS server is the authoritative reference for both the internal and external namespaces, although I republish that DNS zone via a hidden server to an external service provider for resilience.
And then it hit me. The only reason my external DNS was still working was that the republishers had a cached copy of the zone; the external zone on my own DNS server had completely stopped working. Had I not picked it up in time, the zone's refresh and expiry timers would have meant that my DNS domain would have completely disappeared off the face of the internet.
Some debugging with the nameserver later (named -g -d 1 -p 1234 is your friend), it turned out that some entries in the zone were being barfed upon (check-names failure) and, instead of just those entries being rejected, the entire zone was being dropped. All without any kind of warning in non-debug mode, nothing to
Anyway, it turns out that it's essentially BIND9's fault; in the transition the decision was made to make check-names the default. Bad move. Whenever you're making a minor release, you shouldn't change settings. So, whereas check-names was disabled in 10.4.8, it is enabled in 10.4.9.
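Since a check-names failure takes out the whole zone, it may be worth validating a zone file before restarting the server. Here is a minimal sketch that wraps BIND's named-checkzone tool; it assumes a BIND 9 install recent enough to support the -k option, and the function names, zone name and file path are made up for illustration.

```python
import shutil
import subprocess


def checkzone_command(zone_name, zone_file, check_names="fail"):
    """Build a named-checkzone command line.

    -k selects how check-names violations are treated:
    "fail", "warn" or "ignore".
    """
    return ["named-checkzone", "-k", check_names, zone_name, zone_file]


def check_zone(zone_name, zone_file):
    """Run named-checkzone and return (ok, output).

    Returns None if named-checkzone is not on the PATH.
    """
    if shutil.which("named-checkzone") is None:
        return None
    result = subprocess.run(
        checkzone_command(zone_name, zone_file),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical zone name and path; substitute your own.
    outcome = check_zone("example.com", "/var/named/example.com.zone")
    if outcome is None:
        print("named-checkzone not found")
    else:
        ok, output = outcome
        print("zone OK" if ok else "zone REJECTED")
        print(output)
```

Passing "fail" explicitly is meant to mirror the strict behaviour described above, so a zone that would be dropped by the server should be rejected here as well.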
Anyway, googling for the problem revealed other people having this problem with BIND (though not Mac specific), and it turned out they were having problems with underscores in names. I didn't have any of those, but I had commented out a couple of older aliases with # in the zone file (and I could tell they were commented out because restarting and looking up the entry didn't find it ...) but lo and behold, # isn't the comment character for BIND zones (the semicolon is). Weird. I've been using this since 1997 and it's never yet complained; and then suddenly, it throws a wobbly and pretends the zone isn't there any more.
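To find this kind of stray # line without waiting for the nameserver to drop the zone, something like the following would do. It is only a rough sketch: the function name and the sample zone contents are invented for illustration, and it simply flags any line whose first non-blank character is #, without trying to parse the records properly.

```python
def find_hash_comments(zone_text):
    """Return (line_number, line) pairs for lines that start with '#'.

    BIND zone files use ';' for comments, so a leading '#' is read
    as part of a record rather than being ignored.
    """
    hits = []
    for number, line in enumerate(zone_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            hits.append((number, line))
    return hits


# Hypothetical zone contents, for illustration only.
SAMPLE_ZONE = """\
$TTL 86400
www     IN  A      192.0.2.10
; old   IN  CNAME  www
# older IN  CNAME  www
"""

if __name__ == "__main__":
    for number, line in find_hash_comments(SAMPLE_ZONE):
        print(f"line {number}: {line}")
```

In the sample, the line commented out with ; is left alone and only the # line is reported.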
The end result is that once I removed the entries with # on them, the zone mysteriously appeared again, and the public secondaries picked up the new version of the zone file. But had my wife not noticed that the external website wasn't working internally, my domain could have been wiped off the internet.