
AlBlue’s Blog

Macs, Modularity and More

Good uses of XML

2005, xml

I think that this post sums up the use of XML nicely. Specifically, it's a data format, not a serialised object format. Java serialization was initially intended to provide a format for storing data across JVM restarts (e.g. restarting an application or rebooting an OS), but failed dismally because it was so tightly coupled to the object structure. Instead, it found its niche as an automatic marshalling mechanism for RMI communication, which helped enormously in the early days. And, despite being touted as "used for lightweight persistence ... and supports the evolution of the classes", it really fails as soon as the class structure changes. To be honest, this shouldn't be a surprise to anyone; if you change (for example) data stored as an ArrayList into a LinkedList, the two have very different structural requirements, so the serialized form is going to change.
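A minimal sketch of that coupling, using standard `java.io` serialization. The class and method names are illustrative, not from the original post. The same two strings are serialized once as an `ArrayList` and once as a `LinkedList`; because the stream records the concrete class, the bytes differ even though the logical data is identical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class SerializationCoupling {

    // Serialize an object with standard Java serialization and return the raw bytes.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        List<String> data = List.of("2005", "xml");
        byte[] asArrayList = serialize(new ArrayList<>(data));
        byte[] asLinkedList = serialize(new LinkedList<>(data));

        // The stream embeds the concrete class name, so the same logical
        // data yields different bytes for each List implementation.
        String text = new String(asArrayList, StandardCharsets.ISO_8859_1);
        System.out.println(text.contains("java.util.ArrayList"));     // true
        System.out.println(Arrays.equals(asArrayList, asLinkedList)); // false
    }
}
```

Anything reading the stored bytes therefore has to agree with the writer about the concrete class, which is exactly the tight coupling described above.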

Where XML succeeds is where there's a defined format for structuring the data in the first place. "XML is a data format. It is NOT a serialization of programming language structures so don't treat it as such." Specifically, design the XML around the data, not the objects. This is why XMLEncoder is bound to fail: it tightly couples the data structure with the object structure. The result? Change the objects, and the stored data no longer fits. Even accepting that you can abstract list implementations away behind the same XML data, some structural changes (like refactoring a set of related fields such as year, month and day into a single Date object instead of a single string) aren't going to be compatible.
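To see the coupling concretely, here is a minimal sketch using `java.beans.XMLEncoder` on a hypothetical `Event` bean whose date is held as a single string. The output names the Java class and the `date` property directly, so refactoring the bean (for instance, replacing the string with a `LocalDate`, or splitting it into year, month and day) changes what the XML has to look like.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlEncoderCoupling {

    // Hypothetical JavaBean: XMLEncoder needs a public class, a public
    // no-arg constructor and getter/setter pairs for each property.
    public static class Event {
        private String date; // e.g. "2005-06-01", stored as one string

        public Event() { }

        public String getDate() { return date; }

        public void setDate(String date) { this.date = date; }
    }

    // Encode a bean with XMLEncoder (UTF-8 by default) and return the XML text.
    static String encode(Object bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(bean);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Event event = new Event();
        event.setDate("2005-06-01");
        // The XML mirrors the Java class: class name plus property names.
        System.out.println(encode(event));
    }
}
```

The printed XML contains an `<object>` element whose `class` attribute is `XmlEncoderCoupling$Event`, with a `<void property="date">` child; exact whitespace and the `version` attribute vary by JDK.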

When you've defined the XML structure up front, and then written your tools to process that data structure, it's much more likely you've got a structure that's efficient for the data that you need. For example, if you're designing data that requires a date/time combination, hopefully you're going to realise that you need to separate out the year/month/day components as attributes, or possibly even as their own elements. You can then write code to process that data afterwards.
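A minimal sketch of the data-first approach, assuming a hypothetical `<event>` document whose `<date>` element carries `year`, `month` and `day` attributes. The XML layout is fixed first; the Java code simply reads it with the JDK's standard DOM API and builds whatever type it currently prefers (here `java.time.LocalDate`).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DataFirstXml {

    // Hypothetical document: the date is modelled as data, with its
    // components as attributes, independent of any Java class.
    static final String XML =
        "<event><date year=\"2005\" month=\"6\" day=\"1\"/></event>";

    // Parse the document and map the <date> element onto a LocalDate.
    static LocalDate readDate(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element date = (Element) doc.getElementsByTagName("date").item(0);
            return LocalDate.of(
                Integer.parseInt(date.getAttribute("year")),
                Integer.parseInt(date.getAttribute("month")),
                Integer.parseInt(date.getAttribute("day")));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable date XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readDate(XML)); // 2005-06-01
    }
}
```

If the Java side later moves away from `LocalDate`, only `readDate` needs to change; documents already written remain valid.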

The post also makes another blindingly obvious but often unappreciated point: "Use XML tools as much as possible. XPath and XSL-T are enormously powerful tools for working with XML." XPath should really be used wherever XML data is being read. That way, if you ever need to evolve your XML data structure, it's a case of redefining some XPath expressions. The only downside to using XPath is that it typically operates on an in-memory model of the whole document, which tends to limit its use with large documents.
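A sketch of XPath-based reading with the JDK's `javax.xml.xpath` API; the two documents and their expressions are hypothetical. The same reading code handles both layouts, and only the expression changes. In the JDK implementation, `XPath.evaluate(String, InputSource)` parses the input into an in-memory tree before evaluating, which is where the large-document limitation comes from.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathRead {

    // The one place that knows where the year lives in the current layout.
    static final String YEAR_PATH = "/event/date/@year";

    // Evaluate an XPath expression against an XML string, returning its string value.
    static String evaluate(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String v1 = "<event><date year=\"2005\" month=\"6\" day=\"1\"/></event>";
        // A later, restructured layout: only the expression needs updating.
        String v2 = "<event><when><year>2005</year></when></event>";
        System.out.println(evaluate(YEAR_PATH, v1));         // 2005
        System.out.println(evaluate("/event/when/year", v2)); // 2005
    }
}
```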

In short, a great summary and a must-read for developers wishing to use XML as a data structure.