Beyond XML: The Future of Extensible Metaformats

August 11, 2009 | 4 min Read

Yesterday I discussed some of the issues with XML. Today I’ll be taking a look at three of the potential alternatives that may improve on the current situation.

YAML

YAML Ain’t Markup Language. To quote the spec, it is a “human-friendly, cross language, Unicode based data serialization language”. While mainly designed with so-called agile languages in mind, it can also be used with more traditional languages. The format is more data-centric than document-driven. Even so, one of its primary design goals is good readability. It was heavily influenced by RFC 2822 (Internet Message Format). That means it looks a lot like mail headers. It features built-in support for lists, hashes (i.e. dictionaries or associative array), and common data types. It also allows elements to have multiple parents, which also allows cross-references. There are some more minor features that make working with the format easier. Here’s a small example of a YAML document lifted from the spec:

invoice: 34843 date : 2001-01-23 bill-to: &id001 given : Chris family : Dumars address: lines: | 458 Walkman Dr. Suite #292 city : Royal Oak state : MI postal : 48046 ship-to: *id001 product: - sku : BL394D quantity : 4 description : Basketball price : 450.00 - sku : BL4438H quantity : 1 description : Super Hoop price : 2392.00 tax : 251.42 total: 4443.52 comments: Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.

While YAML seems like a solid solution with implementations written for many languages, there are some issues, like the surprising lack of momentum in the software development community and the controversial use of significant whitespace.

JSON

The JavaScript Object Notation is basically a subset of JavaScript used to statically describe data. With the release of YAML 1.2 it is also a subset of YAML, which means every JSON document is a YAML file. Its focus is primarily simplicity and readability. While it is trivial to parse JSON in a JavaScript environment, its simplicity makes it also quite easy parse in other languages. Parsers for most popular modern development platforms exist. Being so easily accessible in browsers has earned it quite some support and momentum in modern web development, already often completely replacing XML in the AJAX stack. Here’s a small snippet lifted from the JSON Wikipedia page:

{
     "firstName": "John",
     "lastName": "Smith",
     "address": {
         "streetAddress": "21 2nd Street",
         "city": "New York",
         "state": "NY",
         "postalCode": 10021
     },
     "phoneNumbers": {
         "home": "212 555-1234",
         "fax": "646 555-4567"
     }
}

JSON is currently quite popular, though it may in certain cases be hampered by its simpleness. Maybe YAML forward compatibility can provide an convenient upgrade path, should a more sophisiticated format be necessary.

Groovy

Most of you may know Groovy as a JVM scripting language. I have also already blogged about using Groovy to replace especially painful parts of XML.

Groovy features a MarkupBuilder that let’s you create what is basically an XML DOM tree in memory using a slightly more fluent syntax. Have a look:

  car(name:'HSV Maloo', make:'Holden', year:2006) {
    country('Australia')
    record(type:'speed', 'Production Pickup Truck with speed of 271kph')
  }

Some might consider this just syntactic sugar but it really goes a long way.

But I think Groovy can also be used as a first class configuration language. There are already several projects out there that use groovy scripts for tasks that have traditionally been in the firm grip of XML. One such example is gant, which is basically just a thin wrapper to write ant files. Of course nowadays, everyone is using maven instead, but there’s also a neat tool for that: Gradle is build system configured using a Groovy DSL, while employing Apache ivy and maven under the hood. Take a look at this example from the official gradle documentation:

usePlugin 'java'

sourceCompatibility = 1.5
version = '1.0'
manifest.mainAttributes(
    'Implementation-Title': 'Gradle Quickstart',
    'Implementation-Version': version
)

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
    testCompile group: 'junit', name: 'junit', version: '4.+'
}

test {
    options.systemProperties['property'] = 'value'
}

uploadArchives {
    repositories {
       flatDir(dirs: file('repos'))
    }
}

A lot easier on the eyes, while still providing interoperability with “legacy” systems like maven.

The power and expressiveness of groovy make it really easy to create domain-specific languages like these. Such a special purpose language might not always be desirable, especially when interoperability is a key concern. But for certain applications these DSLs might be a better solution than any general purpose format.

On the whole, these are interesting times we live in and I’m curious to see what the future holds in store. As I said before, XML is probably gonna be here for quite a while, but there are some compelling alternatives out there. What are your thoughts on the legacy of XML?