Aug 10th, 2009
XML: Still No Silver Bullet
The XML format has done a lot in the last decade to reduce some of the pains of legacy formats and to encourage application interoperability. Having a standardized syntax and object model makes the development process a lot easier. But I still feel that there are some severe shortcomings, both in the general format itself and in the concrete implementations of XML dialects, that I want to discuss in this blog post.
XML is a markup language
As the name Extensible Markup Language implies, the language is first and foremost a markup language. That means that the language annotates a body of text. So if you were to strip out all the markup from an XML document, you should still end up with legible and comprehensible text. This is certainly true for (X)HTML and DocBook. It might be slightly harder to read and understand, but all the crucial information is right there in plain text. The markup just adds semantic or presentational meta-information, e.g. which bits of text are headers or quotes.
In many cases XML is misused as a general data format. This often means that there is no actual text (character data) whatsoever in the resulting files. Then why use a markup language in the first place? The Eclipse plugin.xml format, for example, is a markup-only format (save for some extension point schemas).
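The stripping test described above can be tried out with a few lines of Python. This is only a minimal sketch using the standard library's xml.etree.ElementTree; both fragments, including the element names and the extension point name, are invented for illustration and do not come from a real file.

```python
import xml.etree.ElementTree as ET

# Both fragments are invented for illustration; neither comes from a real file.
document_style = "<p>XML is a <em>markup</em> language for <q>annotating</q> text.</p>"
data_style = (
    '<plugin><extension point="org.example.views">'
    '<view id="example.view" name="Example" /></extension></plugin>'
)

def strip_markup(xml_text):
    """Return only the character data, dropping all tags and attributes."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())

print(repr(strip_markup(document_style)))
print(repr(strip_markup(data_style)))
```

The first call leaves a readable sentence behind. The second leaves an empty string, because everything the data-style fragment has to say lives in tags and attributes.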
Attribute and Element Dichotomy
Another gripe with XML is when to use attributes vs. elements. There are at least three options that at first glance seem equally plausible. For example:
Name as PCDATA
<customer> ACME Inc. </customer>
Name as extra element, PCDATA
<customer> <name>ACME Inc.</name> </customer>
Name as attribute
<customer name="ACME Inc." />
I have seen these three styles mixed within the same document for almost exactly the same fields. Should a general-purpose data interchange format really be that hard to get right, or at least consistent?
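To see what this inconsistency costs a consumer, here is a rough Python sketch of a reader that has to accept all three styles. The function name customer_name and the lookup order are assumptions made for this example, not an established convention.

```python
import xml.etree.ElementTree as ET

def customer_name(xml_text):
    """Extract the customer name from any of the three encodings (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    # Style 3: name as attribute
    if "name" in elem.attrib:
        return elem.attrib["name"]
    # Style 2: name as an extra child element
    child = elem.find("name")
    if child is not None and child.text:
        return child.text.strip()
    # Style 1: name as character data of the element itself
    return (elem.text or "").strip()

samples = [
    "<customer> ACME Inc. </customer>",
    "<customer> <name>ACME Inc.</name> </customer>",
    '<customer name="ACME Inc." />',
]
for sample in samples:
    print(customer_name(sample))
```

Three branches for a single field is the price of not settling on one style up front.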
Awkward mapping of common constructs
Because XML is implicitly a tree structure, structures involving cyclic references or multiple cross-references are not easily mapped to an XML-compatible form. Object-oriented models can often contain back-references, or reference a single object from different places in the object graph. Although there is some support for such constructs in the form of ID/IDREF attributes, this is neither supported by all parsers nor widely known.
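When the parser does not resolve such references, the application has to do it by hand. The following Python sketch shows one way, assuming a made-up document in which an id attribute marks the target and a customer attribute holds the reference. Without a DTD these are plain attributes rather than typed ID/IDREF values, so the lookup table is purely an application-level convention.

```python
import xml.etree.ElementTree as ET

# Made-up document: two orders point at the same customer.
doc = """
<data>
  <customer id="c1" name="ACME Inc." />
  <order id="o1" customer="c1" />
  <order id="o2" customer="c1" />
</data>
"""

root = ET.fromstring(doc)

# Index every element that carries an "id" attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

def resolve_customer(order):
    """Follow the order's "customer" reference to the element it names."""
    return by_id[order.get("customer")]

orders = root.findall("order")
print([resolve_customer(o).get("name") for o in orders])
```

Both orders end up at the very same customer element, which is the shared reference that the tree structure alone cannot express.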
Another common data type that is painful to describe in XML is the associative map. An example from the Eclipse plugin.xml:
<property name="aboutImage" value="eclipse_lg.gif" />
Compare that to a simple properties file format:
aboutImage=eclipse_lg.gif
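Both notations carry the same key and value, which a small Python sketch can confirm by loading each into a dictionary. The properties wrapper element is an assumption added so the XML fragment has a root, and the properties parser is deliberately simplified: it ignores escapes, continuation lines and the other rules of the real Java format.

```python
import xml.etree.ElementTree as ET

# The <properties> wrapper is hypothetical; only the <property> line is from the post.
xml_form = '<properties><property name="aboutImage" value="eclipse_lg.gif" /></properties>'
props_form = "aboutImage=eclipse_lg.gif"

def map_from_xml(text):
    """Collect name/value attribute pairs into a dict."""
    root = ET.fromstring(text)
    return {p.get("name"): p.get("value") for p in root.iter("property")}

def map_from_properties(text):
    """Simplified key=value parser; skips blank lines and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result

print(map_from_xml(xml_form) == map_from_properties(props_form))
```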
The third type of commonly found data that is hard to put into XML is relational data. That kind of data is traditionally found in RDBMSs, CSV files and, unfortunately, spreadsheets. While it is fairly trivial to create a corresponding XML format, the result is often needlessly verbose and repetitive, since all elements usually have the same attributes.
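The repetition is easy to make visible. The Python sketch below writes the same rows once as CSV and once as attribute-style XML and compares the sizes. The rows, the element names customers and customer, and the column names are all invented for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample rows; every row has the same columns.
rows = [
    {"id": "1", "name": "ACME Inc.", "city": "Berlin"},
    {"id": "2", "name": "Globex", "city": "Springfield"},
    {"id": "3", "name": "Initech", "city": "Austin"},
]

def to_csv(rows):
    """Column names appear once, in the header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_xml(rows):
    """Column names are repeated as attribute names on every row element."""
    root = ET.Element("customers")
    for row in rows:
        ET.SubElement(root, "customer", row)
    return ET.tostring(root, encoding="unicode")

csv_text = to_csv(rows)
xml_text = to_xml(rows)
print(len(csv_text), len(xml_text))
```

In the CSV output each column name appears once, while the XML output repeats it on every row, so the gap widens with every additional row.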
Readability
By far the biggest gripe I have with XML on a day-to-day basis is that it is really hard for me to parse as a human. Between the angle brackets bunched up against the element names, the endless repetition and the escaped entities, it seems like this format was not really designed with legibility in mind. Another reason could be related to my first point: in regular markup languages the tags only contribute a relatively small percentage of the overall content. The majority is plain text, so the tags are few and far between. In markup-only languages, however, there’s a much higher density of markup elements. In my opinion this repetition lowers the signal-to-noise ratio significantly, making these documents much harder to read.
These are the main problems that I currently see with XML. I concede that the common, extensible meta format was a huge step forward, and that for some problem domains it is a pretty good fit. I also realize that XML is gonna be here for quite a while, but I think it’s time to stop resting on these laurels and see how we can address these problems in the future.
Tomorrow I’ll be looking at some of the alternatives that might be potential successors to XML.


XML is still good enough as an object model for persistence (as long as it is read and written by machines only). The problem starts when you try to use it as a domain-specific language (DSL). XML cannot do this as it has a lot of “noise”. The ideal in this case would be to invent your own DSL and write a parser/lexer for it or, better, to use a Textual Modeling Framework like Xtext.
I agree that XML is overused and used in the wrong situations, but I do think that plugin.xml is a good use of it.
I highly recommend reading Elliotte Rusty Harold’s Effective XML: http://www.cafeconleche.org/books/effectivexml/
One bad use of XML is in your example of property files. XML should in many ways be specific about the fields, and databinding tools that generate XML output or generate XML Schema from class files are horrible at creating effective XML dialects. XML needs a good top-down design, not a bottom-up design. Unfortunately the latter happens more often than the former when XML and particularly web services are involved.
Modeling is a key to using XML effectively. The reason you model classes is the same reason you need schemas and good XML practices.
I do not agree with the talk of overuse.
If, going by the example, that use is bad or overuse, then XML itself is bad or overused. Proof: the XML text node. You cannot prevent a text node from coexisting with other child element nodes. BTW, I do not think many people understand the nature of the text node problem, just like rockets or RT.
Regarding design, XML should not be a language. ALL THAT IS NEEDED IS JUST A NOTATION.
As a language, XML cannot do anything; everything is done by other, true computing languages.
As a notation, XML does little. RDF and the semantic web developed their own notations inside the XML notation.
YAML and JSON proved the need for true notations.
So, I highly recommend the project ON: http://on.dev.java.net
@qinxian: You can prevent your scenario by making sure your XML has a DTD or XML Schema to use to validate it. If it doesn’t validate or pass validation then there is a reason.
XML is just a way to represent data. As you said, an application has to interpret it.
Hi, @David?
1. Validation is not a problem in most cases, but it is a common problem that the client of an XML document is not the same as the designer of the document. IMO, the truth is that XML can represent a structure in many ways, and this power of XML is the source of the problem. Meanwhile, it is a disaster for the world, which has wasted and keeps wasting people's time and resources.
2. OK, many people would agree that XML is just a way to represent data.
Going deeper, the industry needs a best way to represent data. That is why the blog host says “XML: Still No Silver Bullet”. So what is the best? IMO it should exist; it just needs to be discovered, but by invention.
JSON looks to me as being less human-readable than XML… does anyone share the same view?
Those ] } , ; are just too confusing and hard on the eye when the data becomes complex.