Was XML Flawed from the Start? (2005)

sven-n · on March 7, 2014

I guess the author didn't get the purpose of XML. It doesn't have the goal to use the least amount of metadata/tags as possible. If you need that, just compress it. Whatever, XML offers a wide range of tools, such as:

- Data in XML can be transformed into almost any other format using XSLT.

- It can be validated using XSD.

- It can be searched using XPath.

This makes it a perfect data exchange format.

chinpokomon · on March 7, 2014

Well formed and validated, two distinct concepts. Although it is sometimes deemed a negative, having attributes as well as nodes can offer subtleties that are unmatched in other data exchange formats. While I appreciate the usefulness of JSON, XML and its siblings XPath, XSLT, etc. have their own strengths but taking advantage of them is requires thinking differently.

andrewflnr · on March 7, 2014

Perfect? Really? There's no possible better way to achieve those goals? I've used XSLT; it's nasty.

sven-n · on March 7, 2014

Well, I don't say there are not better ways, but I don't know any. I've used XSLT a lot. It requires different thinking, if you are used to procedural programming. If you deeply understand how it works, it's pretty straightforward.

andrewflnr · on March 8, 2014

The concepts are straight-forward, but the syntax is absurd. I really liked it at first because it's functionally oriented, which is exactly what you want for transforming data. But it's so, so verbose because it's still trying to be markup instead of a programming language.

chinpokomon · on March 8, 2014

You nailed it. XSLT is not a programming language, it is a data transformation language. Furthermore, it is a data transformation language written as a valid and well formed XML document.

I was recently involved in an exercise to create a JavaScript runtime environment running in Java, and wrote JavaScript scripts to process JSON payloads that transformed them into a set of instructions that would change network switch configurations. The process that kicked off the transformations initially generated XML payloads, but it had been modified to generate JSON for the exercise... or more accurately that is what it would do, but we were in the POC phase.

The difficulty in using JSON was that the payload wasn't flat but represented a system of woven objects, as was true of the original XML payload. The XSLT transformer while cumbersome to write was able to transform the payload using backtracking XPath references that was significantly more difficult to articulate when using JavaScript. In the end my POC method worked and it was easier for someone with a procedural programming mindset to read and write, but it lost some of the nuance of the XSLT method.

Use the right tool for the job. I personally thought the best approach would have been a combination. Use a lighter weight XSLT to transform the objects into a more manageable payload and then use the JavaScript to handle the last mile tasks it was better suited to do. It was an interesting exercise and maybe I'll be able to leverage what I learned on a future project.

davidgerard · on March 8, 2014

>XSLT is not a programming language, it is a data transformation language. Furthermore, it is a data transformation language written as a valid and well formed XML document.

The hazard of Turing completeness is that once you have it, you will have to use it.

We have long learnt this at work. (Our attitude is that "XML is like violence, when in doubt do more of it.") If the transformation step looks like the appropriate place for the complexity required, some poor bugger will be tasked with implementing Towers of Hanoi or whatever in it.

XSLT is a programming language. It's just an unspeakably horrible one.

andrewflnr · on March 8, 2014

What do you mean, the original XML payload wasn't flat? At first glance XML is just as inherently flat as JSON (being a text string), but I guess there's some internal referencing semantics for XML documents? What did you really lose moving away from XSLT?

Anyway, I think I'd rather have a nice programming language that's also well-suited to data transformation. I'd ditch XML for something more elegent and flexible and have a transformation language to match. That's my dream, anyway.

chinpokomon · on March 9, 2014

What I meant was that it was a collection of different, but related objects. I had nothing to do with creating the structure and content of the payload, so I was stuck using it. The XSLT had grown organically over several years to solve each immediate problem and didn't seem to be designed. This wasn't a problem for XPath because it could walk backwards on the RTF and index based on keys contained elsewhere in the payload. I'm sure it didn't perform as well as some alternatives, but it is still a powerful and unambiguous language in that respect.

Another subtlety was their use of attributes in the XML payload that were simply interned as any other property in the JSON representation. I did some hand waving earlier by suggesting the original system was configured to emit JSON. That would eventually be the case, but for the POC it was transformed by an intermediate process that did a straight transliteration. This made the JSON a little less friendly because I had arrays of objects where they would have been better described as a dictionary, but I had to be more generic in that approach.

Once I was in the transformer portion of JS code, I converted the arrays into something more usable. This still meant nesting multiple loops as I iterated over the collections and built usable objects.

There was some attempt by the original programmer to use XSLT best practices. The transformation was broken up into multiple templates that were then applied, but there was too much reliance on conditional variables for my taste.

Arguably had the JSON payload been rewritten and structured better, the JS approach would have worked. But this was building a plane in mid-flight. I didn't have that luxury. I had to ingest JSONified XML and emit a text document.

To make the template easier to read in the source I defined it using Moustache. This was probably the biggest issue with the existing system. With all of the conditionals, it was next to impossible to know what the final transformation would look like. This is why the client was looking for other options, because the existing system was becoming too costly to maintain.

I think this is what I was trying to get at in my original comment. XSLT worked better as a data transformer. XPath made it easier to describe these transformations than I could do in JS alone. On the other hand, the JS was easier to maintain and if from your background you only know procedural languages, it was easier to read and write. The problem in this case was that the data they were using to drive the process was already in an XML format and XSLT was a known working solution.

I still think the best solution for this case would be rewriting the backend, but that wasn't an option. The next best solution would be to use XSLT to transform the data into something more manageable in JS, but the client wanted to eliminate XSLT entirely. My POC was the end result, for better or for worse. It was still an interesting project.

Turing_Machine · on March 7, 2014

Much discussion of similar issues here:

http://c2.com/cgi/wiki?XmlIsaPoorCopyOfEssExpressions

pan69 · on March 7, 2014

XML works very well as a serialization format. Especially when you apply schemas that are able to validate your XML structure. I agree, typing XML can be cumbersome but so can JSON (which can be quite nightmarish in larger blobs and which doesn't allow comments).