This question may be seen as subjective, but I'd like to ask SO users which common structured textual data format is best supported in Python.

My initial choices are:

  • XML
  • JSON
  • and YAML

Which of these three is easiest to work with in Python (i.e. has the best library support and performance)? Or is there another format I haven't mentioned that is better supported in Python?

I cannot use a Python-only format (e.g. pickle), since interop is quite important. However, the majority of the code that handles these files will be written in Python, so I'm keen to use the format with the strongest Python support.

CSV or fixed-column text may also be viable for most use cases; however, I'd prefer the flexibility of a more scalable format.

Thank you

Note

Regarding interop: I will be generating these files initially from Ruby, using Builder. However, Ruby will not be consuming these files again.

A: 

It's pretty much all the same, out of those three. Use whichever is easier to inter-operate with.

TokenMacGuy
+3  A: 

I would go with JSON. YAML is awesome, but interop with it is not that great.
XML is just an ugly mess to look at and has too much fat.

Python has had a built-in json module since version 2.6.
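As a minimal sketch of what using the standard-library json module looks like, the snippet below round-trips a small record through a JSON string. The record and its field names are made up for illustration; only `json.dumps` and `json.loads` come from the standard library.

```python
import json

# Hypothetical record; the field names are illustrative only.
record = {"name": "example", "tags": ["a", "b"], "count": 3, "active": True}

# Serialize to a JSON string (sort_keys gives stable, diff-friendly output).
text = json.dumps(record, sort_keys=True)

# Parse the string back into plain Python objects.
restored = json.loads(text)
```

Since the result is plain dicts, lists, strings, numbers, and booleans, a file written this way should be readable from most other languages without extra work.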

NullUserException
Is there a particular JSON library you'd recommend, or should I just use import json from the standard library?
slomojo
@slomojo The standard library works fine; I am not aware of issues with it.
NullUserException
@NullUserException Thank you
slomojo
+3  A: 

JSON has great Python support and is much more compact than XML (and the API is generally more convenient if you're just trying to dump and load objects). There's no out-of-the-box support for YAML that I know of, although I haven't really checked. In the abstract, I would suggest JSON because of the format's low overhead and wide language support, but it does depend a bit on your application: if you're working in a space that already has established applications, the formats they use might be preferable, even if they're technically deficient.
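To make the compactness and dump/load points concrete, here is a rough sketch that serializes the same small record as JSON and as XML using only the standard library. The record and the one-element-per-key XML layout are assumptions chosen for illustration; other XML layouts (attributes, for example) would give different sizes.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, for illustration only.
record = {"name": "example", "count": "3"}

# JSON: one call to dump the dict to a string.
as_json = json.dumps(record)

# XML: build an equivalent document with one child element per key.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

# Loading JSON gives back a dict directly; XML needs an explicit walk.
from_json = json.loads(as_json)
from_xml = {child.tag: child.text for child in ET.fromstring(as_xml)}
```

For this record the JSON string comes out shorter than the XML one, and the XML path needs hand-written code in both directions.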

Nick Bastin
+1  A: 

I think it depends a lot on what you need to do with the data. If you're going to be building a complex database and doing processing and transformations on it, I suspect you'd be better off with XML. I've found the lxml module pretty useful in this regard. It has full support for standards like XPath and XSLT, and this support is implemented in native code, so you'll get good performance.
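As a rough illustration of path-based querying, the sketch below uses the standard library's xml.etree.ElementTree rather than lxml, so it runs without installing anything. ElementTree only understands a limited subset of XPath; lxml's etree module offers a largely compatible API plus an `xpath()` method for full XPath expressions. The document and its element names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative.
doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>First</title></book>"
    "<book lang='fr'><title>Second</title></book>"
    "</library>"
)

# XPath-style query: titles of all books whose lang attribute is 'en'.
english_titles = [t.text for t in doc.findall("./book[@lang='en']/title")]
```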

But if you're doing something simpler, then you'd likely be better off with a simpler format like YAML or JSON. I've heard tell of "JSON transforms" but don't know how mature the technology is or how developed Python's access to it is.

intuited