tags:

views:

60

answers:

2

I'd like to store some relatively simple stuff in XML in a cascading manner. The idea is that a build can have a number of parameter sets and the Python scripts creates the necessary build artifacts (*.h etc) by reading these sets and if two sets have the same parameter, the latter one replaces the former.

There are (at least) two differing ways of doing the XML:

First way:

<Variants>
<Variant name="foo" Info="foobar">1</Variant
</Variants>

Second way:

<Variants>
<Variant>
<Name>Foo</Name>
<Value>1</Value>
<Info>foobar</Info>
</Variant>
</Variants>

Which one is better easier to handle in ElementTree. My limited understanding claims it would be the first one as I could search the variant with find() easily and receive the entire subtree but would it be just as easy to do it with the second style? My colleague says that the latter XML is better as it allows expanding the XML more easily (and he is obviously right) but I don't see the expandability a major factor at the moment (might very well be we will never need it).

EDIT: I could of course use lxml as well, does it matter in this case? Speed really isn't an issue, the files are relatively small.

+2  A: 

You're both right, but I would pick #1 where possible, except for the text content:

  • #1 is much more succinct and human-readable, thus less error-prone.
  • Complete extensibility: YAGNI. YAGNI is not always true but if you're confident that you won't need extensibility, don't sacrifice other benefits for the sake of extensibility.
  • #1 is still pretty extensible. You can always add more attributes or child elements. The only way it isn't extensible is if you later discover you need multiple values for name, or info (or the text content value)... since you can't have multiple attributes with the same name on an element (nor multiple text content nodes without something in between). However you can still extend those by various techniques, e.g. space-separated values in an attribute, or adding child elements as an alternative to an attribute.
  • I would make the "value" into an attribute or a child element rather than using the text content. If you ever have to add a child element, and you have that text content there, you will end up with mixed content (text as a sibling of an element), which gets messy to process.

Update: further reading

A few good articles on the XML elements-vs-attributes debate, including when to use each:

See also this SO question (but I think the above give more profitable reading).

LarsH
Good answer, and if you could tell me if there is a difference in handling those in lxml or not I'd gladly accept your answer.
Makis
@Makis, sorry, I don't know anything about lxml. I'm just speaking from an XML perspective.
LarsH
A: 

Remember the critical limitations on XML attributes:

  • Attribute names must be XML names.
  • An element can have only one attribute with a given name.
  • The ordering of attributes is not significant.

In other words, attributes represent key/value pairs. If you can represent it in Python as a dictionary whose keys are XML names and whose values are strings, you can represent it in XML as a set of attributes, no matter what "it" is.

If you can't - if, for instance, ordering is significant, or you need a value to include child elements - then you shouldn't use attributes.

Robert Rossney