Hello.

I am going to handle XML files for a project. I had earlier decided to use lxml, but after reading the requirements, I think ElementTree would be better for my purpose.

The XML files that have to be processed have these characteristics:

  1. Small in size, typically < 10 KB.

  2. No namespaces.

  3. Simple XML structure.

Given the small XML size, memory is not an issue. My only concern is fast parsing.

What should I go with? Mostly I have seen people recommend lxml, but given my parsing requirements, do I really stand to benefit from it or would ElementTree serve my purpose better?

A: 

lxml is basically a superset of ElementTree, so you could start with ElementTree and switch to lxml if you run into performance or functionality issues.

Only you can assess performance, by measuring with your own data.
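
One way to measure this yourself is a small timing harness like the sketch below. It uses only the standard library (timeit and xml.etree.ElementTree). The sample document and the helper names parse_with_elementtree and time_parser are made up for illustration; substitute one of your real files and add a second parse function for lxml if you want to compare the two.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample document; replace it with one of your real files.
SAMPLE_XML = (
    "<address_book>"
    "<person gender='m'><name>fred</name>"
    "<phone type='home'>54321</phone></person>"
    "</address_book>"
)


def parse_with_elementtree(text=SAMPLE_XML):
    """Parse an XML string and return the root element."""
    return ET.fromstring(text)


def time_parser(parse, repeats=5, number=1000):
    """Return the best average time per call, in seconds."""
    # timeit.repeat runs `number` calls, `repeats` times over;
    # taking the minimum reduces noise from other processes.
    timings = timeit.repeat(parse, repeat=repeats, number=number)
    return min(timings) / number


if __name__ == "__main__":
    seconds = time_parser(parse_with_elementtree)
    print("ElementTree: %.1f microseconds per parse" % (seconds * 1e6))
```

The numbers will depend on your machine, your Python version and your documents, so treat the result as a comparison tool and not as an absolute figure.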

Mark
+1  A: 

As others have pointed out, lxml implements the ElementTree API, so you're safe starting out with ElementTree and migrating to lxml if you need better performance or more advanced features.

The big advantage of using ElementTree, if it meets your needs, is that as of Python 2.5 it is part of the Python standard library, which cuts down on external dependencies and the (possible) headache of dealing with compiling/installing C modules.
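
Because the two share the basic API, a common pattern is to try lxml first and fall back to the standard library. The sketch below assumes you stick to calls that both provide (fromstring, find, findtext, get); the sample XML and the BACKEND variable are only there for illustration.

```python
try:
    # Third-party and optional: used only if it is installed.
    from lxml import etree
    BACKEND = "lxml"
except ImportError:
    # Standard library fallback, available since Python 2.5.
    import xml.etree.ElementTree as etree
    BACKEND = "ElementTree"

# Hypothetical sample document for illustration.
SAMPLE_XML = (
    "<address_book>"
    "<person gender='m'><name>fred</name></person>"
    "</address_book>"
)

root = etree.fromstring(SAMPLE_XML)
person = root.find("person")      # first matching child element
name = person.findtext("name")    # text content of the <name> child
gender = person.get("gender")     # attribute lookup

print(BACKEND, name, gender)
```

If you later rely on lxml-only features (full XPath, validation and so on), the fallback no longer applies and lxml becomes a hard dependency.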

Will McCutchen
A: 

I recommend my own recipe:

XML to Python data structure « Python recipes « ActiveState Code

It does not speed up parsing, but it provides native, object-style access to the data.

>>> SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
... <address_book>
...   <person gender='m'>
...     <name>fred</name>
...     <phone type='home'>54321</phone>
...     <phone type='cell'>12345</phone>
...     <note>&quot;A<!-- comment --><![CDATA[ <note>]]>&quot;</note>
...   </person>
... </address_book>
... """
>>> address_book = xml2obj(SAMPLE_XML)
>>> person = address_book.person


person.gender        -> 'm'     # an attribute
person['gender']     -> 'm'     # alternative dictionary syntax
person.name          -> 'fred'  # shortcut to a text node
person.phone[0].type -> 'home'  # multiple elements become a list
person.phone[0].data -> '54321' # use .data to get the text value
str(person.phone[0]) -> '54321' # alternative syntax for the text value
person[0]            -> person  # if there is only one <person>, it can still
                                # be used as if it were a list of 1 element.
'address' in person  -> False   # test for existence of an attr or child
person.address       -> None    # a nonexistent element returns None
bool(person.address) -> False   # has any 'address' data (attr, child or text)
person.note          -> '"A <note>"'
Wai Yip Tung