I need to parse an XML document and then write every node to a separate file, keeping the exact order of attributes. So if I have an input file like:

<item a="a" b="b" c="c"/>
<item a="a1" b="b2" c="c3"/>

The output should be 2 files, one per item. If xml.dom.minidom is used, the attribute order changes in the output (I can get <item b="b" c="c" **a="a"**/>).

I found the pxdom library; it keeps the order but is very slow (minidom parsing takes 0.08 sec, pxdom parsing takes 2.5 sec).

Are there any other Python libraries that preserve attribute order?

UPD: the library should also preserve upper and lower case, so "Item" is not equal to "item".

A: 

You can use BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup as soup

>>> html = '''<item a="a" b="b" c="c"/>
<item a="a1" b="b2" c="c3"/>'''
>>> s = soup(html)
>>> s.findAll('item')
[<item a="a" b="b" c="c"></item>, <item a="a1" b="b2" c="c3"></item>]
rubik
Unfortunately BeautifulSoup converts all node names to lower case, and it seems that BeautifulSoup cannot be made case-sensitive.
Andrew
Ah you're right!
rubik
A: 

You might find this question useful. Bottom line: standard XML tools and libraries most likely won't be able to do this.

ma3
Thanks, I saw that question. pxdom does it, but very slowly. In general, the problem is to find a library that uses a list (instead of a dict) as storage for attributes.
Andrew
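For the "list instead of dict" idea, one option that ships with Python is the low-level `xml.parsers.expat` module: setting `ordered_attributes` on a parser makes it report attributes as a flat list in document order, and element names keep their case. The following is only a minimal sketch under some assumptions: the items are wrapped in a hypothetical `<root>` element (the input in the question has no single root), each item is an empty element directly under that root, and `split_items` is an illustrative helper name, not part of any library. Writing each returned string to its own file is left out.

```python
import xml.parsers.expat
from xml.sax.saxutils import quoteattr


def split_items(xml_text):
    """Re-serialize each direct child of the root, attributes in document order.

    Assumes the children are empty elements; nested content is ignored.
    """
    items = []
    depth = 0

    parser = xml.parsers.expat.ParserCreate()
    # Report attributes as [name1, value1, name2, value2, ...] instead of a dict.
    parser.ordered_attributes = True

    def start(name, attrs):
        nonlocal depth
        depth += 1
        if depth == 2:  # direct child of the (assumed) wrapper element
            pairs = zip(attrs[0::2], attrs[1::2])
            rendered = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in pairs)
            items.append("<%s%s/>" % (name, rendered))

    def end(name):
        nonlocal depth
        depth -= 1

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return items


if __name__ == "__main__":
    sample = '<root><item a="a" b="b" c="c"/><Item c="c3" a="a1" b="b2"/></root>'
    for text in split_items(sample):
        print(text)
```

Because expat is a streaming parser, nothing is held in a DOM, so this should not carry the overhead of a full tree builder; whether it is fast enough for your files is something to measure.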
A library that does this would have to store both a dict and a list, for both the mapping and the order. Or possibly an OrderedDict. I tried this scenario with `lxml` before posting this answer, and no matter how many attributes I added, the keys *were* always in the order listed in the XML file. But I have no idea whether that is guaranteed.
ma3
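A quick way to run this kind of order check yourself is sketched below. Note that it uses the standard library's `xml.etree.ElementTree` rather than `lxml`, so it says nothing about what `lxml` guarantees. It assumes Python 3.8 or later, where `tostring()` keeps attributes in the order they were stored instead of sorting them; the `<root>` wrapper and the helper name `attribute_orders` are made up for the example.

```python
import xml.etree.ElementTree as ET


def attribute_orders(xml_text):
    """Return (tag, [attribute names in stored order]) for each child of the root."""
    root = ET.fromstring(xml_text)
    return [(child.tag, list(child.attrib)) for child in root]


if __name__ == "__main__":
    sample = '<root><item a="a" b="b" c="c"/><Item c="c3" a="a1" b="b2"/></root>'
    for tag, names in attribute_orders(sample):
        print(tag, names)
    # Serialize each child on its own; on Python 3.8+ attribute order is kept.
    for child in ET.fromstring(sample):
        print(ET.tostring(child, encoding="unicode"))
```

Tag case is kept as written ("Item" stays "Item"). The serialized text is not guaranteed to be byte-for-byte identical to the input (whitespace and quoting may differ), so this only addresses attribute order and case.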