ansaurus

Question

Python and libxml2: how to iterate in xml nodes with XPATH

Answer 1

A:

If it is possible to switch to lxml, here is one way it could be done:

import lxml.etree as le

# `content` is the XML document from the question, as a string or bytes
root = le.XML(content)
result = []
namespaces = {'pre': 'http://www.mysyte.com/foo'}
for record in root:
    # [0] raises IndexError if the record lacks that element
    id_node = record.xpath('pre:id', namespaces=namespaces)[0]
    name_node = record.xpath('pre:name', namespaces=namespaces)[0]
    result.append({'code': id_node.text, 'name': name_node.text})
print(result)
# [{'code': 'first', 'name': 'john'}, {'code': 'second', 'name': 'mike'}, {'code': 'third', 'name': 'albert'}]

Building off of Dimitre Novatchev's XPath expression, you could do this:

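# ctxt is the libxml2 XPath context with the 'pre' prefix already registered
id_name_nodes = iter(ctxt.xpathEval('/pre:records/pre:record/*[self::pre:id or self::pre:name]'))

ret_list = []
# Zipping the iterator with itself pairs consecutive nodes: (id, name), (id, name), ...
for id_node, name_node in zip(id_name_nodes, id_name_nodes):
    ret_list.append({'code': id_node.content, 'name': name_node.content})
print(ret_list)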

This libxml2 code relies on every record having both an id and a name, in that order. If an id or name is missing, ret_list will silently pair the wrong id and name. In the same situation, the lxml code would raise an IndexError.
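To make the silent mispairing concrete, here is a minimal sketch in plain Python with no XML library involved. The lists stand in for the .content values of the selected nodes, and the helper name pair_up is made up for illustration.

```python
def pair_up(flat_values):
    """Pair consecutive items the same way zip(it, it) does above."""
    it = iter(flat_values)
    return [{'code': code, 'name': name} for code, name in zip(it, it)]


# Every record has an id and a name: pairs line up as expected.
complete = ['first', 'john', 'second', 'mike', 'third', 'albert']

# Hypothetical case: the second record has no name ('mike' is absent).
missing_name = ['first', 'john', 'second', 'third', 'albert']

paired_ok = pair_up(complete)
paired_bad = pair_up(missing_name)

print(paired_ok)
print(paired_bad)  # 'third' ends up as a name, and 'albert' is dropped
```

No exception is raised in the second case: the id 'third' is consumed as the name of 'second', and the trailing 'albert' is discarded because zip stops at the shorter input.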

unutbu 2010-07-29 19:25:53

I'm using libxml2 everywhere and I would like to keep using it in this case as well. However, thanks for your answer!

Giovanni Di Milia 2010-07-29 20:01:42

Tim McNamara 2010-07-29 22:18:49

ok, but there should be a way to do it directly in libxml2!

Giovanni Di Milia 2010-07-30 19:27:02

Answer 2

A:

You can select all the elements you need with a single XPath expression:

/pre:records/pre:record/*[self::pre:id or self::pre:name]

Then just process the selected nodes in Python.

Dimitre Novatchev 2010-07-30 13:05:47

Sorry, but this doesn't answer my question.

Giovanni Di Milia 2010-07-30 19:26:27

@Giovanni-Di-Milia: This answers the XPath part -- I don't know Python. Having selected all nodes you want, you should be able to process them in Python and to produce the wanted result.

Dimitre Novatchev 2010-07-30 19:39:12

Answer 3

+1 A:

Here is a suggestion. Note the setContextNode() method:

import libxml2

xml = "test.xml"
doc = libxml2.parseFile(xml)

ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs("pre", "http://www.mysyte.com/foo")

ret_list = []
record_nodes = ctxt.xpathEval('/pre:records/pre:record')

for node in record_nodes:
    # Evaluate the relative expressions below against this record
    ctxt.setContextNode(node)
    _id = ctxt.xpathEval('pre:id')[0].content
    name = ctxt.xpathEval('pre:name')[0].content
    ret_list.append({'code': _id, 'name': name})

print(ret_list)

# The bindings do not release these automatically; free them explicitly
ctxt.xpathFreeContext()
doc.freeDoc()
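
For anyone who wants to try the per-record pattern without libxml2 installed, here is a rough standard-library sketch. It uses xml.etree.ElementTree rather than libxml2, so findall() and findtext() stand in for setContextNode() plus xpathEval(). The sample XML is assumed from the expressions in this thread, and the helper name records_to_dicts is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; the real file from the question is not shown here.
SAMPLE = """<records xmlns="http://www.mysyte.com/foo">
  <record><id>first</id><name>john</name></record>
  <record><id>second</id></record>
  <record><id>third</id><name>albert</name></record>
</records>"""

NS = {'pre': 'http://www.mysyte.com/foo'}


def records_to_dicts(xml_text):
    """Return one dict per record; a missing child element becomes None."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall('pre:record', NS):
        # findtext returns the default (None) when the child is absent
        rows.append({
            'code': record.findtext('pre:id', default=None, namespaces=NS),
            'name': record.findtext('pre:name', default=None, namespaces=NS),
        })
    return rows


rows = records_to_dicts(SAMPLE)
print(rows)
```

Because each lookup is made relative to its own record, a missing name shows up as None in that record instead of shifting values between records.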

mzjn 2010-07-31 20:34:06

No comments on this one? It is indeed a way to "do it directly in libxml2".

mzjn 2010-08-11 17:59:07

Sorry! I forgot to mark this answer as the accepted one! It works exactly the way I want. Thanks!

Giovanni Di Milia 2010-10-19 14:52:42
