ansaurus

Question

How do I convert XML to nested objects?

Answer 1

+1 A:

How about this?

http://evanjones.ca/software/simplexmlparse.html

Superdumbell 2009-01-06 23:00:31

Answer 2

A:

If googling around for a code-generator doesn't work, you could write your own that uses XML as input and outputs objects in your language of choice.

It's not terribly difficult; however, the three-step process of Parse XML, Generate Code, Compile/Execute Script does make debugging a bit harder.
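As a rough illustration of that three-step process, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree. The names generate_source, Node, and node_N are hypothetical, made up for this example. It assumes every tag and attribute name is a valid Python identifier, that no attribute is called "text", and that sibling tags are not repeated (a repeated tag would overwrite the earlier one).

```python
import xml.etree.ElementTree as ET


def generate_source(xml_text):
    """Steps 1 and 2: parse the XML, then emit Python source that rebuilds it."""
    # Every generated object is an instance of this trivial (hypothetical) class.
    lines = ["class Node(object):", "    pass", ""]
    counter = [0]

    def emit(element):
        # Give each element a unique variable name in the generated script.
        name = "node_%d" % counter[0]
        counter[0] += 1
        lines.append("%s = Node()" % name)
        lines.append("%s.text = %r" % (name, (element.text or "").strip()))
        # Assumption: attribute names are valid identifiers and not "text".
        for key, value in element.attrib.items():
            lines.append("%s.%s = %r" % (name, key, value))
        # Assumption: tag names are valid identifiers and not repeated.
        for child in element:
            child_name = emit(child)
            lines.append("%s.%s = %s" % (name, child.tag, child_name))
        return name

    root = ET.fromstring(xml_text)
    # Bind the root object to a variable named after the root tag.
    lines.append("%s = %s" % (root.tag, emit(root)))
    return "\n".join(lines)


xml_text = '<main><item kind="demo">content</item></main>'
source = generate_source(xml_text)   # steps 1 and 2: parse, generate
namespace = {}
exec(source, namespace)              # step 3: execute the generated script
main = namespace["main"]
print(main.item.text)   # content
print(main.item.kind)   # demo
```

Printing `source` before executing it is the easiest way to debug the generated code, which is where the extra difficulty mentioned above tends to show up.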

Alan 2009-01-06 23:18:48

Answer 3

A:

There are three common XML parsers for Python: xml.dom.minidom, ElementTree, and BeautifulSoup.

IMO, BeautifulSoup is by far the best.

http://www.crummy.com/software/BeautifulSoup/
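For comparison, here is a minimal sketch of getting nested objects out of ElementTree, which ships with Python as xml.etree.ElementTree. The helper name to_object and the tag/attrs/text field names are made up for this example, not part of any library. It assumes Python 3 (for types.SimpleNamespace) and that no element is itself named "tag", "attrs", or "text".

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def to_object(element):
    """Recursively turn an Element into a nested namespace object."""
    node = SimpleNamespace(
        tag=element.tag,
        attrs=dict(element.attrib),          # XML attributes, kept in a dict
        text=(element.text or "").strip(),   # leading text content, if any
    )
    for child in element:
        # Children are grouped in a list stored under their tag name,
        # so repeated tags are all kept. Assumption: the tag name does
        # not clash with "tag", "attrs", or "text".
        siblings = getattr(node, child.tag, None)
        if siblings is None:
            siblings = []
            setattr(node, child.tag, siblings)
        siblings.append(to_object(child))
    return node


xml_text = """<main>
    <object name="somename"><attr name="attr1">value1</attr></object>
    <object name="someothername"><attr name="attr1">value3</attr></object>
</main>"""

main = to_object(ET.fromstring(xml_text))
print(main.object[1].attrs["name"])   # someothername
print(main.object[0].attr[0].text)    # value1
```

Keeping the XML attributes in a separate dict avoids name clashes between attributes and child elements, at the cost of slightly longer access paths.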

2009-01-07 00:09:03

BeautifulSoup does not play well with XML - it has problems with empty tags `<element/>` - which is OK for HTML because those are not common there.

Nas Banov 2010-06-23 22:55:49

Answer 4

+3 A:

I've been recommending this more than once today, but try Beautiful Soup (easy_install BeautifulSoup).

from BeautifulSoup import BeautifulSoup

xml = """
<main>
    <object attr="name">content</object>
</main>
"""

soup = BeautifulSoup(xml)
# look in the main node for objects with attr=name, optionally look up attrs with regex
my_objects = soup.main.findAll("object", attrs={'attr':'name'})
for my_object in my_objects:
    # this will print a list of the contents of the tag
    print my_object.contents
    # if only text is inside the tag you can use this
    # print my_object.string

Soviut 2009-01-07 00:15:27

The XML that you have quoted is not well formed - <object1 ....>.....</object>

JV 2009-01-07 00:22:45

+1 for beautiful soup - good stuff

Andrew Hare 2009-01-07 01:23:05

main.findAll needs to be soup.findAll, but that helped a bit. Still not exactly what I wanted, but I think I may have an idea of how to get it to work. It's going to be used in external py files that will be interpreted by the app, so I can probably just remap them before execution.

Stephen Belanger 2009-01-07 01:31:21

I fixed the bugs in the code and updated the XML. I had simply copied the original code given in the question.

Soviut 2009-01-07 01:54:02

BeautifulSoup (BeautifulStoneSoup) breaks with empty tags `<element />`, e.g. `<icon data="/ig/images/weather/partly_cloudy.gif"/>` - and those are aplenty in XML :(
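A parser written for XML handles such self-closing tags without trouble. A minimal sketch with the standard-library xml.etree.ElementTree follows; the `<icon/>` element is the one quoted above, while the `<weather>` wrapper and the `<condition/>` sibling are hypothetical, added only to make a complete document.

```python
import xml.etree.ElementTree as ET

# Only the <icon/> element comes from the comment above; the <weather>
# wrapper and <condition/> sibling are hypothetical filler.
xml_text = """<weather>
    <icon data="/ig/images/weather/partly_cloudy.gif"/>
    <condition data="Partly Cloudy"/>
</weather>"""

root = ET.fromstring(xml_text)
icon = root.find("icon")

print(icon.get("data"))                 # /ig/images/weather/partly_cloudy.gif
print(len(icon))                        # 0: the empty tag has no children
print([child.tag for child in root])    # ['icon', 'condition']
```

The last line shows that the element following the empty tag stays a sibling of it rather than being nested inside it.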

Nas Banov 2010-06-23 22:53:47

Answer 5

A:

#@Stephen: 
#"can't hardcode the element names, so I need to collect them 
#at parse and use them somehow as the object names."

#I don't think that's possible. Instead you can do this.
#This will help you get any object with a required name.

import BeautifulSoup


class Coll(object):
    """A class which can hold your Foo class objects
    and retrieve them easily when you want
    abstracting the storage and retrieval logic
    """
    def __init__(self):
        self.foos={}        

    def add(self, fooobj):
        self.foos[fooobj.name]=fooobj

    def get(self, name):
        return self.foos[name]

class Foo(object):
    """The required class
    """
    def __init__(self, name, attr1=None, attr2=None):
        self.name=name
        self.attr1=attr1
        self.attr2=attr2

s="""<main>
         <object name="somename">
             <attr name="attr1">value1</attr>
             <attr name="attr2">value2</attr>
         </object>
         <object name="someothername">
             <attr name="attr1">value3</attr>
             <attr name="attr2">value4</attr>
         </object>
     </main>
"""

#

soup=BeautifulSoup.BeautifulSoup(s)


bars=Coll()
for each in soup.findAll('object'):
    bar=Foo(each['name'])
    attrs=each.findAll('attr')
    for attr in attrs:
        setattr(bar, attr['name'], attr.renderContents())
    bars.add(bar)


#retrieve objects by name
print bars.get('somename').__dict__

print '\n\n', bars.get('someothername').__dict__

output

{'attr2': 'value2', 'name': u'somename', 'attr1': 'value1'}


{'attr2': 'value4', 'name': u'someothername', 'attr1': 'value3'}

JV 2009-01-07 01:06:09

Answer 6

+3 A:

David Mertz's gnosis.xml.objectify would seem to do this for you. Documentation's a bit hard to come by, but there are a few IBM articles on it, including this one.

from gnosis.xml import objectify

xml = "<root><nodes><node>node 1</node><node>node 2</node></nodes></root>"
root = objectify.make_instance(xml)

print root.nodes.node[0].PCDATA # node 1
print root.nodes.node[1].PCDATA # node 2

Creating xml from objects in this way is a different matter, though.

Ryan Ginstrom 2009-01-07 01:21:22

That is EXACTLY what I was looking for. Now I can get back to coding the fun stuff!

Stephen Belanger 2009-01-07 02:19:30

Answer 7

A:

Just to add my two bits, though it's not about Python.

In PHP, the way to transform any XML string or file into a network of nested objects, and to access the values in the native OO way, is to use SimpleXML.

kavoir.com 2009-01-07 01:23:21

Answer 8

+6 A:

It's worth having a look at http://codespeak.net/lxml/objectify.html

>>> xml = """<main>
... <object1 attr="name">content</object1>
... <object1 attr="foo">contenbar</object1>
... <test>me</test>
... </main>"""

>>> from lxml import objectify

>>> main = objectify.fromstring(xml)

>>> main.object1[0]
'content'

>>> main.object1[1]
'contenbar'

>>> main.object1[0].get("attr")
'name'

>>> main.test
'me'

Or the other way around to build xml structures:

>>> item = objectify.Element("item")

>>> item.title = "Best of python"

>>> item.price = 17.98

>>> item.price.set("currency", "EUR")

>>> order = objectify.Element("order")

>>> order.append(item)

>>> order.item.quantity = 3

>>> order.price = sum(item.price * item.quantity
... for item in order.item)

>>> import lxml.etree

>>> print lxml.etree.tostring(order, pretty_print=True)
<order>
  <item>
    <title>Best of python</title>
    <price currency="EUR">17.98</price>
    <quantity>3</quantity>
  </item>
  <price>53.94</price>
</order>

Peter Hoffmann 2009-01-07 04:51:26

When I run your generation example using lxml version 2.2 beta1, my XML is full of type annotations ("<title py:pytype="str">..."). Is there a way to suppress that?

Ryan Ginstrom 2009-01-07 23:14:48

You can use lxml.etree.cleanup_namespaces(order)

Peter Hoffmann 2009-01-08 12:03:12

You actually want to use both `lxml.objectify.deannotate(order)` and `lxml.etree.cleanup_namespaces(order)`.

Paul McMillan 2010-01-27 21:59:14
