
I'm trying to create a simple XML parser where each XML schema has its own parser class, but I can't figure out the best way to do it. In effect, what I'd like to do is something like this:

in = sys.stdin
xmldoc = minidom.parse(in).documentElement

xmlParser = xmldoc.nodeName
parser = xmlParser()
out = parser.parse(xmldoc)

I'm also not quite sure whether I'm getting the document root name correctly, but that's the idea: create an instance of a class whose name matches the document root's name, and use that class's parse() method to parse and handle the input.
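On the root-name question: minidom exposes the root element as documentElement, and its nodeName is the root tag's name. The sketch below checks that on small inline documents (the "invoice" tag and "urn:example" namespace are made-up sample data). It also shows that a namespace prefix stays in nodeName but not in localName, which matters if class names are derived from the root tag.

```python
from xml.dom import minidom

# Parse from a string instead of sys.stdin so the example is self-contained.
doc = minidom.parseString("<invoice><item>1</item></invoice>")
root = doc.documentElement

print(root.nodeName)  # invoice
print(root.tagName)   # invoice (same as nodeName for element nodes)

# With a namespace prefix, nodeName keeps the prefix while localName drops it.
ns_doc = minidom.parseString('<inv:invoice xmlns:inv="urn:example"/>')
ns_root = ns_doc.documentElement
print(ns_root.nodeName)   # inv:invoice
print(ns_root.localName)  # invoice
```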

What would be the simplest way to achieve this? I've been reading about introspection and templates but haven't been able to figure it out yet. I've done something similar in Java in the past, and AFAIK Ruby also makes this simple. What's the Pythonic way?

+1  A: 

I think most Python programmers would just use lxml to parse their XML. If you still want to wrap that in classes you can, but as delnan said in his comment, it's a bit unclear what you really mean.

from lxml import etree

tree = etree.parse('my_doc.xml')
for element in tree.getroot():
    # each iteration yields one direct child of the root element
    print(element.tag)
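If installing lxml isn't an option, the standard library's xml.etree.ElementTree offers the same basic calls (parse/fromstring, getroot, .tag, iterating over children), and lxml.etree largely mirrors that API. The sketch below swaps in ElementTree so it runs with no extra dependencies; the sample document and the "urn:example" namespace are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document parsed from a string; reading from stdin would instead be
# something like ET.parse(sys.stdin).getroot().
root = ET.fromstring("<invoice><item>1</item><item>2</item></invoice>")

print(root.tag)  # invoice

# Iterating over an element yields its direct children.
child_tags = [child.tag for child in root]
print(child_tags)  # ['item', 'item']

# Namespaced tags are reported in "{uri}localname" form.
ns_root = ET.fromstring('<inv:invoice xmlns:inv="urn:example"/>')
print(ns_root.tag)  # {urn:example}invoice
```

Note the namespace form: if parser classes are looked up by root tag, a namespaced document will not produce a plain "invoice" name without stripping the "{uri}" part first.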

A couple of side notes: if other programmers are going to read your code, you should try to at least roughly follow PEP 8. More importantly, you shouldn't reuse reserved names like "in". It is actually a keyword rather than a builtin, so in = sys.stdin is a SyntaxError, not just bad style.
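The difference between a keyword such as "in" and a builtin name such as "input" can be checked directly with the standard keyword and builtins modules. This small sketch does that; "infile" is just an illustrative replacement name.

```python
import builtins
import keyword
import sys

# "in" is a reserved keyword: assigning to it fails at compile time.
print(keyword.iskeyword("in"))  # True
try:
    compile("in = sys.stdin", "<example>", "exec")
    in_is_assignable = True
except SyntaxError:
    in_is_assignable = False
print(in_is_assignable)  # False

# "input" is a builtin, not a keyword: assigning to it is legal but hides
# the builtin function for the rest of that scope.
print(hasattr(builtins, "input"), keyword.iskeyword("input"))  # True False

# A descriptive, non-reserved name avoids both problems.
infile = sys.stdin
```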

Mark
This is just a simple test server: the script receives an XML file and returns something. I thought I'd make it a bit more clever so that it's easy to add more checks per schema on the received XML (validity checking, etc.), i.e. so I could just check that the XML file is correct. My plan was to name the parsers after the document root, but that's beside the point, as I was more interested in the reflection/introspection part of my question. That is, is it possible to create an object if we have the name of its class as a string?
Makis
Well, it's simple to instantiate an existing class if you know its name: you can just use parser_class = getattr(module, class_name). I think this is what you are asking for. If you want to dynamically generate a new class from a string name, you can actually do that too, but I don't think that's what you want.
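Both cases from the comment above can be sketched in a few lines. Here collections.OrderedDict merely stands in for one of your own parser classes, and "ReceiptParser" with its parse method is a hypothetical example; type(name, bases, namespace) is the built-in way to create a new class from a string.

```python
import collections

# Case 1: look up an existing class by name on a module, then call it.
class_name = "OrderedDict"
parser_class = getattr(collections, class_name)
instance = parser_class()          # calling the class creates an instance
print(type(instance).__name__)     # OrderedDict

# Case 2 (rarer): build a brand-new class from a string with type().
def parse(self, root):
    return "parsed " + root

ReceiptParser = type("ReceiptParser", (object,), {"parse": parse})
print(ReceiptParser.__name__)            # ReceiptParser
print(ReceiptParser().parse("receipt"))  # parsed receipt
```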
Mark
+1  A: 

As Mark pointed out in his comment, to get a reference to a class whose name you only know at runtime, you use getattr.

from xml.dom import minidom
import sys

doc = minidom.parse(sys.stdin)
# ...is equivalent to:
# doc = getattr(minidom, "parse")(sys.stdin)

Below is a corrected version of your pseudo-code.

from xml.dom import minidom
import sys
import myParsers # a module containing your parsers

xmldoc = minidom.parse(sys.stdin).documentElement

myParserName = xmldoc.nodeName
myParserClass = getattr(myParsers, myParserName)
# create an instance of myParserClass by calling it with the documentElement
parser = myParserClass(xmldoc)
# do whatever you want with the instance of your parser class
output = parser.generateOutput()
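If you'd rather not keep the parsers in a separate module, a common alternative (swapped in here, not what this answer uses) is an explicit dictionary mapping root names to classes. The sketch below keeps the answer's constructor-takes-documentElement and generateOutput() shape; InvoiceParser, OrderParser, PARSERS and handle are hypothetical names, and parseString replaces sys.stdin so it runs on its own.

```python
from xml.dom import minidom

# Hypothetical parser classes following the shape used above.
class InvoiceParser:
    def __init__(self, root):
        self.root = root

    def generateOutput(self):
        items = self.root.getElementsByTagName("item")
        return "invoice with %d item(s)" % len(items)

class OrderParser:
    def __init__(self, root):
        self.root = root

    def generateOutput(self):
        return "order " + self.root.getAttribute("id")

# Explicit registry: root element name -> parser class.
PARSERS = {
    "invoice": InvoiceParser,
    "order": OrderParser,
}

def handle(xml_text):
    root = minidom.parseString(xml_text).documentElement
    parser_class = PARSERS.get(root.nodeName)
    if parser_class is None:
        raise ValueError("no parser registered for <%s>" % root.nodeName)
    return parser_class(root).generateOutput()

print(handle("<invoice><item/><item/></invoice>"))  # invoice with 2 item(s)
print(handle('<order id="42"/>'))                   # order 42
```

One reason to prefer the dictionary: with getattr(myParsers, name), the incoming document's root name can select any attribute of the module, including things it imports, whereas the registry only accepts names you listed.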

getattr raises an AttributeError if the attribute doesn't exist, so you can either wrap the call in a try...except or pass a third argument to getattr, which will be returned if the attribute isn't found.
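Both options look like this in practice. The sketch uses a SimpleNamespace as a stand-in for the answer's hypothetical myParsers module (getattr behaves the same on modules and other objects), and DefaultParser and the lowercase invoice class (named to match a root element) are illustrative assumptions.

```python
from types import SimpleNamespace

class DefaultParser:
    """Hypothetical fallback used when no parser matches the root name."""
    def __init__(self, root=None):
        self.root = root

class invoice(DefaultParser):
    """Hypothetical parser named after an <invoice> root element."""

# Stand-in for the myParsers module.
myParsers = SimpleNamespace(invoice=invoice)

# Option 1: catch the AttributeError.
try:
    parser_class = getattr(myParsers, "receipt")
except AttributeError:
    parser_class = DefaultParser
print(parser_class.__name__)  # DefaultParser

# Option 2: pass a default as the third argument.
fallback_class = getattr(myParsers, "receipt", DefaultParser)
print(fallback_class.__name__)  # DefaultParser

found_class = getattr(myParsers, "invoice", DefaultParser)
print(found_class.__name__)  # invoice
```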

Dive Into Python has a good explanation: http://diveintopython.org/power_of_introspection/getattr.html

BudgieInWA