I am trying to write a function that takes an XML object and an arbitrary number of tags, each defined by a tuple containing a tag name, an attribute and an attribute value (e.g. ('tag1', 'id', '1')), and returns the most specific node possible. My code is below:

from xml.dom import minidom

def _search(object, *pargs):
    if len(pargs) == 0:
        print "length of pargs was zero"
        return object
    else:
        print "length of pargs is %s" % len(pargs)
    if pargs[0][1]:
        for element in object.getElementsByTagName(pargs[0][0]):
            if element.attributes[pargs[0][1]].value == pargs[0][2]:
                _search(element, *pargs[1:])
    else:
        if object.getElementsByTagName(pargs[0][0]) == 1:
            _search(element, *pargs[1:])
def main():
    xmldoc = minidom.parse('./example.xml')
    tag1 = ('catalog_item', 'gender', "Men's")
    tag2 = ('size', 'description', 'Large')
    tag3 = ('color_swatch', '', '')

    args = (tag1, tag2, tag3)
    node = _search(xmldoc, *args)
    node.toxml()
if __name__ == "__main__":
    main()

Unfortunately, this doesn't seem to work. Here's the output when I run the script:

$ ./secondsearch.py
length of pargs is 3
length of pargs is 2
length of pargs is 1
Traceback (most recent call last):
  File "./secondsearch.py", line 35, in <module>
    main()
  File "./secondsearch.py", line 32, in main
    node.toxml()
AttributeError: 'NoneType' object has no attribute 'toxml'

Why isn't the 'if len(pargs) == 0' clause being exercised? If I do manage to get the xml object returned to my main method, can I then pass the object into some other function (which could change the value of the node, or append a child node, etc.)?

Background: I'm using Python to automate testing processes. The environment is Cygwin on Windows XP/Vista/7, and the Python version is 2.5.2. I would prefer to stay within the standard library if at all possible.

Here's the working code:

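def _search(object, *pargs):
    # Base case: no tags left to match, so this is the most specific node.
    if len(pargs) == 0:
        print "length of pargs was zero"
        return object
    print "length of pargs is %s" % len(pargs)
    name, attribute, value = pargs[0]
    elements = object.getElementsByTagName(name)
    for element in elements:
        if attribute:
            # Descend into the first element whose attribute matches.
            if element.getAttribute(attribute) == value:
                return _search(element, *pargs[1:])
        else:
            # No attribute given: descend only if the tag is unambiguous.
            if len(elements) == 1:
                return _search(element, *pargs[1:])
    # No match (or an ambiguous tag): return the deepest node found so far.
    return object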
+2  A: 

Shouldn't you be inserting a return in front of your recursive calls to _search? The way you have it now, some exit paths from _search don't have a return statement, so they will return None - which leads to the exception you're seeing.
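As a toy illustration of that point (the function names here are made up for this sketch and do not come from the question): when the result of a recursive call is not returned, it is simply discarded, and the outer call falls off the end of the function and yields None.

```python
def countdown_without_return(n):
    # The base case does return a value...
    if n == 0:
        return "done"
    # ...but the recursive result is thrown away, so this path returns None.
    countdown_without_return(n - 1)


def countdown_with_return(n):
    if n == 0:
        return "done"
    # Passing the result back up the call chain fixes it.
    return countdown_with_return(n - 1)


result_without = countdown_without_return(3)
result_with = countdown_with_return(3)
```

The same applies to _search: only the innermost call reaches a return, and every caller above it drops that value.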

Vinay Sajip
I'm not sure what you mean by 'inserting a return in front of your recursive calls', specifically 'in front of'. Could you explain that more fully?
Rob Carr
Yes - your statement should be 'return _search(element, *pargs[1:])' rather than just '_search(element, *pargs[1:])'.
Vinay Sajip
And don't forget that Stobor makes good points, too.
Vinay Sajip
Thanks for the clarification. I'm giving Stobor the solution, but I did up your score to reflect your significant assistance. Again, thanks.
Rob Carr
+2  A: 

I assume you're using http://www.eggheadcafe.com/community/aspnet/17/10084853/xml-viewer.aspx as your sample data...

As Vinay pointed out, you don't return anything from your recursive calls to _search.

In your else case, you don't define the value of element, but you pass it into _search().

Also, you don't do anything if pargs[0][1] is empty, but object.getElementsByTagName(pargs[0][0]) returns more than one Node... (which is also why your len(pargs) == 0 case never gets hit...)
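One more detail on that else branch: the check compares the NodeList itself with 1, which is false no matter how many nodes were found, so the branch never recurses. A minimal sketch of the difference, using made-up sample data rather than the asker's example.xml:

```python
from xml.dom import minidom

# Made-up sample data for this sketch, not the asker's example.xml.
doc = minidom.parseString(
    "<size description='Large'>"
    "<color_swatch image='red_cardigan.jpg'>Red</color_swatch>"
    "</size>"
)
swatches = doc.getElementsByTagName("color_swatch")

# Comparing the NodeList itself against 1 is False even with exactly one match.
compared_to_one = (swatches == 1)
# Comparing its length is presumably what the check was meant to do.
has_single_match = (len(swatches) == 1)
```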

And after all that, if that sample data is correct, there are two matching nodes, so you'll have a NodeList containing:

        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>

and you can't call .toxml() on a NodeList...
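A rough sketch of how to work with that NodeList, again on made-up data modelled on the two nodes above: index or iterate it to reach individual Element nodes, which do have toxml(). It also touches on the second part of the question: a node is an ordinary object, so it can be passed to another function that modifies it. The helper name mark_checked and the 'checked' attribute are hypothetical, chosen only for illustration.

```python
from xml.dom import minidom

# Made-up sample data modelled on the two nodes quoted above.
doc = minidom.parseString(
    "<size description='Large'>"
    "<color_swatch image='red_cardigan.jpg'>Red</color_swatch>"
    "<color_swatch image='burgundy_cardigan.jpg'>Burgundy</color_swatch>"
    "</size>"
)
swatches = doc.getElementsByTagName("color_swatch")

# The NodeList has no toxml(), but every Element inside it does.
list_has_toxml = hasattr(swatches, "toxml")
as_xml = [node.toxml() for node in swatches]


def mark_checked(node):
    # Hypothetical helper: changes made here are visible to the caller
    # and in the document that owns the node.
    node.setAttribute("checked", "yes")
    note = node.ownerDocument.createElement("note")
    note.appendChild(node.ownerDocument.createTextNode("seen"))
    node.appendChild(note)


mark_checked(swatches[0])
first_after_change = swatches[0].toxml()
```

Whether to take the first match, all matches, or to require a more specific tag tuple is a design choice for the caller.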

Stobor
Thanks. You made several valid points, all of which I kept in mind as I rewrote the function. The only one not represented in the code itself is the section dealing with a NodeList being returned - that got a comment in my script reminding me to include enough information to obtain a specific node.
Rob Carr