views:

84

answers:

3

there is an info.xml file under every /var/packs/{many folders}/info.xml where are different directories but with the dirs's info in info.xml

I need to parse through every {many folders} and create a list of the filepath which is inside the Path tags if the file type is "config" which can be found by checking if "config" is the type inside the type tags.

The info.xml file wil be like this,

<Files>
    <File>
        <Path>usr/share/doc/dialog/samples/form1</Path>
        <Type>doc</Type>
        <Size>1222</Size>
        <Uid>0</Uid>
        <Gid>0</Gid>
        <Mode>0755</Mode>
        <Hash>49744d73e8667d0e353923c0241891d46ebb9032</Hash>
    </File>
    <File>
        <Path>usr/share/doc/dialog/samples/form3</Path>
        <Type>config</Type>
        <Size>1294</Size>
        <Uid>0</Uid>
        <Gid>0</Gid>
        <Mode>0755</Mode>
        <Hash>f30277f73e468232c59a526baf3a5ce49519b959</Hash>
    </File>
</Files>
+2  A: 

Here is very basic example with no errors processing and works with very strictly defined XML files, but you should take it as a start and continue with the following links:

The code:

import os
import os.path
from xml.dom.minidom import parse


def parse_file(path):
    files = []
    try:
        dom = parse(path)
        for filetag in dom.getElementsByTagName('File'):
            type = filetag.getElementsByTagName('Type')[0].firstChild.data
            if type == 'config':
                path = tag.getElementsByTagName('Path')[0].firstChild.data
                files.append(path)
        dom.unlink()
    except:
        raise
    return files


def main():
    files = []
    for root, dirs, files in os.walk('/var/packs'):
        if 'info.xml' in files:
            files += parse_file(os.path.join(root, 'info.xml'))
    print 'The list of desired files:', files


if __name__ == '__main__':
    main()  
d.m
A: 

Writing this off the top of my head, but here goes. We're going to make use of os.path.walk to recursively descend into your directories and minidom for doing the parsing.

import os
from xml.dom import minidom

# opens a given info.xml file and prints out "Path"'s contents
def parseInfoXML(filename):
    doc = minidom.parse(filename)
    for fileNode in doc.getElementsByTagName("File"):
        # warning: we assume the existence of a Path node, and that it contains a Text node
        print fileNode.getElementsByTagName("Path")[0].childNodes[0].data
    doc.unlink()

def checkDirForInfoXML(arg, dirname, names):
    if "info.xml" in names:
        parseInfoXML(os.path.join(dirname, "info.xml"))

# recursively walk the directory tree, calling our visitor function to check for info.xml in each dir
# this will include packs as well, so be sure that there's no info.xml in there
os.path.walk("/var/packs" , checkDirForInfoXML, None)

Not the most efficient way to accomplish it I'm sure, but it'll do if you don't expect any errors/whatever.

Faisal
Just a side note: os.path.walk is deprecated and has been removed in 3.0 in favor of os.walk(). http://docs.python.org/library/os.path.html#os.path.walk
d.m
Aha, thank you. I'm unfortunately still living in the Python 2.6 stone age, heh.
Faisal
+1  A: 

Using lxml.etree and XPath:

files = []
for root, dirnames, filenames in os.walk('/var/packs'):
    for filename in filenames:
        if filename != 'info.xml':
            continue
        tree = lxml.etree.parse(os.path.join(root, filename))
        files.extend(tree.getroot().xpath('//File[Type[text()="config"]]/Path/text()'))

If lxml is not available, you can alternatively use the etree API in the standard library:

files = []
for root, dirnames, filenames in os.walk('/var/packs'):
    for filename in filenames:
        if filename != 'info.xml':
            continue
        tree = xml.etree.ElementTree.parse(os.path.join(root, filename))
        for file_node in tree.findall('File'):
            type_node = file_node.find('Type')
            if type_node is not None and type_node.text == 'config':
                path_node = file_node.find('Path')
                if path_node is not None:
                    files.append(path_node.text)
lunaryorn