I'd like to build a graph showing which tags are used as children of which other tags in a given XML document.

I've written this function to get the unique set of child tags for a given tag in an lxml.etree tree:

from itertools import chain


def iter_unique_child_tags(root, tag):
    """Iterates through unique child tags for all instances of tag.

    Iteration starts at `root`.
    """
    found_child_tags = set()
    # Note: iterdescendants() does not include `root` itself.
    instances = root.iterdescendants(tag)
    # Iterating over an element yields its direct children;
    # getchildren() is deprecated in lxml.
    child_nodes = chain.from_iterable(instances)
    child_tags = (n.tag for n in child_nodes)
    for t in child_tags:
        if t not in found_child_tags:
            found_child_tags.add(t)
            yield t

Is there a general-purpose graph builder that I could use with this function to build a dotfile or a graph in some other format?

I'm also getting the sneaking suspicion that there is a tool somewhere explicitly designed for this purpose; what might that be?

+1  A: 

PyGraphViz will do this for you. NetworkX is another option for fancier stuff.
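
To make the idea concrete, here is a minimal sketch of collecting parent/child tag pairs and writing them out as DOT text. It is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, it builds the DOT source by hand rather than through PyGraphViz or NetworkX, and the names child_tag_edges, to_dot and SAMPLE are made up for this example.

```python
import xml.etree.ElementTree as ET


def child_tag_edges(root):
    """Return the set of (parent_tag, child_tag) pairs found at or below root."""
    edges = set()
    for parent in root.iter():
        # Comments and processing instructions have a non-string .tag; skip them.
        if not isinstance(parent.tag, str):
            continue
        for child in parent:
            if isinstance(child.tag, str):
                edges.add((parent.tag, child.tag))
    return edges


def to_dot(edges, name="tags"):
    """Render the edges as DOT source text (hypothetical helper)."""
    lines = ["digraph %s {" % name]
    for parent, child in sorted(edges):
        lines.append('    "%s" -> "%s";' % (parent, child))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample document, used only for illustration.
SAMPLE = "<library><book><title/><author/></book><book><title/></book></library>"

edges = child_tag_edges(ET.fromstring(SAMPLE))
dot_source = to_dot(edges)
print(dot_source)
```

If the printed text is saved as tags.dot, Graphviz should be able to render it with something like `dot -Tpng tags.dot -o tags.png`. The same set of edges could instead be handed to PyGraphViz or NetworkX if layout control or graph algorithms are needed.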

ars
I was checking out python-graph (http://pypi.python.org/pypi/python-graph/1.7.0) a while ago; does PyGraphViz basically do the same thing as python-graph-dot? I think I'm looking for an example; there are a lot of possibilities to sift through to figure out how to do what I want.
intuited
Sorry, I've never used python-graph, but it seems closer to NetworkX, i.e. focused on graph algorithms. PyGraphViz is largely an interface to GraphViz/dot itself. I guess either can output dot files and so do what you want. I would just start by trying out an example: http://networkx.lanl.gov/pygraphviz/tutorial.html (or the one on python-graph's wiki) and seeing how well you can adapt it to your requirements.
ars
A: 

I ended up using python-graph. I also ended up using argparse to build a command line interface that pulls some basic bits of info from XML documents and builds graph images in formats supported by pydot. It's called xmlearn and is sort of useful:

usage: xmlearn [-h] [-i INFILE] [-p PATH] {graph,dump,tags} ...

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        The XML file to learn about. Defaults to stdin.
  -p PATH, --path PATH  An XPath to be applied to various actions.
                        Defaults to the root node.

subcommands:
  {graph,dump,tags}
    dump                Dump xml data according to a set of rules.
    tags                Show information about tags.
    graph               Build a graph from the XML tags relationships.
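
For anyone curious how a command line shaped like the usage text above might be wired up with argparse, here is a rough sketch. It is not xmlearn's actual source: the build_parser function, the `command` destination name and the None defaults are assumptions made for illustration, and only the option names and help strings are taken from the usage output.

```python
import argparse


def build_parser():
    """Build a parser resembling the usage text (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="xmlearn")
    # Assumption: None stands for "read from stdin".
    parser.add_argument("-i", "--infile", default=None,
                        help="The XML file to learn about. Defaults to stdin.")
    # Assumption: None stands for "the root node".
    parser.add_argument("-p", "--path", default=None,
                        help="An XPath to be applied to various actions. "
                             "Defaults to the root node.")
    subparsers = parser.add_subparsers(title="subcommands", dest="command")
    subparsers.add_parser("dump",
                          help="Dump xml data according to a set of rules.")
    subparsers.add_parser("tags", help="Show information about tags.")
    subparsers.add_parser("graph",
                          help="Build a graph from the XML tags relationships.")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["-p", "//book", "graph"])
print(args.command, args.path, args.infile)
```

Each subcommand would then dispatch to its own function based on args.command; how xmlearn itself does this is not shown here.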
intuited