This is a follow-up to the question (Link)
What I intend on doing is using the XML to create a graph using NetworkX. Looking at the DOM structure below, all nodes within the same node should have an edge between them, and all nodes that have attended the same conference should have a node to that conference. To summarize, all authors that worked together on a paper should be connected to each other, and all authors who have attended a particular conference should be connected to that conference.
<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
<author>Adrian Perrig(CMU)</author>
<author>Brian Weis(Cisco Systems)</author>
<author>Dan Wendlandt(CMU)</author>
</paper>
<paper>
<author>Dan Wendlandt(CMU)</author>
<author>Ioannis Avramopoulos(Princeton)</author>
<author>David G. Andersen(CMU)</author>
<author>Jennifer Rexford(Princeton)</author>
</paper>
</conference>
I've figured out how to connect authors to conferences, but I'm unsure about how to connect authors to each other. The thing that I'm having difficulty with is how to iterate over the authors that have worked on the same paper and connect them together.
dom = parse(filepath)
conference=dom.getElementsByTagName('conference')
for node in conference:
conf_name=node.getAttribute('name')
print conf_name
G.add_node(conf_name)
#The nodeValue is split in order to get the name of the author
#and to exclude the university they are part of
plist=node.getElementsByTagName('paper')
for p in plist:
author=str(p.childNodes[0].nodeValue)
author= author.split("(")
#Figure out a way to create edges between authors in the same <paper> </paper>
alist=node.getElementsByTagName('author')
for a in alist:
authortext= str(a.childNodes[0].nodeValue).split("(")
if authortext[0] in dict:
edgeQuantity=dict[authortext[0]]
edgeQuantity+=1
dict[authortext[0]]=edgeQuantity
G.add_edge(authortext[0],conf_name)
#Otherwise, add it to the dictionary and create an edge to the conference.
else:
dict[authortext[0]]= 1
G.add_node(authortext[0])
G.add_edge(authortext[0],conf_name)
i+=1