I am trying to store the links that I scrape from a site in a non-binary (n-ary) tree. The links are laid out hierarchically (obviously). The question is: how do I generate the tree? That is, how do I work my way through the pages behind each link so that I know which page is the child of which?
So far I can get the first and second levels of links, but I have no idea how to go on from here, other than that I have to build the tree recursively and stop when I reach a leaf (I already have a way to detect that).
What I was thinking was something like (code in Python):
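    import urllib2  # Python 2; in Python 3 this is urllib.request

    def buildTree(root):
        for node in root.children:
            if <end condition here>:
                continue  # leaf: nothing below this node
            else:
                # fetch the child's page and attach the links found on it
                nodes = getNodes(urllib2.urlopen(node.url).read())
                node.addChildren(nodes)
                buildTree(node)  # descend one level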
where root and each element of nodes are instances of a user-defined Node class.
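
To make the idea concrete, here is a minimal sketch of how the recursion could record the parent/child relationship: each page's links become the children of that page's node, so "who is whose child" follows from where the link was found. Everything here is an assumption for illustration: the `Node` class is a hypothetical stand-in for the user-defined one, `get_links` and `is_leaf` are placeholder callables for the scraping code and the end condition, and the `visited` set is an addition (not in the snippet above) so that pages linking back to each other do not cause infinite recursion. Unlike the snippet above, this version fetches the links of the node it is given rather than of its existing children.

```python
class Node:
    """Hypothetical stand-in for the user-defined Node class."""

    def __init__(self, url):
        self.url = url
        self.children = []

    def add_children(self, nodes):
        self.children.extend(nodes)


def build_tree(node, get_links, is_leaf, visited=None):
    """Attach to `node` the pages it links to, then recurse into each child.

    get_links(url) -> list of URLs found on that page (assumed helper).
    is_leaf(node)  -> True when the crawl should stop at this node (assumed helper).
    """
    if visited is None:
        visited = {node.url}
    if is_leaf(node):
        return node
    for url in get_links(node.url):
        if url in visited:
            continue  # already placed elsewhere in the tree; avoids cycles
        visited.add(url)
        child = Node(url)
        node.add_children([child])
        build_tree(child, get_links, is_leaf, visited)
    return node


# Usage with an in-memory "site" instead of real HTTP requests.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],  # the link back to the root is skipped
    "/b": [],
}
root = build_tree(
    Node("/"),
    get_links=lambda url: site.get(url, []),
    is_leaf=lambda node: not site.get(node.url),
)
```

In a real crawl, `get_links` would download the page and extract its links (for example with the existing getNodes logic). Note that with a `visited` set, a page linked from several parents ends up under whichever parent reaches it first; whether that is the right choice depends on the site's structure.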