Hi guys, I'm a student here, new to Python and programming in general.

I have a dictionary, links, which maps tuples to numbers. How can I use the urljoin() function to turn the relative URL in the second tuple into a complete URL? What I'm trying to do is get complete links so I can run a recursive function, search(), which takes a complete URL as an argument, finds all the links on that page, and stores the link counts, mapped to the links, in a database.

So far, I have:

links = {('href', 'http://reed.cs.depaul.edu/lperkovic/csc242/test2.html'): 1, ('href', 'test3.html'): 1}

I want http://reed.cs.depaul.edu/lperkovic/csc242/test3.html...

A: 

1) There is no concept of "first" or "second" when considering the keys in a Python dictionary; the keys have no defined order (dicts only guarantee insertion order from Python 3.7 onward).

2) It's very unclear what you're actually trying to do. You'll get better help if you work harder on describing the problem you're trying to solve. On the other hand, if this is a homework assignment, then you shouldn't be looking for this kind of help here. You should instead be asking your TA.

Jonathan Feinberg
I need the code for joining 'test3.html' with 'http://reed.cs.depaul.edu/lperkovic/csc242/'... is that better?
ptabatt
A: 

I think you should reconsider how you store the base URL and the URL fragments. Storing them in a dict like you're doing now makes things quite a lot harder than they have to be.

One suggestion would be to generate the full URLs before you store them in a dict, drop the 'href' part from the tuples (and the tuples themselves), and simply use the URLs as keys. Something like this:

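# Python 3 import; on Python 2 use: from urlparse import urljoin
from urllib.parse import urljoin
links = {}
urlbase = 'http://reed.cs.depaul.edu/lperkovic/csc242/test2.html'
# urljoin resolves 'test3.html' relative to urlbase, replacing 'test2.html'
links[urljoin(urlbase, 'test3.html')] = 1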

This would produce a dict looking like this:

>>> links
{'http://reed.cs.depaul.edu/lperkovic/csc242/test3.html': 1}
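If you already have the dict from the question and can't easily change how it is built, here is a rough sketch of converting it afterwards. It assumes you know which page the links were scraped from (page_url below is my assumption, set to test2.html), and full_links is just a name I made up for the result:

```python
# Python 3 import; on Python 2 use: from urlparse import urljoin
from urllib.parse import urljoin

# The dict from the question: ('href', url) tuples mapped to counts.
links = {
    ('href', 'http://reed.cs.depaul.edu/lperkovic/csc242/test2.html'): 1,
    ('href', 'test3.html'): 1,
}

# Assumption: the page these links were found on.
page_url = 'http://reed.cs.depaul.edu/lperkovic/csc242/test2.html'

full_links = {}
for (attr, url), count in links.items():
    # urljoin leaves absolute URLs alone and resolves relative ones
    # against page_url.
    full = urljoin(page_url, url)
    # Add up counts in case two keys resolve to the same full URL.
    full_links[full] = full_links.get(full, 0) + count
```

After this, every key in full_links is a complete URL that could be passed on to something like your search() function.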
Vegar