ansaurus

Question

[Python] Detect destination of shortened, or "tiny" url

Answer 1

+1 A:

Do a list of the most used URL-shorteners and expand it while you discover new ones, then check a link for one item of the list.
You do not know where the URL points to unless you follow it, so best way to do this should be to follow the shortened url and extract the http header of the response to see where it heads to.

I guess with 100 requests per second you could surely go into trouble (I guestt the worst that can happen is they blacklist your IP as a spammer).

Lex 2010-03-16 12:15:59

Do you know what python library and command I could use to most efficiently discover the destination URL? For exampleimport urllib2response = urllib2.urlopen("http://bit.ly/AoifeMcL_ID3")headers = response.headers.headersIn this case the headers contain the domain name of the destination URL, but I don't see the complete URL...where do I need to look in the response for the destination URL?

conradlee 2010-03-16 12:28:10

Answer 2

+2 A:

(AFAIK) Most url shorteners keep track of urls already shortened, so several requests to the same engine with the same URL will return the same short code.

As has been suggested, the best way to extract the real url is to read the headers from a response to a request for the shortened URL. However, some shortening services (eg bit.ly) provide an API method to return the long url

ZombieSheep 2010-03-16 12:19:17

Answer 3

+6 A:

The easiest way to get the destination of a shortened URL is with urllib. Given that the short URL is valid (response code 200), the URL be returned to you.

>>> import urllib
>>> resp = urllib.urlopen('http://bit.ly/bcFOko')
>>> resp.getcode()
200
>>> resp.url
'http://mrdoob.com/lab/javascript/harmony/'

And that's that!

jathanism 2010-03-16 12:37:17

ansaurus

tags:

views:

answers:

[Python] Detect destination of shortened, or "tiny" url

related questions