
Hi. I'm working on a crawler. When I type url1 into my browser, the browser converts it to url2. How can I do the same in Python?

url1: www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması

url2: www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD
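A quick way to see what the browser did is to decode the query value of url2 back into text. The sketch below uses Python 3's urllib.parse.unquote and assumes the escapes are ISO-8859-9 (Turkish Latin-5) bytes, the encoding identified in the second answer. Under that assumption, %F0 is "ğ" and %FD is "ı", and the value round-trips to the text in url1.

```python
from urllib.parse import unquote

# Query value copied from url2 in the question.
escaped = "do%F0an%FDn%20dengesinin%20bozulmas%FD"

# Assumption: the browser encoded the text as ISO-8859-9 before escaping it.
# Decoding the percent-escapes with that charset recovers the original text.
decoded = unquote(escaped, encoding="iso-8859-9")
print(decoded)
```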

+5  A: 

urllib.quote

http://docs.python.org/library/urllib.html#urllib.quote

Example: quote('/~connolly/') yields '/%7econnolly/'.
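On Python 3, urllib.quote lives at urllib.parse.quote. It accepts str directly and takes an encoding argument that defaults to UTF-8. That default matters here: the site in the question appears to expect ISO-8859-9, so quoting with the default produces different escapes than the browser did. A minimal sketch comparing the two:

```python
from urllib.parse import quote

text = "doğanın dengesinin bozulması"

# Default behaviour: the str is encoded as UTF-8, then percent-escaped.
as_utf8 = quote(text)

# Encoding as ISO-8859-9 (assumed to be what the site expects) gives
# the single-byte escapes seen in url2.
as_turkish = quote(text, encoding="iso-8859-9")

print(as_utf8)
print(as_turkish)
```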

FogleBird
Thanks for the answer, but this is not working for me:
>>> urllib.quote("www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması")
'www.odevsitesi.com/ara.asp%3Fkelime%3Ddo%F0an%FDn%20dengesinin%20bozulmas%FD'
Just make sure you aren't passing a unicode string; urllib.quote doesn't support those prior to Python 3.
danben
@user260223: you only want to URL-encode the query string, not the entire URL.
danben
Yes, I want to fix the whole URL, because when I try urlopen(url1) I get an HTTP 400 error.
You're quoting too much; see my answer for the precise details.
Alex Martelli
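The comments above come down to which characters get escaped. This Python 3 sketch reproduces the result from the first comment: by default quote leaves only "/" (plus letters, digits and a few punctuation characters) unescaped, so the "?" and "=" delimiters turn into %3F and %3D and the server no longer sees a query string. Splitting on the literal "?kelime=" is purely illustrative and assumes the URL has exactly that one parameter.

```python
from urllib.parse import quote

url = "www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması"

# Quoting the whole URL escapes the '?' and '=' delimiters as well.
too_much = quote(url, encoding="iso-8859-9")

# Quoting only the parameter value keeps the URL structure intact.
# (Illustrative split; assumes a single 'kelime' parameter.)
base, value = url.split("?kelime=", 1)
just_right = base + "?kelime=" + quote(value, encoding="iso-8859-9")

print(too_much)
print(just_right)
```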
+5  A: 

You need to encode the URL properly (ISO-8859-9 in your case), split it into its parts, urllib.quote the query part, and put it back together. For example:

>>> import urlparse
>>> import urllib
>>> x = u'http://www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması'
>>> y = x.encode('iso-8859-9')
>>> # just to show what the split of y looks like (we can also handle it as a tuple):
>>> urlparse.urlsplit(y)
SplitResult(scheme='http', netloc='www.odevsitesi.com', path='/ara.asp', query='kelime=do\xf0an\xfdn dengesinin bozulmas\xfd', fragment='')
>>> z = urlparse.urlsplit(y)
>>> # keep '=' and '&' unescaped so the key=value structure of the query survives:
>>> quoted = z[:3] + (urllib.quote(z.query, safe='=&'), z.fragment)
>>> # now just to show you what the 'quoted' tuple looks like:
>>> quoted
('http', 'www.odevsitesi.com', '/ara.asp', 'kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD', '')
>>> # and finally putting it back together:
>>> urlparse.urlunsplit(quoted)
'http://www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD'
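On Python 3 the urlparse functions moved into urllib.parse. The function below is a sketch of the same encode, split, quote and reassemble steps, wrapped in a hypothetical helper named iri_to_url. It assumes the server expects ISO-8859-9, keeps "=" and "&" unescaped so the key=value structure survives, and does not handle input that is already percent-encoded (a literal "%" would be escaped again) or non-ASCII host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_url(iri, encoding="iso-8859-9"):
    """Hypothetical helper: percent-encode the path and query of a URL.

    `encoding` is the charset the target server is assumed to expect
    (ISO-8859-9 for the site in the question).
    """
    parts = urlsplit(iri)
    # Keep '/' in the path and '=' / '&' in the query as delimiters.
    path = quote(parts.path, safe="/", encoding=encoding)
    query = quote(parts.query, safe="=&", encoding=encoding)
    # Scheme, host and fragment are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url1 = "http://www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması"
print(iri_to_url(url1))
```

The resulting string can then be handed to urllib.request.urlopen in place of the raw url1.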
Alex Martelli