I'm working on a small web spider in Python using the lxml module. I have a segment of code that runs an XPath query over the document and places all the links from 'a href' tags into a list. What I'd like to do is check each link as it is added to the list and, if necessary, unescape it. I understand I should use the urllib.unquote() function, but the problem I'm experiencing is that the call throws an exception, which I believe is because not every link passed to it needs unescaping. Can anyone point me in the right direction? Here's the code I have so far:
import urllib
import urllib2
from lxml.html import parse, tostring

class Crawler():
    def __init__(self, url):
        self.url = url
        self.links = []

    def crawl(self):
        doc = parse("http://" + self.url).getroot()
        doc.make_links_absolute(self.url, resolve_base_href=True)
        for tag in doc.xpath("//a"):
            old = tag.get('href')
            fixed = urllib.unquote(old)
            self.links.append(fixed)
        print(self.links)
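To make the problem easier to reproduce outside the spider, here is a minimal standalone sketch of just the unquoting step. The safe_unquote helper and the two-way import are my own additions (not part of the crawler above), and my guess about the failure mode is that an <a> tag with no href attribute makes tag.get('href') return None, which unquote() cannot handle:

    # Runs on both Python 2 and 3; the crawler above uses the Python 2 name.
    try:
        from urllib import unquote          # Python 2
    except ImportError:
        from urllib.parse import unquote    # Python 3

    # unquote() is harmless on a URL with no percent-escapes:
    print(unquote("http://example.com/page"))   # unchanged
    # and it decodes escapes when they are present:
    print(unquote("http://example.com/a%20b"))  # "http://example.com/a b"

    # But it raises if given None, e.g. from an <a> tag with no href,
    # so a guard like this (hypothetical helper name) avoids the exception:
    def safe_unquote(href):
        return unquote(href) if href else href

    print(safe_unquote(None))  # None, no exception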