views:

588

answers:

1

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.

If I do not adjust urlparse's uses_* lists I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?

+1  A: 

I think the problem is that URI's don't all have a common format after the scheme. For example, mailto: urls aren't structured the same as http: urls.

I would use the results of the first parse, then synthesize an http url and parse it again:

parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)
Ned Batchelder
I perfer my own workaround to this one; I would have to do this roundtrip all the time in my custom URL module.
kaizer.se
Fair enough: I didn't like relying on internals of the module, but I reasonable engineers can differ!
Ned Batchelder