urlparse

Parse custom URIs with urlparse (Python)

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http. If I do not adjust urlparse's uses_* lists I get this: >>> urlparse.urlparse("qqqq://base/id#hint") ('qqqq', '', '//base/id#hint', '', '', '') >>> url...
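For comparison, here is a minimal sketch of the same call on Python 3, where the module lives at urllib.parse (an assumption; the excerpt uses the Python 2 urlparse module). From Python 3.3 onward the netloc and fragment are parsed for every scheme, so no changes to the uses_* lists should be needed.

```python
from urllib.parse import urlparse

# Same custom-scheme URI as in the question, parsed with Python 3's urllib.parse.
parts = urlparse("qqqq://base/id#hint")

# On Python 3.3+ the unknown scheme is split like http would be.
print(parts.scheme)    # qqqq
print(parts.netloc)    # base
print(parts.path)      # /id
print(parts.fragment)  # hint
```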

Python urlparse, correct or incorrect?

Python's urlparse function parses a URL into six components (scheme, netloc, path, and others). I've found that parsing "example.com/path/file.ext" returns no netloc but a path of "example.com/path/file.ext". Shouldn't it be netloc = "example.com" and path = "/path/file.ext"? Do we really need a "://" to determine whether or not a ...
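The behaviour described above can be reproduced with a short sketch, shown here with Python 3's urllib.parse (an assumption; the question refers to the older urlparse module). The parser only recognises a netloc when it is introduced by "//", so a bare "example.com/..." string is treated as a relative path.

```python
from urllib.parse import urlparse

# Without a leading "//", everything ends up in the path.
bare = urlparse("example.com/path/file.ext")
print(bare.netloc)  # '' (empty)
print(bare.path)    # example.com/path/file.ext

# A scheme-relative form with "//" is enough to get a netloc; no scheme is required.
relative = urlparse("//example.com/path/file.ext")
print(relative.netloc)  # example.com
print(relative.path)    # /path/file.ext
```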

Parsing a URL for a crawler

Hello, I am writing a small crawler that extracts links from some 5 to 10 sites. While collecting the links I get some URLs like ../tets/index.html. If it were /test/index.html, we could prepend the base URL to get http://www.example.com/test/index.html. What can I do for this kind of URL? ...
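One common way to handle both kinds of link is urljoin, which resolves a relative reference against the URL of the page it was found on. The sketch below uses Python 3's urllib.parse; the helper name resolve_links and the page URL http://www.example.com/test/page.html are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin

def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on (hypothetical helper)."""
    return [urljoin(page_url, href) for href in hrefs]

# Hypothetical page URL; the two hrefs are the forms mentioned in the question.
page_url = "http://www.example.com/test/page.html"
links = resolve_links(page_url, ["../tets/index.html", "/test/index.html"])

for link in links:
    print(link)
# http://www.example.com/tets/index.html
# http://www.example.com/test/index.html
```

Note that the page URL, not just the site root, should be passed as the base, because "../" is interpreted relative to the directory of the current page.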

Python - Combining a url with urlunparse

Hi, I'm new to Python, so forgive me if this seems a little obvious, but I can't see that it's been asked before. I'm writing something to 'clean' a URL. In this case all I'm trying to do is return a faked scheme, as urlopen won't work without one. However, if I test this with 'www.python.org' it returns http:///www.python.org. Does anyon...
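The triple slash most likely appears because 'www.python.org' has no "//", so the parser puts the host in the path and leaves the netloc empty. A minimal sketch of one possible workaround follows, using Python 3's urllib.parse; the helper name ensure_scheme and the default of "http" are assumptions, not part of the original question.

```python
from urllib.parse import urlparse, urlunparse

def ensure_scheme(url, default_scheme="http"):
    """Return url with a scheme, reparsing bare hosts so they land in netloc (hypothetical helper)."""
    parts = urlparse(url)
    if not parts.netloc:
        # No "//" was present, so the host was parsed as part of the path.
        # Prefixing "//" makes the parser treat it as a network location.
        parts = urlparse("//" + url.lstrip("/"))
    if not parts.scheme:
        parts = parts._replace(scheme=default_scheme)
    return urlunparse(parts)

print(ensure_scheme("www.python.org"))               # http://www.python.org
print(ensure_scheme("https://www.python.org/doc/"))  # https://www.python.org/doc/
```

This sketch does not try to handle inputs such as a host with a port but no scheme, which the parser may interpret differently.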