There are a number of questions about how to parse a URL in Python; this one is about the best or most Pythonic way to do it.

For my parsing I need four parts: the network location, the first path segment (the base), the remaining directory path, and the filename together with the query string.

http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123

should parse into:

netloc = 'www.somesite.com'
baseURL = 'base'
path = '/first/second/third/fourth/'
file = 'foo.html?abc=123'

The code below produces the correct result, but is there a better way to do this in Python?

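from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

url = "http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123"

file = url.rpartition('/')[2]  # everything after the last '/', query string included
netloc = urlparse(url)[1]
path = urlparse(url)[2]        # '/base/first/second/third/fourth/foo.html'
pathParts = path.split('/')
baseURL = pathParts[1]         # pathParts[0] is the empty string before the leading '/'

partCount = len(pathParts) - 1

path = "/"
for i in range(2, partCount):
    path += pathParts[i] + "/"

print('baseURL= ' + baseURL)
print('path= ' + path)
print('file= ' + file)
print('netloc= ' + netloc)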
+6  A: 

Since the parts you want differ from what urlparse gives you, that's about as good as it's going to get. You could, however, replace this:

partCount = len(pathParts) - 1

path = "/"
for i in range(2, partCount):
    path += pathParts[i] + "/"

With this:

path = '/' + '/'.join(pathParts[2:-1]) + '/'  # join alone drops the outer slashes
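
To check that the slice-and-join form matches the loop, here is a minimal sketch. It assumes the sample URL from the question and Python 3's `urllib.parse` (in Python 2 the import would be `from urlparse import urlparse`). The names `path_parts`, `loop_path` and `joined_path` are illustrative only.

```python
from urllib.parse import urlparse

url = "http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123"
path_parts = urlparse(url).path.split('/')
# path_parts == ['', 'base', 'first', 'second', 'third', 'fourth', 'foo.html']

# Loop version from the question.
loop_path = "/"
for i in range(2, len(path_parts) - 1):
    loop_path += path_parts[i] + "/"

# Slice-and-join version; the join alone omits the outer slashes,
# so they are added explicitly here.
joined_path = '/' + '/'.join(path_parts[2:-1]) + '/'

print(loop_path == joined_path)  # True for this URL
```

One caveat: when there are no directories between the base and the file, the loop yields '/' while this joined form yields '//', so the two only agree when at least one directory is present.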
Paolo Bergantino
+2  A: 

I'd be inclined to start out with urlparse. Also, you can use rsplit, and the maxsplit parameter of split and rsplit to simplify things a bit:

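from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

_, netloc, path, _, q, _ = urlparse(url)
_, base, path = path.split('/', 2)  # 1st component will always be empty
path, file = path.rsplit('/', 1)    # path is now 'first/second/third/fourth'
path = '/' + path + '/'             # restore the slashes the question expects
if q: file += '?' + q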
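
The same idea can be wrapped in a function, shown here as a rough sketch rather than a definitive implementation. The helper name `split_url` is hypothetical, and the sketch assumes Python 3 and a path with at least a base segment followed by a file segment. It uses `rpartition` instead of `rsplit` so that a URL with no directories between the base and the file does not raise on unpacking.

```python
from urllib.parse import urlparse

def split_url(url):
    """Return (netloc, base, path, file) for a URL (hypothetical helper)."""
    parsed = urlparse(url)
    # maxsplit=2 peels off the empty leading component and the base segment.
    _, base, rest = parsed.path.split('/', 2)
    # rpartition always returns three items; directory is '' if rest has no '/'.
    directory, _, filename = rest.rpartition('/')
    path = '/' + directory + '/' if directory else '/'
    if parsed.query:
        filename += '?' + parsed.query
    return parsed.netloc, base, path, filename

print(split_url("http://www.somesite.com/base/first/second/third/fourth/foo.html?abc=123"))
# ('www.somesite.com', 'base', '/first/second/third/fourth/', 'foo.html?abc=123')
```

A path that consists of the base segment alone (for example `/base`) would still fail at the first unpacking, so a caller would need to handle that case separately.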
Laurence Gonsalves