ansaurus

Question

How can I split a url string up into separate parts in Python?

Answer 1

+7 A:

I have no experience with Python, but I found the urlparse module, which should do the job:

http://docs.python.org/library/urlparse.html
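In Python 3 the module was renamed urllib.parse. Here is a minimal sketch of the six named components that urlparse returns, plus parse_qs, which turns the query string into a dictionary. The URL, with its query string and fragment, is a made-up example:

```python
# Python 3 import; in Python 2.6+ use: from urlparse import urlparse, parse_qs
from urllib.parse import urlparse, parse_qs

# A made-up URL with a query string and a fragment
url = 'http://example.com/random/folder/path.html?page=2&sort=asc#top'
parts = urlparse(url)

print(parts.scheme)    # http
print(parts.netloc)    # example.com
print(parts.path)      # /random/folder/path.html
print(parts.params)    # (empty string: no ';params' in this URL)
print(parts.query)     # page=2&sort=asc
print(parts.fragment)  # top

# parse_qs maps each query parameter name to a list of its values
print(parse_qs(parts.query))  # {'page': ['2'], 'sort': ['asc']}
```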

Sebastian Dietz 2009-01-16 07:49:55

Answer 2

+1 A:

In Python, a lot of operations are done using lists. The urlparse module mentioned by Sebastian Dietz may well solve your specific problem, but if you're generally interested in Pythonic ways to work with slashes in strings, for example, try something like this:

url = 'http://example.com/random/folder/path.html'
# Create a list of each bit between slashes
slashparts = url.split('/')
# Now join back the first three sections: 'http:', '' and 'example.com'
basename = '/'.join(slashparts[:3]) + '/'
# Join everything except the last section (the file name)
dirname = '/'.join(slashparts[:-1]) + '/'
# print() with a single argument works in both Python 2 and Python 3
print('slashparts = %s' % slashparts)
print('basename = %s' % basename)
print('dirname = %s' % dirname)

The output of this program is this:

slashparts = ['http:', '', 'example.com', 'random', 'folder', 'path.html']
basename = http://example.com/
dirname = http://example.com/random/folder/

The interesting bits are split, join, the slice notation seq[A:B] (including negative indices, which count from the end) and, as a bonus, the % operator on strings for printf-style formatting.
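One caveat worth knowing: splitting on '/' knows nothing about URL structure, so a slash inside a query string is treated like a path separator. A quick sketch with a made-up URL, comparing the naive split with urlparse (Python 3 import shown):

```python
from urllib.parse import urlparse

# Made-up URL whose query string itself contains slashes
url = 'http://example.com/random/folder/path.html?next=/other/page'

# Naive split: the query's slashes produce extra "parts"
slashparts = url.split('/')
print(slashparts[-1])  # page  (not the file name)

# urlparse separates the query first, so the path comes out intact
path = urlparse(url).path
print(path)                 # /random/folder/path.html
print(path.split('/')[-1])  # path.html
```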

Paul Stephenson 2009-01-16 08:08:32

Answer 3

A:

If this is the extent of your URL parsing, Python's built-in str.rpartition method will do the job:

>>> URL = "http://example.com/random/folder/path.html"
>>> Segments = URL.rpartition('/')
>>> Segments[0]
'http://example.com/random/folder'
>>> Segments[2]
'path.html'

From Pydoc, str.rpartition:

Splits the string at the last occurrence of sep, and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

What this means is that rpartition does the searching for you and splits the string at the last (rightmost) occurrence of the separator you specify (in this case '/'). It returns a tuple containing:

(everything to the left of the separator, the separator itself, everything to the right of the separator)
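A short sketch of the same behaviour, using the example URL from above. It covers tuple unpacking, the not-found case from the quoted documentation, and the left-to-right counterpart str.partition:

```python
url = 'http://example.com/random/folder/path.html'

# Unpack the 3-tuple directly; sep is '/' because the separator was found
head, sep, tail = url.rpartition('/')
print(head)  # http://example.com/random/folder
print(tail)  # path.html

# When the separator is missing, the whole string ends up in the last slot
print('path.html'.rpartition('/'))  # ('', '', 'path.html')

# partition() is the mirror image: it splits at the FIRST occurrence
print(url.partition('://'))  # ('http', '://', 'example.com/random/folder/path.html')
```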

Mike Hamer 2009-01-16 08:11:11

Answer 4

+8 A:

The urlparse module in Python 2.x (or urllib.parse in Python 3.x) would be the way to do it.

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'
>>>
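The result object also breaks netloc down further and can be turned back into a URL. Here is a minimal Python 3 sketch. The credentials and port in the URL are invented for illustration:

```python
from urllib.parse import urlparse

# Made-up URL with credentials and an explicit port
parse_object = urlparse('http://user:secret@example.com:8080/random/folder/path.html')

# netloc is split further by these convenience attributes
print(parse_object.netloc)    # user:secret@example.com:8080
print(parse_object.hostname)  # example.com
print(parse_object.port)      # 8080 (an int, or None if absent)
print(parse_object.username)  # user

# The result is a named tuple, so _replace() swaps one field
# and geturl() reassembles the full URL
new_url = parse_object._replace(path='/random/folder/other.html').geturl()
print(new_url)  # http://user:secret@example.com:8080/random/folder/other.html
```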

If you want to do more work on the path part of the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts together.
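Here is a rough sketch of gluing the pieces back together. posixpath.join works on the path alone. For full URLs, urllib.parse.urljoin (not mentioned in the answer, swapped in here as an alternative) resolves relative references against a base URL:

```python
import posixpath
from urllib.parse import urlparse, urljoin

url = 'http://example.com/random/folder/path.html'
path = urlparse(url).path

# Replace the file name while keeping the directory part
new_path = posixpath.join(posixpath.dirname(path), 'other.html')
print(new_path)  # /random/folder/other.html

# Caveat: an absolute second argument discards everything before it
print(posixpath.join('/random/folder', '/other.html'))  # /other.html

# urljoin resolves a relative reference against a full base URL
print(urljoin(url, 'other.html'))     # http://example.com/random/folder/other.html
print(urljoin(url, '../other.html'))  # http://example.com/random/other.html
```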

EDIT: I totally forgot that Windows users will choke on the path separator in os.path (it uses backslashes there). I read the posixpath module docs, and they have a special reference to URL manipulation, so all's good.
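To see what the edit is about without a Windows machine, here is a small sketch. ntpath and posixpath are the modules that os.path refers to on Windows and POSIX systems respectively, and both can be imported on any platform:

```python
import ntpath     # Windows path rules, importable on any platform
import posixpath  # POSIX path rules, importable on any platform

path = '/random/folder/path.html'

# On Windows, os.path is ntpath, which joins with a backslash
print(ntpath.join('/random/folder', 'other.html'))     # /random/folder\other.html
print(posixpath.join('/random/folder', 'other.html'))  # /random/folder/other.html

# The bug is easy to miss: basename agrees, since ntpath also accepts '/'
print(ntpath.basename(path), posixpath.basename(path))  # path.html path.html
```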

sykora 2009-01-16 08:14:36

+1 on urlparse, but don't use os.path to manipulate the .path part. os.path's handling necessarily differs from OS to OS, whereas URIs always use '/' as the path part separator.

bobince 2009-01-16 11:30:22

yeah, remove the os.path part. Maybe use the posixpath module instead. Then you'll have my vote.

nosklo 2009-01-16 11:37:06

argh, missed that one completely. It's been ages since I used Windows :|. Fixed.

sykora 2009-01-16 13:03:47
