>>> os.path.basename('http://example.com/file.txt')
'file.txt'

...and I thought os.path.* functions worked only on local paths, not URLs? Note that the example above was also run on Windows, with the same result.

+3  A: 

On Windows, look at the source code in C:\Python25\Lib\ntpath.py:

def basename(p):
    """Returns the final component of a pathname"""
    return split(p)[1]

os.path.split (in the same file) just splits at the last "\" or "/" (after removing any drive letter), so a URL is split at its last slash like any other path.
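To check this without a Windows machine, note that ntpath (the module os.path refers to on Windows) can be imported on any platform. A small sketch, using the example URL from the question:

```python
import ntpath  # the Windows flavour of os.path; importable on any OS

url = 'http://example.com/file.txt'

# 'http://...' has no drive letter (no ':' at index 1) and no UNC prefix,
# so split() simply cuts at the last separator it finds.
assert ntpath.split(url) == ('http://example.com', 'file.txt')
assert ntpath.basename(url) == 'file.txt'

# ntpath treats both '\\' and '/' as separators, which is why the URL works.
assert ntpath.basename(r'C:\Python25\Lib\ntpath.py') == 'ntpath.py'
assert ntpath.basename('C:/Python25/Lib/ntpath.py') == 'ntpath.py'

print(ntpath.split(url))
```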

sunqiang
+1  A: 

Use the source, Luke (this is basename from posixpath.py):


def basename(p):
    """Returns the final component of a pathname"""
    i = p.rfind('/') + 1
    return p[i:]
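Since the function above only looks for the last '/', its result on a URL depends purely on where the slashes are. A quick sketch comparing it with posixpath.basename (importable on any OS) on a few URL-shaped inputs; the sample strings are made up for illustration:

```python
import posixpath


def basename(p):
    """Returns the final component of a pathname (same logic as quoted above)."""
    i = p.rfind('/') + 1  # rfind returns -1 when there is no '/', so i becomes 0
    return p[i:]


samples = [
    'http://example.com/file.txt',      # -> 'file.txt'
    'http://example.com/dir/',          # trailing slash -> '' (empty basename)
    'http://example.com/file.txt?x=1',  # query string is kept -> 'file.txt?x=1'
    'file.txt',                         # no slash at all -> whole string
]

for s in samples:
    assert basename(s) == posixpath.basename(s)
    print(repr(s), '->', repr(basename(s)))
```

The trailing-slash and query-string cases are where "works by accident" starts to show.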

Edit (response to clarification):

It works for URLs by accident, that's all. Because of that, relying on this behaviour could be considered a code smell by some.

Trying to "fix" it (by checking that the passed path is not a URL) is also surprisingly difficult:

www.google.com/test.php
[email protected]/12
./src/bin/doc/goto.c

are at the same time valid pathnames and (relative) URLs, and so is http:/hello.txt (one slash, only on Linux, and it's kind of silly :)). You could "fix" it for absolute URLs, but relative ones would still work. Handling one special case differently is a big no-no in the Python world.
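To illustrate why such a check is awkward, here is a sketch of a naive "is this a URL?" test (has a scheme or a network location, per urllib.parse.urlsplit) applied to some of the strings above. The check itself is a hypothetical example, not something os.path does:

```python
import posixpath
from urllib.parse import urlsplit

samples = ['www.google.com/test.php', './src/bin/doc/goto.c', 'http:/hello.txt']

for s in samples:
    parts = urlsplit(s)
    # Hypothetical heuristic: treat anything with a scheme or netloc as a URL.
    looks_like_url = bool(parts.scheme or parts.netloc)
    print(f'{s!r}: scheme={parts.scheme!r} netloc={parts.netloc!r} '
          f'path={parts.path!r} url? {looks_like_url} '
          f'basename={posixpath.basename(s)!r}')
```

'www.google.com/test.php' has no scheme, so it is parsed as a plain relative path, while 'http:/hello.txt' gets scheme 'http' even though it is also a legal POSIX path. Either way, basename returns the last segment.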

To sum it up: import this

wuub
A: 

Why? Because it's useful for parsing URLs as well as local file paths. Why not?

Ted Percival
Because it's in the os.path module, and a URL isn't a path as understood by the OS if you're running Windows; the path separator is different. I believe that / is a valid filename character on Windows, which would make a URL a valid filename and this behaviour incorrect. I'm not a Windows user, so some or all of this comment might be gibberish.
SpoonMeiser
@SpoonMeiser, Microsoft's implementation of the C library actually lets you use / as a valid alternative to \ (the OS itself, at the syscall/Win32 API level, also did up to a point, but I think it stopped doing so a few years ago ;-).
Alex Martelli
@Alex Martelli, if that's still true, then that would make sense.
SpoonMeiser
+3  A: 

In practice, many os.path functions are just string-manipulation functions (which happen to be especially handy for path manipulation). Since that's innocuous and occasionally handy, even if formally "incorrect", I doubt this will change anytime soon. For more details, run the following simple one-liner at a shell/command prompt:

$ python -c"import sys; import StringIO; x=StringIO.StringIO(); sys.stdout=x; import this; sys.stdout = sys.__stdout__; print x.getvalue().splitlines()[10][9:]"
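The one-liner above is Python 2 (it uses the StringIO module and the print statement). A rough Python 3 sketch of the same trick, assuming the standard `this` module keeps its ROT13-encoded text in `this.s` and its decoding table in `this.d` (true in current CPython, but an implementation detail rather than a documented API):

```python
import contextlib
import io

# Importing `this` prints the Zen of Python as a side effect; capture it
# instead of letting it reach the terminal.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import this

# Decode this.s directly, so the result is the same even if `this` was
# already imported earlier in the process (in which case nothing was printed).
zen = "".join(this.d.get(c, c) for c in this.s)
line = zen.splitlines()[10]       # 'Although practicality beats purity.'
print(line[len("Although "):])    # practicality beats purity.
```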
Alex Martelli
I have to say the one-liner is very impressive.
sunqiang
On Windows: s/'/"/g
ars
@ars, thanks, " is indeed better AND cross-platform, so I edited. @sunqiang, glad you liked it!-)
Alex Martelli
Heh. Anyway, it seems to me that using `os.path.basename` (or split, or whatnot) in this manner (by passing a URL) is evil, as this is not documented behavior (and might change in the future).
Sridhar Ratnakumar
@srid, yep -- theurl.rsplit('/',1)[1] is definitely a better, safer approach.
Alex Martelli
Not to mention the query and anchor (fragment) parts of the URL.
Sridhar Ratnakumar
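Combining the last two comments, one option is to let urllib.parse strip the query and fragment first and only then take the last path segment. A minimal sketch; `url_filename` is a hypothetical helper name, and percent-decoding the result is a design choice you may not want:

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def url_filename(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path       # drops scheme, netloc, ?query and #fragment
    return unquote(basename(path))  # decode %XX escapes, e.g. %20 -> ' '


url = 'http://example.com/dir/file.txt?x=1#top'
print(url.rsplit('/', 1)[-1])  # file.txt?x=1#top  (query and fragment kept)
print(url_filename(url))       # file.txt
```

Using `[-1]` rather than `[1]` with rsplit avoids an IndexError when the string contains no '/' at all. Like basename, the helper returns '' for URLs that end in a slash.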