I have a string variable which represents a dos path e.g:

var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

I want to split this string into:

[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]

I have tried using split() and replace() but they either only process the first backslash or they insert hex numbers into the string.

I need to convert this string variable into a raw string somehow so that I can parse it.

What's the best way to do this?

I should also add that the contents of var, i.e. the path that I'm trying to parse, is actually the return value of a command line query. It's not path data that I generate myself. It's stored in a file, and the command line tool is not going to escape the backslashes.

+2  A: 

It works for me:

>>> a=r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> a.split("\\")
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

Sure, you might also need to strip the colon from the first component, but keeping it makes it possible to reassemble the path.

The r modifier marks the string literal as "raw"; notice how embedded backslashes are not doubled.
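
To make the point about the r prefix concrete, here is a small sketch (the variable names are mine, not from the question). It assumes nothing beyond plain Python strings, and shows that a raw literal and a doubled-backslash literal end up as the same string, so the prefix only affects how the source code is read:

```python
# Two ways of writing the same string literal in source code.
raw = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
doubled = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"

# Once parsed, both are ordinary str objects holding single backslashes.
same = (raw == doubled)

# Split on a single backslash (written "\\" in a normal literal).
parts = raw.split("\\")

# The question asked for "d" rather than "d:", so strip the colon from
# the first component if the path never needs to be rebuilt.
parts_no_colon = [parts[0].rstrip(":")] + parts[1:]

print(same)
print(parts_no_colon)
```

Keeping the colon, as the answer suggests, is the safer default if the components may be joined back together later.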

unwind
@unwind - the `r` in front of your string, what does that refer to?
BeeBand
`r` means raw string: backslashes in the literal are kept as they are, so you don't have to escape each `\` yourself. It's useful whenever you're writing paths.
Wayne Werner
@Wayne, if the string is passed in as a variable to a function how do I ensure that it is treated as a raw string?
BeeBand
@BeeBand: you don't need to care; the r"" is just something that matters during compilation/parsing of the code, it's not something that becomes a property of the string once parsed. It just means "here's a string literal, but don't interpret any backslashes as having any other meaning than being backslashes".
unwind
I think it might be helpful to mention that you might as well make it more platform-agnostic by using a.split(os.sep) instead of hard-coding the separator?
Tim McJilton
I have to downvote you for missing a chance to explain `os.path.split` and `os.sep`, considering both of those are far more portable than what you have written. It might not matter to OP now, but it will when he's writing something that needs to move platforms.
Jed Smith
+1  A: 

use ntpath.split()
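
For what it's worth, a minimal sketch of what that looks like, assuming the path arrives as an ordinary string with single backslashes (the variable names are illustrative). ntpath is the Windows flavour of os.path and can be imported on any platform:

```python
import ntpath  # Windows flavour of os.path, importable on any platform

path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

# Separate the drive letter from the rest of the path.
drive, rest = ntpath.splitdrive(path)

# Separate the last component (the file name) from its directory.
head, tail = ntpath.split(rest)

print(drive)
print(head)
print(tail)
```

Note that ntpath.split() only peels off the last component, so getting every folder name takes repeated calls.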

caspin
@Caspin, when I use os.path.split() I get (`d:\\stuff`, `morestuff\x0curtherdown\thefile.mux`)
BeeBand
As BeeBand pointed out, os.path.split() really doesn't do the desired thing.
unwind
Sorry, I just realized that os.path behaves according to the OS you are running on. ntpath will parse DOS paths on any platform.
caspin
@Caspin, even with ntpath I still get `d:\\stuff, morestuff\x0curtherdown\thefile.mux`
BeeBand
@BeeBand: you're having issues with escaping your string. `'\x0c'` is the form feed character. The way to create the form feed character is '\f'. If you really want the literal string '\f' you have two options: `'\\f'` or `r'\f'`.
caspin
+8  A: 

The problem here starts with how you're creating the string in the first place.

a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python tries to interpret these as escape sequences: `\s`, `\m`, `\f`, and `\T`. In your case, `\f` is being treated as a form feed (0x0C), while the other backslashes are left alone because they do not start a recognized escape. What you need to do is one of these:

b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"      # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"         # raw string, no doubling necessary

Then once you split either of these, you'll get the result you want.
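
Here is a small sketch of the failure itself, using only the `\f` part of the path so that every escape in the example is a valid one (the variable names are illustrative):

```python
# "\f" in a normal literal is the form feed character (0x0C), so the
# backslash and the "f" are gone before split() ever sees the string.
broken = "morestuff\furtherdown"
fixed = r"morestuff\furtherdown"

# repr() makes the hidden form feed visible as \x0c.
print(repr(broken))
print(repr(fixed))

# The broken string has no backslash left to split on.
broken_parts = broken.split("\\")
fixed_parts = fixed.split("\\")

print(broken_parts)
print(fixed_parts)
```

This would explain the "hex numbers" from the question: they are how Python displays the form feed, not something split() inserted.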

Craig Trader
@W. Craig Trader - thanks, but this path is not one that I generate myself - it comes back to me from another program and I have to store this data in a variable. I am not sure how to convert data stored in a variable into "raw text".
BeeBand
There is no such thing as "raw text"... it's just how you represent the string in the source. Either prefix the string literal with r, or pass the string through .replace('\\', '/')
Marco Mariani
@BeeBand, how are you getting the data back from the other program? Are you reading it from a file, a pipe, a socket? If so, then you don't need to do anything fancy; the only reason for doubling backslashes or using raw strings is to place string constants into Python code. On the other hand, if the other program is generating doubled-backslashes, then you'd want to clean that up before splitting your path.
Craig Trader
@W. Craig Trader - I'm reading it from a file that gets written by another program. I couldn't get `split()` or `replace()` to work for some reason - I kept getting hex values. You're right though, I think I was barking up the wrong tree with the raw string idea - I think I was just using `split()` incorrectly, because I tried some of these solutions using `split()` and they work for me now.
BeeBand
Maybe the file is UTF-8/UTF-16 (unicode) encoded?
Craig Trader
+2  A: 

The stuff about mypath.split("\\") would be better expressed as mypath.split(os.sep). os.sep is the directory separator for your particular platform (e.g., \ for Windows, / for Unix, etc.), and the Python build knows which one to use. If you use os.sep, then your code will be platform agnostic. (Don't confuse it with os.pathsep, which separates the entries of a PATH-style list.)
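
Since os.sep and os.pathsep are easy to mix up, here is a short sketch of the difference. It uses the ntpath and posixpath modules so the values are the same whichever platform runs it; os.sep and os.pathsep simply mirror one of the two:

```python
import ntpath
import os
import posixpath

# Directory separators: what you split a single path on.
print(repr(ntpath.sep), repr(posixpath.sep))

# PATH-list separators: what sits between entries of a PATH variable.
print(repr(ntpath.pathsep), repr(posixpath.pathsep))

path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

# Splitting a DOS path on the Windows separator works on any platform.
parts = path.split(ntpath.sep)
print(parts)

# os.sep matches ntpath.sep only when the script itself runs on Windows.
running_with_windows_sep = (os.sep == ntpath.sep)
```

One caveat: if the script runs on Unix but the data is a DOS path, splitting on os.sep will not find any backslashes, so the platform-agnostic constant only helps when the paths belong to the platform you are running on.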

Chris
Or `os.path.split`. You want to be careful with `os.pathsep`, because it's `:` on my version of Python in OS X (and `os.path.split` properly handles `/`).
Jed Smith
+15  A: 

I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)

You can get the drive and path+file like this:

drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

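folders = []
while True:
    # Peel the last component off the end of the path.
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    else:
        # No trailing component: for a path without a trailing separator,
        # what remains is either the root ("\") or an empty string.
        if path != "":
            folders.append(path)

        break

# Components were collected right to left, so put them in path order.
folders.reverse()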

(This puts a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)
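
The steps above can be rolled into one function. This is only a sketch under a few assumptions of mine: the name split_dos_path is hypothetical, ntpath is swapped in for os.path so that DOS paths are parsed the same way on any platform, and the loop also tolerates a trailing separator. The last lines show pathlib.PureWindowsPath, a different standard-library route that was not part of the answer, for comparison:

```python
import ntpath
from pathlib import PureWindowsPath


def split_dos_path(path):
    """Return [drive, root, folder, ..., file] for a DOS/Windows path.

    Hypothetical helper: drive and root appear only if present.
    """
    drive, rest = ntpath.splitdrive(path)
    parts = []
    while True:
        head, tail = ntpath.split(rest)
        if tail:
            parts.append(tail)
        elif head == rest:
            # Nothing more can be split off: keep the root if there is one.
            if head:
                parts.append(head)
            break
        # Either a component was removed or a trailing separator was.
        rest = head
    parts.reverse()
    return ([drive] if drive else []) + parts


path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
components = split_dos_path(path)
print(components)

# Alternative: pathlib folds the drive and root into the first element.
pure_parts = PureWindowsPath(path).parts
print(pure_parts)
```

To get exactly the list from the question, drop the root entry and strip the colon from the drive.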

brone
@brone - I prefer this solution to having to worry about escaping the backslash. Thanks!
BeeBand
I'll echo your sentiment - os.path should be used any time you're not just writing a one-off.
Wayne Werner
@brone - I thought that if I selected your answer you would get the bounty?? Sorry, it looks like SO autoselected the answer for the bounty - the points were meant to go to you.
BeeBand
A: 

Just as others explained, your problem stemmed from using \, which is the escape character in string literals. OTOH, if you had obtained that file path string from another source (read from a file, from the console, or returned by an os function), there would have been no problem splitting on '\\'. (Note that r'\' is not a valid literal, because a raw string cannot end in a single backslash.)

And just as others suggested, if you want to use \ in a program literal, you have to either double it as \\ or prefix the whole literal with r, like so: r'lite\ral' or r"lite\ral", to stop the parser from converting the \ and r into a CR (carriage return) character.

There is one more way though: just don't use backslash pathnames in your code! Since the last century, Windows has recognized and worked fine with pathnames that use the forward slash / as the directory separator. Somehow not many people know that, but it works:

>>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
>>> var.split('/')
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

This, by the way, will make your code work on Unix, Windows and Mac, because all of them accept / as a directory separator, even if you don't want to use the predefined constants of the os module.
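
If paths may arrive with a mix of both separators, one possible sketch (my own illustration, not something from the question) is to either normalise them with ntpath.normpath, which treats / and \ alike, or split on both characters at once:

```python
import ntpath
import re

# A path that mixes forward slashes and backslashes.
mixed = "d:/stuff\\morestuff/furtherdown\\THEFILE.txt"

# ntpath accepts either separator and normalises to backslashes.
normalised = ntpath.normpath(mixed)
print(normalised)

# Or split on either separator in one go with a character class.
parts = re.split(r"[\\/]", mixed)
print(parts)
```

The regular expression version yields empty strings where two separators are adjacent, so filter those out if the input may contain doubled slashes.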

Nas Banov
@Nas - Unfortunately the data is being returned to me from another program that I run from my python script. I don't have any control over whether to use '\' or '/' - it is the third party program that determines this ( probably on a platform basis ).
BeeBand
@BeeBand: Ah, then you won't have the problem you experienced during testing, when you provided the string as literal in your program. Or you can do the following evil hack after receiving the path: `var = var.replace('\\','/')` - replace \ with / and proceed working with forward slashes only :)
Nas Banov
@Nas - that is indeed an evil hack :o)
BeeBand
@BeeBand: that's why I warned. When I say something is evil, I don't necessarily mean it should never be used - but one should *very much* be aware of why they are using it and alert to unintended consequences. In this case, a very unlikely consequence is that if this is used on a Unix file system where `\` appears in a file or directory name (it's really hard but possible), this code will 'break'.
Nas Banov
+2  A: 

Let's assume you have a file filedata.txt with content:

d:\stuff\morestuff\furtherdown\THEFILE.txt
d:\otherstuff\something\otherfile.txt

You can read and split the file paths:

>>> for line in open("filedata.txt"):
...     print(line.strip().split("\\"))
... 
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
['d:', 'otherstuff', 'something', 'otherfile.txt']
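
For anyone who wants to try this without creating the file by hand, here is a self-contained sketch that writes the two lines to a temporary file and reads them back. The temporary directory is my addition; the file name and contents are the ones above. The point it illustrates is that text read from a file contains plain backslashes and needs no escaping or raw-string treatment:

```python
import os
import tempfile

lines = [
    r"d:\stuff\morestuff\furtherdown\THEFILE.txt",
    r"d:\otherstuff\something\otherfile.txt",
]

with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "filedata.txt")

    # Write the paths exactly as another program would: one per line.
    with open(name, "w") as f:
        f.write("\n".join(lines) + "\n")

    # Read them back; each line is split on a single backslash.
    with open(name) as f:
        results = [line.strip().split("\\") for line in f]

for parts in results:
    print(parts)
```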
zoli2k
@zoli2k - this does indeed work, thanks! But I chose brone's solution because I prefer not to worry about escaping the backslash.
BeeBand