I'm trying to get a list of the CSV files in a directory with Python. This is really easy in Unix:

ls -l *.csv

And, predictably, I get a list of the files that end with .csv in my directory. However, when I attempt the Python equivalent using the subprocess module:

>>> import subprocess as sp
>>> sp.Popen(["ls", "-l", "*.csv"], stdout = sp.PIPE)
<subprocess.Popen object at 0xb780e90c>
>>> ls: cannot access *.csv: No such file or directory

Can somebody please explain what's going on?

Edit: Adding shell = True removes the error, but instead of getting a list of just CSV files, I get a list of all the files in the directory.

+1  A: 

When you enter ls -l *.csv at the shell, the shell itself expands *.csv into a list of all the filenames it matches. So the arguments ls actually receives will be something more like ls -l spam.csv eggs.csv ham.csv

The ls command doesn't understand wildcards itself. So when you pass the argument *.csv to it, it tries to treat it as a filename, and there is no file with that name. As Nick says, you can use the shell=True parameter to have Python invoke a shell to run the subprocess, and the shell will expand the wildcards for you.
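To make the division of labour concrete, here is a minimal sketch that does the wildcard expansion in Python and only then hands the resulting names to ls, with no shell involved. The helper name list_long is made up for illustration, and it assumes a Unix-like system with ls on the PATH and Python 3.7 or later (for text=True).

```python
import glob
import subprocess


def list_long(pattern):
    """Expand `pattern` in Python, then pass the matching names to `ls -l`."""
    names = sorted(glob.glob(pattern))
    if not names:
        # Nothing matched; avoid running a bare `ls -l`, which would
        # list the whole current directory instead.
        return ""
    # "--" marks the end of options, so odd filenames are not read as flags.
    result = subprocess.run(
        ["ls", "-l", "--"] + names,
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(list_long("*.csv"), end="")
```

This is roughly what the shell does for you: by the time ls starts, it only ever sees a list of real filenames.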

Weeble
+3  A: 

If you want it to behave as it does at the shell, you need to pass shell=True (your mileage may vary here, depending on your system and shell). In your case the problem is that when you do ls -l *.csv, the shell is evaluating what * means, not ls. (ls is merely formatting your results; the shell has done the heavy lifting to determine which files match *.csv.) Without a shell, subprocess hands *.csv to ls as a literal argument, so ls looks for a file with that exact name, and of course there isn't one (since that's a pretty hard filename to create).

What you really should be doing is using os.listdir and filtering the names yourself.
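A minimal sketch of that approach, assuming you only care about names ending in .csv (the function name csv_files is just for illustration):

```python
import os


def csv_files(directory="."):
    """Return the names in `directory` that end with .csv, sorted."""
    return sorted(
        name for name in os.listdir(directory) if name.endswith(".csv")
    )


if __name__ == "__main__":
    for name in csv_files():
        print(name)
```

One difference from the shell worth knowing: os.listdir also returns hidden names such as .backup.csv, which a plain *.csv at the shell would skip by default.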

Nick Bastin
I've followed yours and Weeble's suggestions, but now I get a list of all the files in the directory, not just the CSV files I want. Do you know what the problem is?
Tom
OK, I've figured out how to solve my problem using Python's `glob` module - much simpler overall. However, I still want to know what's going on.
Tom
Ah-hah! :-) I almost suggested using the `glob` module, but I didn't think it was quite what you wanted (but it addresses the same problem). In general you shouldn't trust `subprocess` to accurately replicate shell behaviour - it may on some systems, but not on others.
Nick Bastin
Thanks. I guess this goes as another example against using the shell and ignoring Python's library. :D
Tom
@Nick, what does `/bin/sh -c ls -l *.csv` do for you when you type it at a shell prompt? For me (on Linux) it works just like `ls` (ignoring the rest of the args) -- and that `/bin/sh` command is what shell=True with a list (instead of a string) is specified as doing.
Alex Martelli
@Tom, if and when you want to use the shell (maybe not for `ls`, but many other tasks are harder to replicate in Python) of course you can: just use a string (and shell=True) in `Popen`, not a list (the list is fine when you **don't** need shell functionality and thus use `shell=False`, explicitly or by default).
Alex Martelli
@Alex: using sh gives me the results you get...using ksh gets me good results, probably because of how the argument parser works.
Nick Bastin
@Alex: actually I lie...I'd aliased ksh -c to quote the arguments, which is the only reason why that worked and /bin/sh didn't. I suppose the only way to get this to work with `shell=True` is to quote the command in its entirety.
Nick Bastin
@Nick, yes, "pass a string" (instead of a list) when you're using shell=True, as I suggested.
Alex Martelli
+3  A: 

Why not use glob instead? It's going to be faster than "shelling out"!

import glob
glob.glob('*.csv')

This gives you just the names, not all the extra info ls -l supplies, though you can get extra info with os.stat calls on files of interest.
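As a rough sketch of the os.stat route, the following collects the size and modification time for each match. The helper name describe and the choice of fields are only illustrative; ls -l shows more (permissions, owner and so on), all of which are available from the same os.stat result.

```python
import glob
import os
import time


def describe(pattern):
    """Return (name, size in bytes, modification time) for each match."""
    rows = []
    for name in sorted(glob.glob(pattern)):
        info = os.stat(name)
        rows.append((name, info.st_size, time.ctime(info.st_mtime)))
    return rows


if __name__ == "__main__":
    for name, size, modified in describe("*.csv"):
        print(name, size, modified)
```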

If you really must use ls -l, I think you want to pass it as a string for the shell to do the needed star-expansion:

proc = sp.Popen('ls -l *.csv', shell=True, stdout=sp.PIPE)
Alex Martelli
Yes, I discovered glob seconds before I saw your post. :P. However, I still want to figure out why Python can't produce the same output as the shell.
Tom
@Tom, sure it can -- "/bin/sh -c ls -l '*.csv'" (which as the docs say is the exact equivalent of shell=True with a list instead of a string) behaves just the same way, listing all file names -- try it! When you want the behavior you'd get by typing at the shell a plain string, you give `Popen` the same string (with shell=True), as I said.
Alex Martelli
+1  A: 
p=subprocess.Popen(["ls", "-l", "*.out"], stdout = subprocess.PIPE, shell=True)

causes

/bin/sh -c ls -l *.out

to be executed.

If you try this command in a directory, you'll see (in typical mystifying-shell fashion) that all files are listed, not just the .out files. And the -l flag is ignored as well. That's a clue.

You see, the -c flag is picking up only the ls as the command to run. The rest of the arguments are being eaten up by /bin/sh itself (they become its positional parameters $0 and $1) and never reach ls.

To get this command to work right at the terminal, you have to type

/bin/sh -c "ls -l *.out"

Now /bin/sh sees the full command "ls -l *.out" as the argument to the -c flag.

So to get this to work out right using subprocess.Popen, you are best off just passing the command as a single string:

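import subprocess
# A single string plus shell=True lets /bin/sh expand *.out before ls runs.
p = subprocess.Popen("ls -l *.out", stdout=subprocess.PIPE, shell=True)
output, error = p.communicate()  # error is None because stderr is not piped
print(output)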
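If you want to see for yourself where the extra list items go, here is a small sketch. It assumes a POSIX system where shell=True runs /bin/sh, and Python 3.7 or later for text=True; the function name shell_sees is made up. Instead of ls, the command string is a printf that reports the shell's own $0 and $1.

```python
import subprocess


def shell_sees(extra_args):
    """Run a list with shell=True and report what the shell got as $0 and $1.

    Only the first list item is used as the command string for `sh -c`;
    the remaining items become the shell's positional parameters.
    """
    command = ['printf "%s|%s\\n" "$0" "$1"'] + list(extra_args)
    result = subprocess.run(
        command, shell=True, stdout=subprocess.PIPE, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Same shape as ["ls", "-l", "*.out"], with printf standing in for ls.
    print(shell_sees(["-l", "*.out"]), end="")
```

The -l and *.out show up as the shell's own parameters, which is why ls in the original list form runs with no arguments at all.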
unutbu