I'd like to run a program on a directory of files. I know how to do this with one file, using

cat myFile.xml | myProgram.py

How can I run myProgram.py over a folder, say myFolder?

Thanks!

+3  A: 

Assuming your program can accept a filename as its first command line argument, one way is to use find to find all the files in the folder, and then use xargs to run your program for each of them:

find myFolder | xargs -n 1 myProgram.py

The -n 1 means "run the program once per file". If your program is happy to receive multiple filenames on its command line, you can omit the -n 1 and xargs will run your program fewer times with multiple files on its command line.

(find will do a recursive search, so you'll get all the files in and under myFolder. You can use find myFolder -maxdepth 1 to prevent that.)

(Thanks to @Personman for pointing out that this will run the program for the folder itself as well as the files. You can use find myFolder -type f to tell find to only return regular files.)
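To pull the caveats above together, here is a minimal sketch rather than a definitive recipe. It builds a throwaway folder so it can be run as-is, and it uses a hypothetical stand-in script (myProgram.sh, not the OP's myProgram.py) that just prints the filename it receives. The -print0 and -0 options are an addition not mentioned above; they are supported by GNU and BSD find/xargs and keep filenames containing spaces in one piece.

```shell
#!/bin/sh
# Throwaway demo folder (all names here are made up for illustration).
demo=$(mktemp -d)
mkdir -p "$demo/myFolder/sub"
printf '<a/>\n' > "$demo/myFolder/one.xml"
printf '<b/>\n' > "$demo/myFolder/two words.xml"
printf '<c/>\n' > "$demo/myFolder/sub/three.xml"

# Hypothetical stand-in for myProgram.py: prints the filename it was given.
cat > "$demo/myProgram.sh" <<'EOF'
echo "processing: $1"
EOF

# -maxdepth 1 : do not descend into subfolders
# -type f     : regular files only, so myFolder itself is skipped
# -print0/-0  : NUL-separated names, safe for spaces in filenames
# -n 1        : run the program once per file
result=$(find "$demo/myFolder" -maxdepth 1 -type f -print0 \
    | xargs -0 -n 1 sh "$demo/myProgram.sh" | sort)
echo "$result"

rm -rf "$demo"
```

With a real program you would replace sh "$demo/myProgram.sh" with myProgram.py, and drop -maxdepth 1 if you do want the recursive behaviour.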

RichieHindle
Gotta be careful with find. First of all, you need the trailing slash in 'find myFolder/'. Second, be aware that find returns the folder itself, as well as all its contents, and unless you use -maxdepth, it does so recursively too. Find is pretty complicated; you should read its manpage first.
Personman
Seems like overkill based on the OP.
drewk
A: 

cat myFolder/* | myProgram.py

Paul R
does this run myProgram once for each file in myFolder?
Sanjay Manohar
@Sanjay: no, this will concatenate all the files into one stream which myProgram.py will see on stdin. If the OP wants myProgram.py to be invoked once for each file then he needs to state that in his question, and one of the find/xargs answers would then be more appropriate.
Paul R
+1  A: 

How about:

for x in myFolder/*
do
    # Quote "$x" so filenames containing spaces are passed intact
    cat "$x" | myProgram.py
done
Beta
+2  A: 

I like

ls | xargs cat

for its functional language feel. YMMV.

Ukko
How the heck did I never hear about this before? +1
Beta
A: 

Or cat *.xml | myProgram.py, which writes the contents of every .xml file to stdout and pipes the result to your program's stdin. This combines all the files into one stream.

myProgram.py *.xml makes the shell expand the pattern into one argument per matching file, like this: myProgram.py file1.xml file2.xml file3.xml ... filen.xml. Each file remains separate and the script can tell one from another.

Python / Perl / sh scripts, in the basic case, usually handle that the same as myProgram.py file1.xml; myProgram.py file2.xml; myProgram.py filen.xml, with the ; separating one command from the next.
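If the difference between one combined stream and separate per-file runs is not obvious, this small sketch may help. It assumes a hypothetical stand-in for myProgram.py (the count_lines function below, which just counts the lines it reads on stdin) and two made-up sample files.

```shell
#!/bin/sh
# Two sample files in a throwaway folder (names are illustrative only).
demo=$(mktemp -d)
printf 'one\n' > "$demo/file1.xml"
printf 'two\nthree\n' > "$demo/file2.xml"

# Hypothetical stand-in for myProgram.py: counts lines read from stdin.
count_lines() { wc -l | tr -d ' '; }

# One invocation: every file is concatenated into a single stream.
combined=$(cat "$demo"/*.xml | count_lines)

# One invocation per file: each run sees only its own file.
separate=$(for f in "$demo"/*.xml; do count_lines < "$f"; done)

echo "combined: $combined"
echo "separate:" $separate

rm -rf "$demo"
```

The combined run reports a single total of 3 lines, while the per-file loop reports 1 and 2 separately, which is the distinction that matters when your program needs to know where one file ends and the next begins.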

Play with it and welcome to Unix!

drewk
A: 

Check out this answer...

DevNull