I often have a command that processes one file, and I want to run it on every file in a directory. Is there any built-in way to do this?

For example, say I have a program `data` which outputs an important number about a file:

$ ./data foo
137
$ ./data bar
42

I want to run it on every file in the directory in some manner like this:

map data `ls *`
ls * | map data

to yield output like this:

foo: 137
bar: 42
+2  A: 

If you just want to run a command on every file you can do this:

for i in *; do data "$i"; done

If you also wish to display the filename it is currently working on, you could use this:

for i in *; do echo -n "$i: "; data "$i"; done
Mark Byers
With the caveat of quoting `$i` so that files with spaces in their names don't get treated as multiple arguments to whatever program is being called.
Daniel DiPaolo
Daniel: Fixed, thanks.
Mark Byers
You can get away with a simple for loop in this case because the `ls` can be turned into glob expansion. If you actually want to use the output of a command, the for loop will split on all embedded whitespace, so you'll probably want to set `$IFS` to only newlines - see my answer if that's necessary.
Jefromi
+3  A: 

It looks like you want xargs:

find . -maxdepth 1 | xargs -d'\n' data

To print each filename first, it gets a little more complex:

find . -maxdepth 1 | xargs -d'\n' -I {} bash -c 'echo "$1"; data "$1"' _ {}
Stephen
ah nice, most concise one so far. Is there any easy way to also print the file it's currently working on?
Claudiu
`data` is certainly welcome to print the current filename if it likes...
Jefromi
@Claudiu: edited to show the file.
Stephen
ls is not supposed to be used that way. Instead, ls is intended to present a listing to the user: it may replace unprintable characters, reformat the listing, etc.
Juliano
@Juliano, fair enough. switched to find.
Stephen
A: 

Try this:

for i in *; do echo "${i}: $(data "$i")"; done
Juha Syrjälä
+1  A: 

The common methods are:

ls * | while read file; do data "$file"; done

for file in $(ls *); do data "$file"; done

The second can run into problems if you have whitespace in filenames; in that case you'd probably want to run it in a subshell and set IFS to split only on newlines:

( IFS=$'\n'; for file in $(ls *); do data "$file"; done )

You can easily wrap the first one up in a script:

#!/bin/bash
# map.bash

while read -r file; do
    "$1" "$file"
done

which can be executed as you requested (e.g. ls | ./map.bash data) - just be careful never to accidentally execute anything dumb with it. The benefit of using a looping construct is that you can easily place multiple commands inside it as part of a one-liner, unlike xargs, where you'd have to place them in an executable script for it to run.
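
For instance, a quick sketch with two commands in the loop body (results.txt is just a hypothetical log file name):

ls | while read -r file; do
    echo "processing $file"        # first command: progress output
    data "$file" >> results.txt    # second command: collect results
done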

Of course, you can also just use the utility xargs:

ls * | xargs -n 1 data

Note that you should make sure indicators are turned off (ls --indicator-style=none) if you normally use them, or the @ appended to symlinks will turn them into nonexistent filenames.
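
For example, assuming GNU ls (where --indicator-style lives), the xargs pipeline above with indicators explicitly disabled would be:

ls --indicator-style=none * | xargs -n 1 data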

Jefromi
use `for file in *` instead of `for file in $(ls *)`
glenn jackman
@glenn jackman: I realize that, and it was covered in another answer. I was attempting to provide the general answer here, because it's not always simple globbing that gets you your filename list. It can be `grep -l`, `find ...`, who knows.
Jefromi
A: 

You can create a shell script like so:

#!/bin/bash
cd /path/to/your/dir
for file in * ; do
  ./data "$file"
done

That loops through every file in /path/to/your/dir and runs your "data" script on it. Be sure to chmod the above script so that it is executable.
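
A minimal sketch of that last step, assuming you saved the script as run_data.sh (a hypothetical name):

chmod +x run_data.sh
./run_data.sh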

Banjer
A: 

You could also use PRLL, which runs a shell function on many arguments in parallel.

raspi
+3  A: 

You should avoid parsing ls:

find . -maxdepth 1 | while read -r file; do do_something_with "$file"; done

or

while read -r file; do do_something_with "$file"; done < <(find . -maxdepth 1)

The latter uses process substitution instead of a pipe, so the while loop doesn't run in a subshell and any variables you set inside it survive the loop.
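
A quick sketch of why that matters (plain bash, nothing beyond the two commands above):

count=0
find . -maxdepth 1 | while read -r file; do
    count=$((count + 1))    # runs in a subshell
done
echo "$count"               # prints 0: the increments were lost

count=0
while read -r file; do
    count=$((count + 1))    # runs in the current shell
done < <(find . -maxdepth 1)
echo "$count"               # prints the actual number of entries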

Dennis Williamson
+1 For not parsing ls.
Juliano
A: 

Parsing the output of ls doesn't handle blanks, linefeeds, and other funky stuff in filenames, and should be avoided where possible.

find is only needed if you want to descend into subdirectories, or if you want to use its other tests (mtime, size, you name it).

But many commands handle multiple files themselves, so you often don't need a for loop at all. Instead of:

for d in * ; do du -s "$d"; done

you can simply write:

du -s *
md5sum e* 
identify *jpg
grep bash ../*.sh
user unknown
+1  A: 

Since you specifically asked about this in terms of "map", I thought I'd share this function I have in my personal shell library:

# map_lines: evaluate a command for each line of input
map_lines()
{
        while read -r line ; do
                "$1" "$line"
        done
}

I use this in the manner that you asked for:

$ ls | map_lines ./data

I named it map_lines instead of map as I assumed I might some day implement a map_args, which you would use like this:

$ map_args ./data *

That function would look like this:

map_args()
{
    cmd="$1" ; shift
    for arg ; do        # with no 'in' list, iterates over "$@"
        "$cmd" "$arg"
    done
}
camh
+3  A: 

If you are just trying to execute your data program on a bunch of files, the easiest/least complicated way is to use -exec in find.

Say you wanted to execute data on all txt files in the current directory (and subdirectories). This is all you'd need:

find . -name "*.txt" -exec data {} \;

If you wanted to restrict it to the current directory, you could do this:

find . -maxdepth 1 -name "*.txt" -exec data {} \;

There are lots of options with find.
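
For instance, if data can accept several filenames per invocation, you can end -exec with + instead of \; so that find batches many files into each run (standard find syntax, though it does assume data copes with multiple arguments):

find . -maxdepth 1 -name "*.txt" -exec data {} +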

DevNull
oh yes, this is actually what I want! Thanks.
Claudiu