views: 2147
answers: 5

On the UNIX bash shell (specifically Mac OS X Leopard), what would be the simplest way to copy every file with a specific extension from a folder hierarchy (including subdirectories) into a single destination folder (without subfolders)?

Obviously there is the problem of duplicate file names in the source hierarchy. I wouldn't mind if they were overwritten.

Example: I need to copy every .txt file in the following hierarchy

/foo/a.txt
/foo/x.jpg
/foo/bar/a.txt
/foo/bar/c.jpg
/foo/bar/b.txt

To a folder named 'dest' and get:

/dest/a.txt
/dest/b.txt
+9  A: 

In bash:

find /foo -iname '*.txt' -exec cp \{\} /dest/ \;

find will find all the files under the path /foo matching the wildcard *.txt, case-insensitively (that's what -iname means). For each file, find will execute cp {} /dest/, with the found file substituted in place of {}.

Magnus Hoff
-exec cp -t dest/ {} + will be faster, because it only has to run cp once, with multiple arguments. -t is short for --target-directory. -l may be useful here, to make hard links instead of copies. And maybe -u, to end up with the newest version of each filename instead of whichever one find comes to first.
Peter Cordes
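
Putting those flags together, the suggestion would look something like this (a sketch; -t and -u are GNU coreutils cp options, so this works on Linux but not with the stock BSD cp on Mac OS X; add -l if you want hard links instead of copies):

find /foo -iname '*.txt' -exec cp -u -t /dest/ {} +

Because {} + hands many file names to a single cp invocation, -t is what lets the target directory be named before the files.
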
+6  A: 

The only problem with Magnus' solution is that it forks off a new "cp" process for every file, which is not terribly efficient, especially if there is a large number of files.

On Linux (or other systems with GNU coreutils) you can do:

find . -name "*.xml" -print0 | xargs -0 cp -t a

(The -0 allows it to work when your filenames have weird characters -- like spaces -- in them.)

Unfortunately I think Macs come with BSD-style tools. Anyone know a "standard" equivalent to the "-t" switch?

Stephen Darlington
+1  A: 

As far as the man page for cp on a FreeBSD box goes, there's no need for a -t switch. cp will assume the last argument on the command line to be the target directory if more than two names are passed.

agnul
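
For example, using the files from the question (just a sketch; the destination directory simply goes last):

cp /foo/a.txt /foo/bar/b.txt /dest/
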
the point of -t is that it lets you put the target as one of the first args. xargs doesn't make it easy to put args in the middle.
Peter Cordes
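
For what it's worth, the BSD xargs shipped with Mac OS X has a -J flag that inserts the collected file names at a chosen spot in the command, which works around this without needing GNU cp's -t. A rough sketch, assuming BSD find and xargs (check man xargs on your system):

find /foo -iname '*.txt' -print0 | xargs -0 -J % cp % /dest/
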
+1  A: 

If you really want to run just one command, why not cons one up and run it? Like so:

$ find /foo  -name '*.txt' | xargs echo | sed -e 's/^/cp /' -e 's|$| /dest|' | bash -sx

But that won't matter too much performance-wise unless you do this a lot or have a ton of files. Be careful of name collisions, however. I noticed in testing that GNU cp at least warns of collisions:

cp: will not overwrite just-created `/dest/tubguide.tex' with `./texmf/tex/plain/tugboat/tubguide.tex'

I think the cleanest is:

$ find /foo  -name '*.txt' | xargs -i cp {} /dest

Less syntax to remember than the -exec option.

Jon Ericson
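A small portability note on that last command: -i is a deprecated GNU-only spelling; the POSIX form names the placeholder explicitly with -I, for example (a sketch):

$ find /foo -name '*.txt' | xargs -I {} cp {} /dest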
+1  A: 

The answers above don't account for name collisions, since the asker didn't mind files being overwritten.

I do mind files being overwritten, so I came up with a different approach. Replacing each / in the path with - keeps the hierarchy in the names and puts all the files in one flat folder.

We use find to get the list of all files, then awk to create an mv command with the original filename and the modified filename, and then pass those commands to bash to be executed.

find ./from -type f | awk '{ str=$0; sub(/\.\//, "", str); gsub(/\//, "-", str); print "mv " $0 " ./to/" str }' | bash

where ./from and ./to are directories to mv from and to.

Rob Styles
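
A sketch of what this generates, assuming ./from contains a.txt and bar/b.txt (names borrowed from the question's example); drop the trailing | bash to preview the mv commands without running them:

mv ./from/a.txt ./to/from-a.txt
mv ./from/bar/b.txt ./to/from-bar-b.txt

Previewing the output this way is a handy sanity check before letting bash execute it.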