Which one is more efficient over a very large set of files and should be used?
find . -exec cmd {} +
or
find . | xargs cmd
(Assume that there are no funny characters in the filenames)
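find . | xargs cmd is more efficient than find . -exec cmd {} \; because xargs runs cmd as few times as possible, whereas -exec terminated by \; runs cmd once for each match. The -exec cmd {} + form from the question batches arguments in much the same way as xargs. However, the plain pipe will run into trouble if filenames contain spaces or funky characters.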
Use the following instead:
find . -print0 | xargs -0 cmd
This will work even if filenames contain funky characters (-print0 makes find print NUL-terminated matches, and -0 makes xargs expect this format).
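As a rough illustration of the difference, the sketch below creates two files in a scratch directory, one with a space in its name, and counts how many arguments the command receives through each pipe. The file names and the sh -c one-liner standing in for cmd are assumptions made up for this example.

```shell
#!/bin/sh
# Scratch directory holding one ordinary name and one name with a space.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch "plain.txt" "with space.txt"

# Stand-in for cmd (hypothetical): print the number of arguments received.
# Newline-delimited pipe: xargs splits on whitespace, so "with space.txt"
# arrives as two separate arguments.
split_count=$(find . -type f | xargs sh -c 'echo "$#"' sh)

# NUL-delimited pipe: every file name arrives as exactly one argument.
safe_count=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "plain pipe:  $split_count arguments"
echo "print0 pipe: $safe_count arguments"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

With two files present, the plain pipe should report three arguments and the -print0 pipe two.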
Speed difference will be insignificant.
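One way to check this claim is to count how often the command is started under each form. The sketch below is only an illustration: the 50 file names and the sh -c 'echo run' stand-in for cmd are invented for the example, and each started command prints one line so that counting lines counts invocations.

```shell
#!/bin/sh
# Scratch directory with 50 files that have harmless names.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
i=1
while [ "$i" -le 50 ]; do
    touch "file$i"
    i=$((i + 1))
done

# Stand-in for cmd (hypothetical): print one line per invocation.
# -exec ... \; starts the command once per file.
per_file=$(find . -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# -exec ... {} + passes as many file names as fit to each invocation.
batched=$(find . -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

# xargs batches the same way on the receiving end of the pipe.
piped=$(find . -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "-exec \\; : $per_file invocations"
echo "-exec +  : $batched invocations"
echo "xargs -0 : $piped invocations"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

For a set this small, both batching forms should start the command once, while the \; form starts it 50 times. On much larger sets the batching forms start it several times each, but still far fewer times than once per file.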
But you have to make sure that:
1. Your script does not assume that file names are free of spaces, tabs, etc.; the first version is safe, the second is not.
2. Your script does not treat a file name starting with "-" as an option.
So your code should look like this:
find . -exec cmd -option1 -option2 -- {} +
or
find . -print0 | xargs -0 cmd -option1 -option2 --
The first version is shorter and easier to write, as you can ignore point 1, but the second version is more portable and safer, as "-exec cmd {} +" is a relatively new option in GNU findutils (since 2005; lots of running systems will not have it yet) and it was buggy until recently. Also, lots of people do not know about "-exec cmd {} +", as you can see from other answers.