Which one is more efficient over a very large set of files and should be used?
find . -exec cmd {} +
or
find . | xargs cmd
(Assume that there are no funny characters in the filenames)
find . | xargs cmd
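is efficient because xargs packs as many filenames as it can onto each command line, so cmd runs as few times as possible. Note that -exec cmd {} + batches its arguments the same way; it is the older -exec cmd {} \; form that runs cmd once for each match. However, the plain pipe to xargs will run into trouble if filenames contain spaces, quotes, or other unusual characters.
A safer form is: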
find . -print0 | xargs -0 cmd
This works even if filenames contain unusual characters (-print0 makes find terminate each match with a NUL byte, and -0 makes xargs expect that format).
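To see the difference concretely, here is a minimal sketch that counts how many arguments the command actually receives. The scratch directory and the two filenames are made up for illustration, and it assumes the path returned by mktemp contains no whitespace itself.

```shell
# Hypothetical scratch directory and filenames, used only for illustration.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with space.txt"

# Newline-delimited input: xargs splits on blanks as well, so
# "with space.txt" arrives as two separate arguments.
unsafe_count=$(find "$tmpdir" -type f | xargs sh -c 'echo "$#"' sh)

# NUL-delimited input: each filename arrives as exactly one argument.
safe_count=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "plain pipe: $unsafe_count arguments, -print0/-0: $safe_count arguments"
rm -rf "$tmpdir"
```

With two files on disk, the plain pipe should report three arguments and the NUL-delimited pipe two.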
The speed difference will be insignificant, since both forms pass many filenames to each invocation of cmd.
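One way to check this is to count how often the command is started under each form. This is only a sketch: the scratch directory and file names are invented for the test, and the inner sh -c 'echo run' stands in for cmd.

```shell
# Hypothetical scratch directory with five files, for illustration only.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5; do touch "$tmpdir/file$i"; done

# Every start of the inner sh prints one "run" line, so the number of
# matching lines equals the number of times the command was started.
per_file=$(find "$tmpdir" -type f -exec sh -c 'echo run' sh {} \; | grep -c run)
batched=$(find "$tmpdir" -type f -exec sh -c 'echo run' sh {} + | grep -c run)
piped=$(find "$tmpdir" -type f | xargs sh -c 'echo run' sh | grep -c run)

echo "exec-semicolon: $per_file, exec-plus: $batched, xargs: $piped"
rm -rf "$tmpdir"
```

For a handful of files, the semicolon form starts the command once per file, while the plus form and xargs each start it once. With a very large set of files, both of the latter split the list into as many batches as the system's argument-length limit requires.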
But you have to make sure that:
1. Your script does not assume that filenames are free of spaces, tabs, etc.; the first version (-exec) is safe, the second (plain xargs) is not.
2. Your script does not treat a filename starting with "-" as an option.
So your code should look like this:
find . -exec cmd -option1 -option2 -- {} +
or
find . -print0 | xargs -0 cmd -option1 -option2 --
The first version is shorter and easier to write, as you can ignore point 1, but
the second version is more portable and safe, as "-exec cmd {} +" is a relatively new option in GNU findutils (since 2005; lots of running systems will not have it yet) and it was buggy until recently. Also, lots of people do not know about "-exec cmd {} +", as you can see from other answers.
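The effect of the "--" guard can be seen in isolation with a file whose name begins with a dash. This is a small sketch, with cat standing in for cmd and a made-up file named -n in a scratch directory.

```shell
# Hypothetical scratch directory holding a file literally named "-n".
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/-n"

# With "--", cat treats -n as a filename and prints its contents.
guarded=$(cd "$tmpdir" && cat -- -n)

# Without "--", cat parses -n as its line-numbering option and reads
# standard input instead (empty here), so the file is never opened.
unguarded=$(cd "$tmpdir" && cat -n </dev/null)

echo "guarded: '$guarded', unguarded: '$unguarded'"
rm -rf "$tmpdir"
```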