In my web application I render pages using a PHP script, and then generate static HTML files from them. The static HTML files are served to the users to improve performance. They eventually become stale and need to be deleted.

I am debating between two ways to write the eviction script.

The first is using a single find command, like

find /var/www/cache -type f -mmin +10 -exec rm \{} \;

The second form is by piping through xargs, something like

find /var/www/cache -type f -mmin +10 -print0 | xargs -0 rm

The first form invokes rm for each file it finds, while the second form just sends all the file names to a single rm (but the file list might be very long).

Which form would be faster?

In my case, the cache directory is shared between a few web servers, so this is all done over NFS, if that matters for this issue.

+3  A: 

I expect the xargs version to be slightly faster, as you aren't spawning a process for each filename. But I would be surprised if there was actually much difference in practice. If you're worried about the long list xargs sends to each invocation of rm, you can use -n with xargs to limit the number of arguments it passes per invocation. However, xargs knows the system's maximum command-line length and won't go beyond that.
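To make the batching visible, here is a small sketch that counts how many times xargs runs its command. It uses echo as a harmless stand-in for rm, and the five letters are made-up placeholder arguments, not real file names. Each invocation of echo prints one line, so the line count equals the number of invocations.

```shell
# Sketch only: echo stands in for rm, and a..e are placeholder arguments.
# Each echo invocation prints exactly one line, so counting lines counts invocations.

# Cap each invocation at 2 arguments with -n.
capped=$(printf '%s\n' a b c d e | xargs -n 2 echo | wc -l | tr -d ' ')

# No cap: xargs packs everything that fits into one command line.
uncapped=$(printf '%s\n' a b c d e | xargs echo | wc -l | tr -d ' ')

echo "with -n 2: $capped invocations, without: $uncapped invocation"
```

With only five short arguments the uncapped run fits in a single command line; with a very long list xargs would split it into several invocations on its own.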

kbyrd
I think that xargs will automatically (without any extra option) spawn several processes if the argument list is longer than the system's maximum command-line length.
MatthieuP
Thanks. I didn't know xargs can do that.
yhager
+2  A: 

With a lot of files, the xargs version is dramatically faster than the -exec version as you posted it. With -exec ... \;, rm is executed once for each file you want to remove, while xargs will lump as many files as possible together into a single rm command.

With tens or hundreds of thousands of files, it can be the difference between a minute or less versus the better part of an hour.

You can get the same behavior with -exec by finishing the command with a "+" instead of "\;". This option is only available in newer versions of find.

The following two are roughly equivalent:

find . -print0 | xargs -0 rm
find . -exec rm \{} +

Note that the xargs version will still run slightly faster (by a few percent) on a multi-processor system, because find and rm run as separate processes in the pipeline, so some of the work can be parallelized. This is particularly true if a lot of computation is involved.
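If you want to see the difference between "\;" and "+" for yourself, something like the following sketch should do. It works in a throwaway directory from mktemp, the three .html file names are made up for the demo, and a tiny sh -c command that prints one line per invocation stands in for rm while counting.

```shell
# Sketch only: a throwaway directory with three made-up cache files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.html" "$demo_dir/b.html" "$demo_dir/c.html"

# "\;" form: the command runs once per file, so it prints one line per file.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "+" form: file names are appended in batches, so a short list needs one run.
batched=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file runs: $per_file, batched runs: $batched"

# The actual deletion, using the batched form, then clean up the directory.
find "$demo_dir" -type f -exec rm {} +
remaining=$(find "$demo_dir" -type f | wc -l | tr -d ' ')
rmdir "$demo_dir"
```

For the real cache you would point find at your own directory and keep your -mmin +10 test; the demo leaves the age test out only because the files it creates are brand new.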

tylerl