This isn't working. Can this be done in find? Or do I need to xargs?

find -name 'file_*' -follow -type f -exec zcat {} \| agrep -dEOE 'grep' \;
+4  A: 
find . -name "file_*" -follow -type f -print0 | xargs -0 zcat | agrep -dEOE 'grep'
Paul Tomblin
Hoping to avoid -print and xargs for efficiency reasons. Maybe that's really my problem: find cannot handle piped commands through -exec
someguy
This doesn't work with files with spaces in their names; to fix, replace -print with -print0 and add the -0 option to xargs
Adam Rosenfield
@someguy - Wha? Avoiding xargs for efficiency reasons? Calling one instance of zcat, and passing it a list of multiple files, is *far* more efficient than exec-ing a new instance of it for each found file.
Sherm Pendley
@Adam - I've made your suggested change. 99% of the time when I'm doing finds, it's in my source code directories, and none of the files there have spaces so I don't bother with print0. Now my documents directory, on the other hand, I remember the print0.
Paul Tomblin
+1  A: 

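The solution is easy: execute via sh, passing the file name as an argument so that names containing spaces or shell metacharacters are handled safely.

`... -exec sh -c 'zcat "$1" | agrep -dEOE "grep"' sh {} \;`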
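A minimal sketch of the sh approach, for anyone who wants to try it on throwaway data. The temporary directory, the sample file names and the use of plain grep in place of agrep are all assumptions made for the demo. File names reach the inner shell as positional parameters, and `{} +` hands many names to a single sh instead of starting one per file.

```shell
#!/bin/sh
# Hypothetical demo data: two small gzip files, one with a space in its name.
demo_dir=$(mktemp -d)
printf 'alpha grep\nbeta\n' | gzip > "$demo_dir/file_one.gz"
printf 'gamma\ndelta grep\n' | gzip > "$demo_dir/file_two words.gz"

# The names arrive in "$@" and are never parsed as shell source text.
# The trailing "sh" fills $0; "+" batches many names into one sh invocation.
# Plain grep stands in for agrep here (an assumption for portability).
result=$(find "$demo_dir" -name 'file_*' -type f \
    -exec sh -c 'for f in "$@"; do zcat "$f" | grep grep; done' sh {} + | sort)

printf '%s\n' "$result"
rm -rf "$demo_dir"
```

Each file still gets its own zcat and grep pair here, so this form fits best when the per-file pipeline is what you actually need.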
flolo
+7  A: 

Interpreting the pipe symbol as an instruction to run multiple processes and pipe the output of one process into the input of another is the responsibility of the shell (/bin/sh or equivalent).

In your example you can either choose to use your top level shell to perform the piping like so:

find -name 'file_*' -follow -type f -exec zcat {} \; | agrep -dEOE 'grep'

In terms of efficiency this costs one invocation of find, numerous invocations of zcat, and one invocation of agrep.

This would result in only a single agrep process being spawned which would process all the output produced by numerous invocations of zcat.
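If the number of zcat invocations is the concern, find can batch file names itself: the POSIX form `-exec command {} +` appends as many names as fit to one invocation. The following is only a sketch on throwaway data; the temporary directory, the sample files and the use of grep instead of agrep are assumptions for the demo.

```shell
#!/bin/sh
# Hypothetical demo data: three small gzip files.
demo_dir=$(mktemp -d)
printf 'one grep\n'   | gzip > "$demo_dir/file_a.gz"
printf 'two\n'        | gzip > "$demo_dir/file_b.gz"
printf 'three grep\n' | gzip > "$demo_dir/file_c.gz"

# "{} +" passes all matching names to as few zcat processes as possible;
# the top-level shell then pipes the combined output into a single grep.
# grep -c counts matching lines (grep stands in for agrep, an assumption).
matches=$(find "$demo_dir" -name 'file_*' -type f -exec zcat {} + | grep -c grep)

echo "$matches"
rm -rf "$demo_dir"
```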

If you for some reason would like to invoke agrep multiple times, you can do:

find . -name 'file_*' -follow -type f \
    -printf "zcat %p | agrep -dEOE 'grep'\n" | sh

This constructs a list of commands using pipes to execute, then sends these to a new shell to actually be executed. (Omitting the final "| sh" is a nice way to debug or perform dry runs of command lines like this.)
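The dry run mentioned above can be sketched as follows. This assumes GNU find (for -printf) and uses a hypothetical temporary directory with one empty placeholder file; nothing is executed, the generated command is only captured and printed.

```shell
#!/bin/sh
# Hypothetical demo data: one empty file whose name matches the pattern.
demo_dir=$(mktemp -d)
: > "$demo_dir/file_a"

# Without the final "| sh", find only prints the commands it would run.
# -printf is a GNU find extension, and %p is inserted unquoted.
commands=$(find "$demo_dir" -name 'file_*' -type f \
    -printf "zcat %p | agrep -dEOE 'grep'\n")

printf '%s\n' "$commands"
rm -rf "$demo_dir"
```

Because %p is not quoted, a file name containing spaces or shell metacharacters would be split or interpreted once the list is piped to sh, so inspecting the generated commands first is worthwhile.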

In terms of efficiency this costs one invocation of find, one invocation of sh, numerous invocations of zcat and numerous invocations of agrep.

The most efficient solution in terms of number of command invocations is the suggestion from Paul Tomblin:

find . -name "file_*" -follow -type f -print0 | xargs -0 zcat | agrep -dEOE 'grep'

... which costs one invocation of find, one invocation of xargs, a few invocations of zcat and one invocation of agrep.

Rolf W. Rasmussen
This doesn't work with files with spaces in their names; to fix, replace -print with -print0 and add the -0 option to xargs
Adam Rosenfield
Another advantage of xargs is that on a modern multi-core CPU you can speed things up even more by using the -P switch (-P 0).
flolo
Yes, the -P switch is indeed a nice way to speed up execution in general. Unfortunately, you run the risk of the output of parallel zcat processes being piped into agrep interleaved, which would affect the result. This effect can be demonstrated using: echo -e "1\n2" | xargs -P 0 -n 1 yes | uniq
Rolf W. Rasmussen
@Adam, I've made your suggested change.
Paul Tomblin
A: 

You can also pipe to a while loop, which can perform multiple actions on each file that find locates. Here is one that searches the jar archives in a folder containing a large distribution of many jar files for a given Java class file:

find /usr/lib/eclipse/plugins -type f -name \*.jar | while IFS= read -r jar; do echo "$jar"; jar tf "$jar" | fgrep IObservableList; done

The key point is that the while loop can contain multiple commands, separated by semicolons, that reference the passed-in file name, and these commands can include pipes. In this case I echo the name of each file that find matched, then list what is in the archive, filtering for a given class name. The output looks like:

/usr/lib/eclipse/plugins/org.eclipse.core.contenttype.source_3.4.1.R35x_v20090826-0451.jar
/usr/lib/eclipse/plugins/org.eclipse.core.databinding.observable_1.2.0.M20090902-0800.jar
org/eclipse/core/databinding/observable/list/IObservableList.class
/usr/lib/eclipse/plugins/org.eclipse.search.source_3.5.1.r351_v20090708-0800.jar
/usr/lib/eclipse/plugins/org.eclipse.jdt.apt.core.source_3.3.202.R35x_v20091130-2300.jar
/usr/lib/eclipse/plugins/org.eclipse.cvs.source_1.0.400.v201002111343.jar
/usr/lib/eclipse/plugins/org.eclipse.help.appserver_3.1.400.v20090429_1800.jar

In my bash shell (Xubuntu 10.04/Xfce) the matched class name is shown in bold because fgrep highlights the matched string; this makes it really easy to scan down the list of hundreds of jar files that were searched and spot the matches.
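For anyone without a directory of jar files at hand, the same loop pattern can be tried on throwaway data. This is only a sketch: the temporary directory, the gzip files standing in for jar archives, and zcat standing in for `jar tf` are all assumptions made for the demo.

```shell
#!/bin/sh
# Hypothetical demo data: gzip files stand in for the jar archives.
demo_dir=$(mktemp -d)
printf 'IObservableList\nOther\n' | gzip > "$demo_dir/a one.gz"
printf 'Nothing\n' | gzip > "$demo_dir/b.gz"

# The loop body runs several commands per file, one of them a pipeline.
# "IFS= read -r" keeps spaces and backslashes in the name intact.
# "|| true" keeps the loop's exit status zero when a file has no match.
report=$(cd "$demo_dir" && find . -type f -name '*.gz' | sort |
    while IFS= read -r archive; do
        echo "$archive"
        zcat "$archive" | grep -F IObservableList || true
    done)

printf '%s\n' "$report"
rm -rf "$demo_dir"
```

Reading names line by line still breaks on file names that contain newlines; that is rare enough that the simpler form is usually acceptable.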

If you are on Windows you can do the same thing to search for a class name in a bunch of jar files with

for /R %j in (*.jar) do @echo %j & @jar tf %j | findstr IObservableList

Note that on Windows the command separator is '&', not ';', and that the '@' suppresses the echo of the command being run, giving tidy output just like the Linux find output above. However, findstr does not make the matched string bold, so you have to look a bit closer at the output to see the matched class name. It turns out that the Windows 'for' command knows quite a few tricks, such as looping through text files...

enjoy

simbo1905