views: 671
answers: 8
In bash, is there a way to chain multiple commands, all taking the same input from stdin? That is, one command reads stdin, does some processing, writes the output to a file. The next command in the chain gets the same input as what the first command got. And so on.

For example, consider a large text file to be split into multiple files by filtering the content. Something like this:

cat food_expenses.txt | grep "coffee" > coffee.txt | grep "tea" > tea.txt | grep "honey cake" > cake.txt

This obviously does not work, because the second grep gets the first grep's output, not the original text file. I tried inserting tees, but that does not help. Is there some bash magic that can cause the first grep to send its input, not its output, down the pipe?

And by the way, splitting a file was a simple example. Consider splitting (filtering by pattern search) a continuous live text stream coming over a network and writing the output to different named pipes or sockets. I would like to know if there is an easy way to do it using a shell script.

(This question is a cleaned-up version of my earlier one, based on responses that pointed out its lack of clarity.)

A: 

You can probably write a simple AWK script to do this in one shot. Can you describe the format of your file a little more?

  • Is it space/comma separated?
  • Do you have the item descriptions in a specific 'column', where columns are defined by some separator like a space, comma, or something else?

If you can afford multiple grep runs, this will work:

grep coffee food_expenses.txt > coffee.txt
grep tea food_expenses.txt > tea.txt

and so on.
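
Alternatively, a minimal sketch of the one-shot awk idea (assuming plain substring matching is enough; the patterns and filenames are taken from the question):

awk '/coffee/     { print > "coffee.txt" }    # each pattern writes to its own file
     /tea/        { print > "tea.txt" }
     /honey cake/ { print > "cake.txt" }' food_expenses.txt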

nik
Well, the expense sheet thing was just a quick example I could make up. What I really want to know is in shells like bash, is there a way to chain multiple commands, all taking the same input from stdin. That is, one command reads stdin, does some processing, writes the output to a file. The next command in the chain gets the same input as what the first command got. And so on.
soorajmr
Hmmm, it would be useful adding that point to the question above.
nik
In fact, your question subject does not suggest this detail at all.
nik
Sorry for that. I hope it is better now.
soorajmr
+2  A: 

The obvious question is: why do you want to do this within one command?

If you don't want to write a script and you want to run things in parallel, bash supports the concept of subshells, and these can run in parallel. By putting your commands in parentheses, you can run your greps (or whatever) concurrently, e.g.

$ (grep coffee food_expenses.txt > coffee.txt) & (grep tea food_expenses.txt > tea.txt)

Note that in the above your cat may be redundant since grep takes an input file argument.

You can (instead) play around with redirecting output through different streams. You're not limited to stdout/stderr, but can assign new file descriptors as required. I can't advise more on this other than to direct you to the examples here.
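
A minimal sketch of that file-descriptor approach (not from the answer; the patterns and filenames are placeholders):

exec 3> coffee.txt 4> tea.txt            # open two extra output streams
while IFS= read -r line; do
    case "$line" in
        *coffee*) printf '%s\n' "$line" >&3 ;;   # coffee lines go to fd 3
        *tea*)    printf '%s\n' "$line" >&4 ;;   # tea lines go to fd 4
    esac
done < food_expenses.txt
exec 3>&- 4>&-                           # close the extra descriptors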

Brian Agnew
Splitting a file was a simple example. Consider splitting (filtering by pattern search) a continuous live text stream coming over a network and writing the output to different named pipes or sockets. This can of course be done in languages like C. I would like to know if there is an easy way to do it using a shell script.
soorajmr
I think in that case you should either script it (it may be trivial in Perl) or have a look at the file descriptor redirections in my link above.
Brian Agnew
+1  A: 

You could use awk to split into up to two files:

awk '/Coffee/ { print } /Tea/ { print > "/dev/stderr" }' inputfile > coffee.file.txt 2> tea.file.txt
Stephen Darlington
+1  A: 

I like Stephen's idea of using awk instead of grep.

It ain't pretty, but here's a command that uses output redirection to keep all data flowing through stdout:

cat food.txt |
  awk '/coffee/ {print $0 > "/dev/stderr"} {print $0}' 2> coffee.txt |
  awk '/tea/ {print $0 > "/dev/stderr"} {print $0}' 2> tea.txt

As you can see, it uses awk to send all lines matching 'coffee' to stderr, and all lines regardless of content to stdout. Then stderr is fed to a file, and the process repeats with 'tea'.

If you wanted to filter out content at each step, you might use this:

cat food.txt |
  awk '/coffee/ {print $0 > "/dev/stderr"} $0 !~ /coffee/ {print $0}' 2> coffee.txt |
  awk '/tea/ {print $0 > "/dev/stderr"} $0 !~ /tea/ {print $0}' 2> tea.txt
Nate Kohl
This does what I wanted to do. Thank you! So, the basic difference is that grep can output to only one file, while awk can output to multiple files. awk here acts like a "tee", splitting the input stream. I'm not sure about the efficiency, though, if the piped commands form a long chain and the input is large. I was expecting that the shell would have a generic way of doing such a thing, even if the command by itself cannot do the splitting. There doesn't seem to be such an option.
soorajmr
A: 

Assuming that your input is not infinite (as in the case of a network stream that you never plan on closing), I might consider using a subshell to put the data into a temp file, and then a series of other subshells to read it. I haven't tested this, but maybe it would look something like this:

(cat inputstream > tempfile); (grep tea tempfile > tea.txt); (grep coffee tempfile > coffee.txt)

I'm not certain of an elegant solution to the file getting too large if your input stream is not bounded in size, however.
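
A slightly fleshed-out sketch of that idea (also untested against a real stream; mktemp and the placeholder names are assumptions):

tmp=$(mktemp) || exit 1          # temp file holding a copy of the (finite) input
trap 'rm -f "$tmp"' EXIT         # clean up the temp file on exit
cat inputstream > "$tmp"         # capture the stream once
grep tea "$tmp" > tea.txt
grep coffee "$tmp" > coffee.txt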

Aftermathew
A: 

Here are two bash scripts without awk. The second one doesn't even use grep!

With grep:

#!/bin/bash
tail -F food_expenses.txt | \
while read line
do
    for word in "coffee" "tea" "honey cake"
    do
        if [[ $line != ${line#*$word*} ]]
        then
            echo "$line"|grep "$word" >> ${word#* }.txt # use the last word in $word for the filename (i.e. cake.txt for "honey cake")
        fi
    done
done

Without grep:

#!/bin/bash
tail -F food_expenses.txt | \
while read line
do
    for word in "coffee" "tea" "honey cake"
    do
        if [[ $line != ${line#*$word*} ]] # does the line contain the word?
        then
            echo "$line" >> ${word#* }.txt # use the last word in $word for the filename (i.e. cake.txt for "honey cake")
        fi
    done
done;
Dennis Williamson
I used "tail -F" but "cat" would work, too.
Dennis Williamson
A: 

I am unclear why the filtering needs to be done in different steps. A single awk program can scan all the incoming lines and dispatch the appropriate lines to individual files. This is a very simple dispatcher that can feed multiple secondary commands (i.e. persistent processes that monitor the output files for new input; or the files could be sockets that are set up ahead of time and written to by the awk process).

If there is a reason to have every filter see every line, just remove the "next;" statements.

$ cat split.awk
BEGIN{}
/^coffee/ {
    print $0 >> "/tmp/coffee.txt" ;
    next;
}
/^tea/ {
    print $0 >> "/tmp/tea.txt" ;
    next;
}
{ # default
    print $0 >> "/tmp/other.txt" ;
}
END {}
$
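
A usage sketch, assuming the script is saved as split.awk (the input file and stream names are placeholders):

awk -f split.awk food_expenses.txt          # one-shot pass over a file
tail -f live_feed.txt | awk -f split.awk    # or fed continuously from a stream

Depending on the awk implementation, output to the files may be buffered, so lines can show up in the target files with some delay when reading a live stream.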
semiuseless
A: 

For this example, you should use awk as semiuseless suggests.

But in general, to have N arbitrary programs each read a copy of a single input stream, you can use tee and bash's process substitution operator:

tee <food_expenses.txt \
  >(grep "coffee" >coffee.txt) \
  >(grep "tea" >tea.txt) \
  >(grep "honey cake" >cake.txt)

Note that >(command) is a bash extension.
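
For the live-stream case in the question, the same pattern could feed named pipes instead of regular files. A hypothetical sketch (nc, the port number, and the FIFO names are assumptions; each FIFO also needs a reader attached or the writers will block):

mkfifo coffee.fifo tea.fifo cake.fifo    # named pipes for downstream consumers
nc -l 9999 |                             # stand-in for the incoming network stream
  tee >(grep --line-buffered "coffee" > coffee.fifo) \
      >(grep --line-buffered "tea" > tea.fifo) \
      >(grep --line-buffered "honey cake" > cake.fifo) \
      > /dev/null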

Mark Edgar