Hi, I have a weird problem with egrep and pipes.

I am trying to filter a stream containing lines that start with a topic name, such as "TICK:this is a tick message\n".

When I use egrep to filter it:

./stream_generator | egrep 'TICK' | ./topic_processor

it seems that topic_processor never receives any messages.

However, when I use the following Python script instead, everything seems to work fine:

./stream_generator | python filter.py --topics TICK | ./topic_processor

I guess egrep needs some kind of 'flush' mechanism as well. Is that correct?

Can anyone here give me a clue? Thanks a million!

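import sys
from optparse import OptionParser

if __name__ == '__main__':

    parser = OptionParser()

    parser.add_option("-m", "--topics",
                      action="store", type="string", dest="topics",
                      help="colon-separated list of topic names")

    (opts, args) = parser.parse_args()
    if opts.topics is None:
        parser.error("--topics is required")

    # Topics are given colon-separated on the command line
    topics = set(opts.topics.split(':'))

    while True:
        # readline() returns each line as soon as it is complete
        s = sys.stdin.readline()
        if not s:
            break  # end of input: the upstream generator has exited
        # Lines look like "TOPIC:message"; compare the text before the first ':'
        if s.split(':', 1)[0] in topics:
            sys.stdout.write(s)
            # Flush after every line so the next process sees it immediately
            sys.stdout.flush()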
A:

Have you let the command ./stream_generator | egrep 'TICK' | ./topic_processor run to completion? If it completed without producing any output, then buffering is not the problem. When ./stream_generator terminates, egrep reaches end-of-file on its input, flushes its buffers, and then terminates itself.
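The end-of-input behaviour described above can be reproduced in miniature with Python's own I/O stack. This is only an analogy sketch under stated assumptions. A block-buffered Python `TextIOWrapper` over an `os.pipe()` stands in for egrep's output buffer; egrep is a C program with its own buffering, so this is not egrep's code. `select()` is used as a non-blocking "is there data yet?" check, which assumes a Unix-like system. The helper name `readable_now` is made up for illustration.

```python
import io
import os
import select


def readable_now(fd: int) -> bool:
    """Hypothetical helper: zero-timeout poll of the pipe's reading end."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


# A pipe standing in (by analogy) for the link between egrep and ./topic_processor
rfd, wfd = os.pipe()
# Block-buffered text writer, like a program whose stdout is a pipe
writer = io.TextIOWrapper(io.BufferedWriter(io.FileIO(wfd, "w")),
                          encoding="utf-8")

writer.write("TICK:this is a tick message\n")
before_exit = readable_now(rfd)  # the line is still sitting in the buffer

writer.close()                   # what a normal exit does: flush, then close the fd
data = os.read(rfd, 4096)        # the buffered line arrives now...
eof = os.read(rfd, 4096)         # ...followed by end-of-file (b"")
os.close(rfd)

print(before_exit)  # False
print(data)         # b'TICK:this is a tick message\n'
print(eof)          # b''
```

The reader sees nothing while the writer is alive. When the writer shuts down cleanly, the reader sees the buffered line and then end-of-file. That is why a completed pipeline rules out buffering as the cause of missing output.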

Now, it is true that egrep block-buffers its output when it is not writing directly to a terminal (i.e. when writing to a pipe or a file). For a while it may therefore appear that egrep produces no output, because nothing is written until enough data has accumulated in egrep's buffer to trigger a flush. In GNU egrep, you can change this behaviour with the --line-buffered option, which flushes the output after every line:

./stream_generator | egrep --line-buffered 'TICK' | ./topic_processor 
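The difference that --line-buffered makes can be illustrated with Python's io module, whose `TextIOWrapper` has an analogous `line_buffering` flag. This is an analogy sketch, not GNU grep's implementation. The function name `visible_without_flush` is hypothetical, and `select()` assumes a Unix-like system.

```python
import io
import os
import select


def visible_without_flush(line_buffered: bool) -> bool:
    """Hypothetical helper: write one line into a fresh pipe and report whether
    the reading end can see it right away, before any flush() or close()."""
    rfd, wfd = os.pipe()
    writer = io.TextIOWrapper(io.BufferedWriter(io.FileIO(wfd, "w")),
                              encoding="utf-8",
                              line_buffering=line_buffered)
    try:
        writer.write("TICK:this is a tick message\n")
        ready, _, _ = select.select([rfd], [], [], 0)  # zero-timeout poll
        return bool(ready)
    finally:
        writer.close()  # flush and close the write end first...
        os.close(rfd)   # ...then the read end, so the flush never hits a closed pipe


print("block-buffered:", visible_without_flush(False))  # block-buffered: False
print("line-buffered: ", visible_without_flush(True))   # line-buffered:  True
```

The filter script in the question gets the same line-by-line behaviour manually by calling sys.stdout.flush() after every write.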

Cheers, V.

vladr