Recently a peer and I discovered an interesting bug in GNU grep 2.5.1: when reading from standard input, lines longer than 200,000,000 characters cause grep to fail, even if the pattern is not in one of the long lines. If, however, grep reads the file directly (grep pattern big_file), it works fine. It appears this bug is fixed in 2.5.3.

cat big_file | grep pattern   # dies with exit code 0 after encountering a long line

grep pattern big_file         # works fine!
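
For anyone who wants to reproduce it, here is a rough sketch (assuming a POSIX shell with Perl available; the 250,000,000-character line is just an arbitrary length above the apparent threshold):

perl -e 'print "x" x 250000000, "\n"; print "pattern\n"' > big_file   # one very long line, then a short line containing the pattern

cat big_file | grep pattern; echo "pipe exit: $?"   # grep 2.5.1: fails even though the pattern is on a later, short line

grep pattern big_file; echo "file exit: $?"         # prints "pattern" and exits 0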

Does anyone know why this happens? Is the line-length limitation the true cause?

+1  A: 

I was looking through the commits, but I couldn't find anything. You can have a go at it.

http://git.savannah.gnu.org/cgit/grep.git/log/?ofs=200

That links to the page of the commit log around the 2.5.1 release; page forward and back from there to try to find the fix.

Dave
I looked through the commits too ("use the source, Luke"!) and didn't find a thing either. This is unfortunate; if you can't trust your tools, what can you trust?
Vince
+2  A: 

There is (or was) a memory-exhaustion problem I've run into when grep reads very long lines, but on most systems allocating 200 MB is unlikely to fail.

http://savannah.gnu.org/bugs/?9886

I believe it uses memory-mapped files when reading directly, and obviously that's not an option when reading from a pipe, so perhaps that's the difference.
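
If you want to check whether that is really the difference on your system, one rough way (assuming Linux with strace installed; exact syscall names can vary by architecture) is to compare the system calls grep makes in the two modes:

strace -e trace=mmap,read grep pattern big_file > /dev/null        # file input: look for an mmap call whose fd refers to big_file
cat big_file | strace -e trace=mmap,read grep pattern > /dev/null  # pipe input: only anonymous mmaps; the data comes in through read(0, ...)

(The trace goes to stderr; stdout is discarded so grep's match output doesn't get mixed into it.)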

Also, how complex is your pattern? There is a known limitation in grep where the {n,m} interval syntax with large counts can cause huge amounts of memory to be allocated.
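
For example (the count is illustrative; the point at which memory use blows up depends on the grep version, and counts above 32767 may simply be rejected):

echo aaa | grep -E 'a{1,10}'      # harmless: a small interval
echo aaa | grep -E 'a{1,30000}'   # a large upper bound like this can make some grep versions allocate an enormous amount of memory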

Tim Sylvester