tags:
views: 247
answers: 2

I have a simple Ruby script that looks like this:

require 'csv'
while line = STDIN.gets
  array = CSV.parse_line(line)
  puts array[2]
end

But when I try using this script in a Unix pipeline like this, I get 10 lines of output, followed by an error:

ruby lib/myscript.rb < data.csv  | head

12080450
12080451
12080517
12081046
12081048
12081050
12081051
12081052
12081054
lib/myscript.rb:4:in `write': Broken pipe - <STDOUT> (Errno::EPIPE)

Is there a way to write the Ruby script in a way that prevents the broken pipe exception from being raised?

+3  A: 

The trick I use is to replace head with sed -n 1,10p.

This works because sed reads its entire input (printing only the requested lines), which keeps the pipe open, so ruby (or any other program that tests for broken pipes and complains) never gets the broken pipe and therefore doesn't complain. Choose whatever range you want for the number of lines.
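Applied to the pipeline from the question, that looks like:

ruby lib/myscript.rb < data.csv | sed -n 1,10p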

Clearly, this is not attempting to modify your Ruby script. There almost certainly is a way to do it in the Ruby code. However, the 'sed instead of head' technique works even where you don't have the option of modifying the program that generates the message.

Jonathan Leffler
+4  A: 

head closes the standard output stream after it has read all the data it needs. You should handle the exception and stop writing to standard output. The following code will abort the loop once standard output has been closed:

require 'csv'

while line = STDIN.gets
  array = CSV.parse_line(line)
  begin
    puts array[2]
  rescue Errno::EPIPE
    # Standard output's pipe has been closed by the reader; stop writing.
    break
  end
end
Phil Ross
If head closes the stream, then why does `$stdout.closed?` still return false, and why does the error not happen immediately but only after many lines have been written into the void? I think head actually keeps the stream open but doesn't read from it anymore, so the buffer fills up at some point, which causes the broken pipe.
sepp2k
@sepp2k - There are most likely a couple of things going on here: 1, the default buffering mode for stdout changes from line-oriented to block-oriented when using a pipeline, so you'll need to flush between each write. 2, `head` needs a chance to run in order to close the stream, but many more bytes of data may have been written before it gets the chance to run. I wrote a variant of the script with `$stdout.flush ; sleep 0.1` between each `write`, and in this case `$stdout.closed?` works.
Aidan Cully
@AidanCully: Ah, very interesting.
sepp2k
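For reference, here is a rough sketch (not the original poster's exact script) combining the flush-per-line idea from Aidan Cully's comment with the rescue from Phil Ross's answer; the sleep is there only for illustration, to give head time to exit so the broken pipe shows up on the very next write:

require 'csv'

while line = STDIN.gets
  array = CSV.parse_line(line)
  begin
    puts array[2]
    $stdout.flush   # a pipe makes stdout block-buffered, so flush each line explicitly
    sleep 0.1       # give head a chance to exit and close its end of the pipe
  rescue Errno::EPIPE
    break           # the reader is gone; stop writing
  end
end

In practice the sleep would be dropped; flushing each line is enough to make the Errno::EPIPE appear promptly instead of only after the buffer fills.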