I've been working on a log viewer for a Rails app and need to read the last 200 or so lines of a log file from bottom to top, instead of the default top to bottom.

Log files can get quite large, so I've already tried and ruled out `IO.readlines("log_file.log")[-200..-1]`, which reads the entire file into memory just to keep the last 200 lines.

Are there any other ways to go about reading a file backwards in Ruby without the need for a plugin or gem?

Answer:

The only correct way to do this that also works on enormous files is to read a fixed-size block of bytes at a time, working backwards from the end of the file, until you have collected the number of lines you want. This is essentially how Unix `tail` works.

An example implementation of IO#tail(n), which returns the last n lines as an Array:

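```ruby
class IO
  TAIL_BUF_LENGTH = 1 << 16

  # Returns the last n lines as an Array (oldest first), reading the file
  # backwards in TAIL_BUF_LENGTH-byte chunks instead of loading all of it.
  def tail(n)
    return [] if n < 1

    seek(0, SEEK_END)
    pos = self.pos # byte offset of the end of the file
    buf = ""

    # Prepend chunks until we have more than n newlines (so the oldest
    # wanted line is complete) or we have reached the start of the file.
    while buf.count("\n") <= n && pos > 0
      len = [TAIL_BUF_LENGTH, pos].min # never seek before offset 0
      pos -= len
      seek(pos, SEEK_SET)
      buf = read(len) + buf
    end

    # last(n) also copes with files that have fewer than n lines.
    buf.split("\n").last(n)
  end
end
```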

The implementation is a little naive, but a quick benchmark shows what a ridiculous difference even this simple version makes (tested with a ~25 MB file generated with `yes > yes.txt`):

```
                            user     system      total        real
f.readlines[-200..-1]   7.150000   1.150000   8.300000 (  8.297671)
f.tail(200)             0.000000   0.000000   0.000000 (  0.000367)
```

The benchmark code:

```ruby
require "benchmark"

# Assumes the IO#tail extension above has already been loaded.
FILE = "yes.txt"

Benchmark.bmbm do |b|
  b.report "f.readlines[-200..-1]" do
    File.open(FILE) do |f|
      f.readlines[-200..-1]
    end
  end

  b.report "f.tail(200)" do
    File.open(FILE) do |f|
      f.tail(200)
    end
  end
end
```

Of course, other implementations already exist. I haven't tried any, so I cannot tell you which is best.
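If you literally want to read the file from bottom to top (newest line first) and stop whenever you like, the same backwards-chunk idea can be turned into a lazy reverse line iterator. This is only a sketch, not one of the existing implementations mentioned above: the module `ReverseReader` and its `each_line` method are hypothetical names, and it assumes plain `"\n"` line endings.

```ruby
module ReverseReader
  # Hypothetical helper: yields the lines of +path+ (without "\n") from the
  # last line to the first, reading fixed-size chunks backwards from the end.
  # Returns an Enumerator when called without a block.
  def self.each_line(path, chunk_size = 1 << 16)
    return enum_for(:each_line, path, chunk_size) unless block_given?

    File.open(path, "rb") do |f| # binary mode so byte offsets are exact
      size = f.size
      pos = size
      leftover = ""  # start of a line whose beginning lies in an earlier chunk
      at_eof = true  # true only for the chunk that ends the file

      while pos > 0
        len = [chunk_size, pos].min
        pos -= len
        f.seek(pos, IO::SEEK_SET)
        parts = (f.read(len) + leftover).split("\n", -1)

        # A trailing "\n" at the end of the file does not start another line.
        parts.pop if at_eof && parts.last == ""
        at_eof = false

        # The first part may continue into the previous chunk; keep it for later.
        leftover = parts.shift
        parts.reverse_each { |line| yield line }
      end

      yield leftover if size > 0 # the very first line of the file
    end
  end
end
```

For example, `ReverseReader.each_line("log/production.log").first(200)` (the path is just an example) would return the last 200 lines, newest first, without reading the rest of the file. Because the file is opened in binary mode, the yielded strings are ASCII-8BIT; call `force_encoding("UTF-8")` on them if you need UTF-8 strings.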

molf
I think you mean `TAIL_BUF_LENGTH = 2**16` or `1 << 16`, both of which evaluate to `65536` (64Ki). `2^16` is binary exclusive-or and evaluates to `18`.
Jörg W Mittag
Works great! The benchmark difference is insane compared to readlines. Is it possible to also output the corresponding line number for each line in the resulting array? Thanks!
two2twelve
@two2twelve: No, it isn't. The *whole purpose* of this entire exercise is to read the file "from bottom to top". (Your words, not mine.) How would you know at which line (which is counted from the *top* of the file) you are, if you started at the *bottom*? Or did you mean to count from the bottom upwards? In that case, it's easy: the line at index `i` in the returned array is the `(n-i)`th line from the bottom.
Jörg W Mittag
@Jörg, good point :-)
molf
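Following the point made in the comments, numbering the result of `tail` from the bottom only needs the array index. A minimal sketch, where `lines` is hypothetical sample data standing in for what `tail(3)` might return:

```ruby
# Hypothetical sample: the last lines of a log, oldest first, as tail returns them.
lines = ["Started GET /", "Processing", "Completed 200 OK"]
n = lines.size # use the actual size in case the file had fewer lines than requested

# Pair each line with its position counted from the bottom (1 = last line).
numbered = lines.each_with_index.map { |line, i| [n - i, line] }

numbered.each { |from_bottom, line| puts "#{from_bottom}: #{line}" }
```

Numbering from the top instead would require the total line count, which means scanning the whole file, and that is exactly the work `tail` avoids.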