How should a Scala script that reads a 5G log file from a network drive be modified so that it reads only the last x lines (like 'tail' in Unix)?

::#!
@echo off
call scala %0 %*
goto :eof
::!#

import scala.io.Source
if (args.length > 0) {
  for (line <- Source.fromFile(args(0)).getLines)
    if (line.contains("percent")) {
      print(line)
    }
}
A: 

You'll obviously have to keep a buffer of x lines which you update on each iteration:

var buf: List[String] = Nil

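for (line <- ...) {
  buf = (buf ::: List(line)) match {
    case x :: xs if (xs.length == n) => xs  // more than n lines buffered: drop the oldest
    case xs => xs                           // n lines or fewer: keep them all (avoids a MatchError)
  }
}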
oxbow_lakes
Isn't this a horrendously ugly way to do the same thing that Daniel did with a mutable queue? (It's inefficient too, since buf has to be copied in linear time on each iteration of the loop.)
Ken Bloom
I've streamlined the match (I put all the cases in for clarity). I think you need to read up on scala's immutable data structures as no copying is going on using `List`. I think it's quite *elegant* personally!
oxbow_lakes
Oxbow, buf _is_ being copied in linear time. The arg to the left of `:::` is always copied. It is `::` which doesn't copy anything.
Daniel
@Daniel - I don't see that it is at all because I'm only `:::`-ing to a 1-element list! A `ListBuffer` is created and *one element* is copied into it. However the original list is then prepended via `ListBuffer.prependToTail` - this doesn't copy as `ListBuffer.toList` is a constant time operation
oxbow_lakes
Sorry - I meant `ListBuffer.prependToList` of course!
oxbow_lakes
No. See http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/List.scala#L528. The entire contents of "those" (which was the item on the left, "buf") are copied, and then the ListBuffer is prepended to "this" (the item on the right, List(line)). If you're going to do this with an immutable data structure, at least use an immutable.Queue, which uses two lists internally to get amortized constant-time enqueues and dequeues.
Ken Bloom
kbloom - apologies, you are correct - I'm being stupid
oxbow_lakes
However, the answer is still correct and probably not entirely deserving of downvotes
oxbow_lakes
+1  A: 

I'm using a mutable queue in this one:

::#!
@echo off
call scala %0 %*
goto :eof
::!#
import scala.io.Source

val lastN = 5 // I guess you'll be getting them from args, but...
val queue = new scala.collection.mutable.Queue[String]

if (args.length > 0) {
  Source.fromFile(args(0)).getLines foreach { line =>
    queue.enqueue(line)
    if (queue.size > lastN) queue.dequeue
  }
  for (line <- queue)
    if (line.contains("percent")){
      print(line)
    }
}

If using an immutable queue, I'd use a reduceLeft, but I see no point in using an immutable queue for this.
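
For comparison, here is a minimal sketch of what the immutable-queue variant could look like. It uses foldLeft rather than reduceLeft, because the accumulator (a queue plus a counter) has a different type from the lines being consumed. The names ImmutableTail and lastN are made up for illustration, and the counter is carried along only to avoid asking the queue for its size on every line.

```scala
import scala.collection.immutable.Queue

// Hypothetical helper, not part of the original script.
object ImmutableTail {
  // Keep only the last n lines by threading an immutable Queue through foldLeft.
  def lastN(lines: Iterator[String], n: Int): Queue[String] = {
    val (queue, _) = lines.foldLeft((Queue.empty[String], 0)) {
      case ((q, count), line) =>
        if (n <= 0) (q, count)                            // nothing to keep
        else if (count < n) (q.enqueue(line), count + 1)  // still filling up
        else (q.enqueue(line).dequeue._2, count)          // full: add newest, drop oldest
    }
    queue
  }
}
```

In the script above it would be called as ImmutableTail.lastN(Source.fromFile(args(0)).getLines, lastN). It still reads the whole file, exactly like the mutable version.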

Daniel
Reading the entire 5G file over the network and dumping out all but the last 5 lines doesn't seem ideal.
pumpkin
It isn't. You'd have to read blocks of bytes from the end until you got the required number of lines to do it right, using a couple of stacks for the lines -- one for the current block, one for overall. That requires going through the Java I/O library, which is so awful I won't touch it without being paid for it. :-)
Daniel
Also it is not possible to read from a file backwards unless you know the encoding is fixed-byte-length
oxbow_lakes
pumpkin's approach, from below, is the only sane way to do this, assuming that the network file system indeed allows seeking. As Daniel indicates, this will need calling into the java library. java.io.RandomAccessFile has a readLine() method that will let you read Strings directly from the file. Just back up generously, read lines into a buffer or queue until you hit the end, then take the last 5 lines you read.
Carl Smotricz
@oxbow_lakes: very good point!
Daniel
+2  A: 

If reading the file is expensive, as I expect it is over the network, I'd seek to the end of the file and read progressively larger chunks (more domain knowledge of the log file format might give you a better strategy here) from the end until you find the number of lines you're looking for.
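
A rough sketch of that idea, using java.io.RandomAccessFile as suggested in the comments on the other answer. It is a sketch under several assumptions rather than a finished tool: the object name TailSketch, the method lastLines and the 4096-byte starting chunk are invented for the example; the file is assumed to be UTF-8 (or another encoding in which a newline byte can never be part of another character); the network file system is assumed to support seeking; and the last n lines are assumed to fit comfortably in memory.

```scala
import java.io.RandomAccessFile
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of the original script.
object TailSketch {
  // Return the last n lines of the file, reading only from its end.
  def lastLines(path: String, n: Int, initialChunk: Int = 4096): List[String] = {
    require(initialChunk > 0, "initialChunk must be positive")
    val file = new RandomAccessFile(path, "r")
    try {
      val length = file.length()
      var chunk = initialChunk.toLong
      var found: Option[List[String]] = None
      while (found.isEmpty) {
        // Re-read the tail of the file, doubling the chunk on each attempt.
        val start = math.max(0L, length - chunk)
        val bytes = new Array[Byte]((length - start).toInt)
        file.seek(start)
        file.readFully(bytes)
        val lines = new String(bytes, StandardCharsets.UTF_8).linesIterator.toList
        // Unless the chunk starts at byte 0, its first line may be cut off, so drop it.
        val complete = if (start > 0) lines.drop(1) else lines
        if (start == 0 || complete.length >= n) found = Some(complete.takeRight(n))
        else chunk *= 2
      }
      found.get
    } finally file.close()
  }
}
```

The original filter would then become something like TailSketch.lastLines(args(0), 5).filter(_.contains("percent")).foreach(println). Because the chunk doubles each time, the total number of bytes read stays below roughly twice the size of the final chunk, which for a handful of log lines is tiny compared with the whole 5G file.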

pumpkin