What's the quickest and most efficient way of reading the last line of text from a [very, very large] file in Java?

+6  A: 

Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.

Basically it's not a terribly easy thing to do in general. As MSalters points out, UTF-8 does make it easy to spot \r or \n, as the UTF-8 representation of those characters is just the same as ASCII, and those bytes won't occur in a multi-byte character.

So basically, take a buffer of (say) 2K and progressively read backwards (seek to 2K before your previous position, read the next 2K), checking for a line terminator. Then skip to exactly the right place in the stream, create an InputStreamReader on top of it, and a BufferedReader on top of that. Then just call BufferedReader.readLine().
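
A minimal Java sketch of the steps above, not the code from the linked C# answer. It assumes the file is UTF-8 (or ASCII), uses RandomAccessFile for the backwards scan, and ignores one trailing line terminator so that the last non-empty line is returned. The names LastLineReader, findLastLineStart and readLastLine are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class LastLineReader {
    private static final int BUFFER_SIZE = 2048; // the "2K" buffer from the answer

    /** Returns the byte offset at which the last line starts. */
    public static long findLastLineStart(RandomAccessFile file) throws IOException {
        long end = file.length();
        // Ignore one trailing terminator (\n, \r or \r\n), otherwise the
        // "last line" of a normally terminated file would be empty.
        if (end > 0 && byteAt(file, end - 1) == '\n') end--;
        if (end > 0 && byteAt(file, end - 1) == '\r') end--;

        byte[] buffer = new byte[BUFFER_SIZE];
        long blockEnd = end;
        while (blockEnd > 0) {
            // Step back one block and read it in a single call.
            long blockStart = Math.max(0, blockEnd - BUFFER_SIZE);
            int length = (int) (blockEnd - blockStart);
            file.seek(blockStart);
            file.readFully(buffer, 0, length);
            // Scan the block from its end towards its start.
            for (int i = length - 1; i >= 0; i--) {
                if (buffer[i] == '\n' || buffer[i] == '\r') {
                    return blockStart + i + 1;
                }
            }
            blockEnd = blockStart;
        }
        return 0; // no terminator found: the whole file is one line
    }

    private static int byteAt(RandomAccessFile file, long position) throws IOException {
        file.seek(position);
        return file.read();
    }

    /** Reads the last line, assuming the file is encoded as UTF-8. */
    public static String readLastLine(String path) throws IOException {
        long start;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            start = findLastLineStart(file);
        }
        try (FileInputStream in = new FileInputStream(path)) {
            // skip() may skip fewer bytes than requested, so loop.
            long skipped = 0;
            while (skipped < start) {
                long n = in.skip(start - skipped);
                if (n <= 0) throw new IOException("could not skip to offset " + start);
                skipped += n;
            }
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }
}
```

Calling LastLineReader.readLastLine("big.log") (a hypothetical path) reads only the final blocks of the file plus the last line itself, however large the file is.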

Jon Skeet
UTF-8 doesn't matter - you need the last CR or LF character, which is a single byte in both ASCII and UTF-8.
MSalters
@MSalters: Good point. Will update...
Jon Skeet
A: 

Do you know what appears at the end, and is it constant?

Also C# but it looks like you should be able to set the stream's position:

From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file

using System.IO;

using (FileStream fs = File.OpenRead("c:\\file.dat"))
using (StreamReader sr = new StreamReader(fs))
{
    // Jump to 4 bytes before the end (assumes the file has at least 4 bytes).
    sr.BaseStream.Position = fs.Length - 4;
    if (sr.ReadToEnd() == "DONE")
    {
        // match
    }
}
rball
In Java's FileInputStream (which FileReader is based on), you cannot set the position; you can only skip forward, which probably does not read the parts you skip, but is still a one-way operation and thus not suited to looking for a linebreak at an unknown offset from the end.
Michael Borgwardt
Well...I tried :P
rball
You can use mark() to get around that problem, depending on the stream's mark read limit (the readlimit argument passed to mark()).
James Schek
+2  A: 

Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.

Michael Borgwardt
Note, RandomAccessFile's performance sucks for individual operations - so do sensible size reads into a buffer.
Tom Hawtin - tackline
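
To illustrate the FileChannel suggestion together with the advice to read in sensibly sized chunks, here is a rough sketch rather than a definitive implementation. It reads a window of bytes from the end of the file with one positional read and doubles the window until it contains a line terminator. It assumes UTF-8 (or ASCII) content and a last line that fits in memory; TailWindow, lastLine and readTail are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TailWindow {
    /** Returns the last line of the file, decoded as UTF-8. */
    public static String lastLine(Path path, int initialWindow) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            long window = Math.min(size, Math.max(1, initialWindow));
            while (true) {
                byte[] tail = readTail(channel, size, (int) window);
                int end = tail.length;
                // Ignore one trailing terminator (\n, \r or \r\n).
                if (end > 0 && tail[end - 1] == '\n') end--;
                if (end > 0 && tail[end - 1] == '\r') end--;
                // Look for the terminator that precedes the last line.
                for (int i = end - 1; i >= 0; i--) {
                    if (tail[i] == '\n' || tail[i] == '\r') {
                        return new String(tail, i + 1, end - i - 1, StandardCharsets.UTF_8);
                    }
                }
                if (window == size) {
                    // The window covers the whole file: it is a single line.
                    return new String(tail, 0, end, StandardCharsets.UTF_8);
                }
                if (window > Integer.MAX_VALUE / 2) {
                    throw new IOException("last line too long for this sketch");
                }
                window = Math.min(size, window * 2);
            }
        }
    }

    /** Reads the final 'length' bytes of the file using positional reads. */
    private static byte[] readTail(FileChannel channel, long size, int length)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long position = size - length;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, position + buffer.position());
            if (n < 0) throw new IOException("file shrank while reading");
        }
        return buffer.array();
    }
}
```

With an initial window of a few kilobytes, for example TailWindow.lastLine(path, 4096), a typical log file needs a single read. Decoding only starts after a terminator or at the start of the file, so a multi-byte UTF-8 character is never split.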