What is the fastest way to find out how many non-empty lines are in a file, using Java?
The easiest way would be with a Scanner (yes, I like verbose code; you can make it physically shorter). Scanner also has constructors that take a File, a Reader, etc., so you can pass it whatever you have.
import java.util.Scanner;

public class Main
{
    public static void main(final String[] argv)
    {
        final Scanner scanner;
        final int lines;

        scanner = new Scanner("Hello\n\n\nEvil\n\nWorld");
        lines = countLines(scanner);
        System.out.println("lines = " + lines);
    }

    private static int countLines(final Scanner scanner)
    {
        int lines;

        lines = 0;

        while(scanner.hasNextLine())
        {
            final String line;

            line = scanner.nextLine();

            if(line.length() > 0)
            {
                lines++;
            }
        }

        return lines;
    }
}
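To point the same loop at a real file instead of a string literal, something like the following should work. Treat it as a sketch: the class name `ScannerLineCount`, the `countFile` helper, the UTF-8 charset, and taking the file name from the command line are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

final class ScannerLineCount
{
    // Same loop as above: counts lines whose length is greater than zero.
    static int countLines(final Scanner scanner)
    {
        int lines = 0;
        while (scanner.hasNextLine())
        {
            if (scanner.nextLine().length() > 0)
            {
                lines++;
            }
        }
        return lines;
    }

    // Hypothetical helper: opens the file as UTF-8 (an assumption)
    // and closes the Scanner when counting is done.
    static int countFile(final Path path) throws IOException
    {
        try (Scanner scanner = new Scanner(path, "UTF-8"))
        {
            return countLines(scanner);
        }
    }

    public static void main(final String[] argv) throws IOException
    {
        if (argv.length > 0)
        {
            // File name taken from the first command-line argument.
            System.out.println("lines = " + countFile(Paths.get(argv[0])));
        }
        else
        {
            // No file given: fall back to the in-memory demo string.
            System.out.println("lines = " + countLines(new Scanner("Hello\n\n\nEvil\n\nWorld")));
        }
    }
}
```

Note that a line containing only spaces or tabs still has a length greater than zero, so it counts as non-empty here, just as in the original loop.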
The easiest way would be to use a BufferedReader and check which lines are empty. However, this is relatively slow, because it has to create a String object for every line in the file. A faster way is to read the file in chunks into a char array using read(), and then iterate through the array looking for line breaks.
Here's the code for the two options; the second one took about 50% of the time of the first on my machine.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public static void timeBufferedReader () throws IOException
{
    long bef = System.currentTimeMillis ();
    int counter = 0;
    // The reader buffer size is the same as the array size I use in the other function
    try (BufferedReader reader = new BufferedReader(new FileReader("test.txt"), 1024 * 10))
    {
        String line;
        // readLine() returns null at end of file; ready() only says whether a read would block
        while ((line = reader.readLine()) != null)
        {
            if (line.length() > 0)
                counter++;
        }
    }
    long after = System.currentTimeMillis() - bef;
    System.out.println("Time: " + after + " Result: " + counter);
}

public static void timeFileReader () throws IOException
{
    long bef = System.currentTimeMillis();
    char[] buf = new char[1024 * 10];
    boolean emptyLine = true;
    int counter = 0;
    try (FileReader reader = new FileReader("test.txt"))
    {
        int len;
        // read() returns -1 at end of file
        while ((len = reader.read(buf, 0, buf.length)) != -1)
        {
            for (int i = 0; i < len; i++)
            {
                if (buf[i] == '\r' || buf[i] == '\n')
                {
                    // A line break ends the current line; count it only if it had content
                    if (!emptyLine)
                    {
                        counter += 1;
                        emptyLine = true;
                    }
                }
                else emptyLine = false;
            }
        }
    }
    // Count a non-empty last line that has no trailing line break
    if (!emptyLine)
        counter += 1;
    long after = System.currentTimeMillis() - bef;
    System.out.println("Time: " + after + " Result: " + counter);
}
If it really must be the fastest possible, you should look into NIO. And then, test your code on your target platform to see if it's really and truly better using NIO. I was able to get an order of magnitude improvement in some code I was playing with for the Netflix Prize. It involved parsing thousands of files into a more compact, quick-loading binary format. NIO was a big help on my (slow) development laptop.
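As a rough illustration of what "looking into NIO" could mean for this problem (not the code from the project mentioned above), the sketch below reads the file through a FileChannel into a reusable ByteBuffer and counts line breaks on the raw bytes. The names `NioLineCount` and `LineCounter` and the 64 KB buffer size are assumptions chosen for the example, and comparing bytes against '\r' and '\n' assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class NioLineCount {
    // Keeps the "is the current line still empty?" state across buffers,
    // so a line that straddles two reads is counted once.
    static final class LineCounter {
        private long count = 0;
        private boolean emptyLine = true;

        // Consumes all remaining bytes in the buffer.
        void accept(ByteBuffer buf) {
            while (buf.hasRemaining()) {
                byte b = buf.get();
                if (b == '\r' || b == '\n') {
                    if (!emptyLine) {
                        count++;
                        emptyLine = true;
                    }
                } else {
                    emptyLine = false;
                }
            }
        }

        // Total so far, including a non-empty last line without a trailing line break.
        long total() {
            return emptyLine ? count : count + 1;
        }
    }

    static long countFile(Path path) throws IOException {
        LineCounter counter = new LineCounter();
        // Buffer size is a guess; measure before settling on a value.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            while (channel.read(buf) != -1) {
                buf.flip();   // switch from filling to draining
                counter.accept(buf);
                buf.clear();  // ready to be filled again
            }
        }
        return counter.total();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println("Result: " + countFile(Paths.get(args[0])));
        }
    }
}
```

Whether this beats a plain FileReader depends on the JVM, the operating system, and the disk, so it is worth timing on the target machine.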
I am with Limbic System on the NIO recommendation. I've added an NIO method to Daphna's test code and benchmarked it against his two methods:
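import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public static void timeNioReader () throws IOException {
    long bef = System.currentTimeMillis();
    File file = new File("/Users/stu/test.txt");
    boolean emptyLine = true;
    int counter = 0;
    // try-with-resources closes the stream and its channel when done
    try (FileInputStream in = new FileInputStream(file);
         FileChannel fc = in.getChannel()) {
        // Map the whole file read-only; a single mapping is limited to 2 GB
        MappedByteBuffer buf = fc.map(MapMode.READ_ONLY, 0, file.length());
        while (buf.hasRemaining())
        {
            byte element = buf.get();
            if (element == '\r' || element == '\n') {
                if (!emptyLine) {
                    counter += 1;
                    emptyLine = true;
                }
            } else
                emptyLine = false;
        }
    }
    // Count a non-empty last line that has no trailing line break
    if (!emptyLine)
        counter += 1;
    long after = System.currentTimeMillis() - bef;
    System.out.println("timeNioReader Time: " + after + " Result: " + counter);
}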
Here are the warmed-up results for an 89 MB file (times in milliseconds):
timeBufferedReader Time: 947 Result: 747656
timeFileReader Time: 670 Result: 747656
timeNioReader Time: 251 Result: 747656
NIO is about 2.7x faster than FileReader and almost 4x faster than BufferedReader!
With a 6.4 MB file the results are even better, although the warm-up time is much longer.
//jvm start, warming up
timeBufferedReader Time: 121 Result: 53404
timeFileReader Time: 65 Result: 53404
timeNioReader Time: 40 Result: 53404
//still warming up
timeBufferedReader Time: 107 Result: 53404
timeFileReader Time: 60 Result: 53404
timeNioReader Time: 20 Result: 53404
//ripping along
timeBufferedReader Time: 79 Result: 53404
timeFileReader Time: 56 Result: 53404
timeNioReader Time: 16 Result: 53404
Make of it what you will.
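Because the numbers shift so much between the first and later runs, it may help to time each method in a loop and ignore the early iterations. The sketch below is one hedged way to do that; `Bench`, `bestNanos`, and the warm-up and run counts are names and values invented for this example, and for serious measurements a dedicated harness such as JMH is the safer choice.

```java
final class Bench {
    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the fastest timed run in nanoseconds.
    // With runs == 0 nothing is timed and Long.MAX_VALUE is returned.
    static long bestNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT and the OS file cache a chance to warm up
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute a call to one of the timing
        // methods above, wrapped so that IOException is handled.
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println(); // keeps the loop from being optimised away
            }
        };
        System.out.println("best: " + bestNanos(work, 5, 10) / 1_000_000.0 + " ms");
    }
}
```

Taking the fastest run rather than the average is a design choice: it reduces noise from other processes, but it hides variance, so reporting both can be more honest.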