views: 379
answers: 4

What is the fastest way to find out how many non-empty lines are in a file, using Java?

+2  A: 

The easiest would be with a Scanner (yes, I like verbose code... you can make it physically shorter). Scanner also takes a File, a Reader, etc., so you can pass it whatever you have.

import java.util.Scanner;


public class Main
{
    public static void main(final String[] argv)
    {
        final Scanner scanner;
        final int     lines;

        scanner = new Scanner("Hello\n\n\nEvil\n\nWorld");
        lines   = countLines(scanner);
        System.out.println("lines = "  + lines);
    }

    private static int countLines(final Scanner scanner)
    {
        int lines;

        lines = 0;

        while(scanner.hasNextLine())
        {
            final String line;

            line = scanner.nextLine();

            if(line.length() > 0)
            {
                lines++;
            }
        }

        return lines;
    }
}
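
Since Scanner also accepts a File, the same counting loop works directly on a file. A minimal sketch, assuming a file named test.txt (a placeholder name) exists in the working directory:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;


public class CountNonEmptyLines
{
    public static void main(final String[] argv) throws FileNotFoundException
    {
        // Same counting logic as above, but reading from a file instead of a String.
        // new Scanner(File) throws FileNotFoundException if the file is missing.
        final Scanner scanner = new Scanner(new File("test.txt"));
        int lines = 0;

        while(scanner.hasNextLine())
        {
            if(scanner.nextLine().length() > 0)
            {
                lines++;
            }
        }

        scanner.close();
        System.out.println("lines = " + lines);
    }
}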
TofuBeer
For the down voter: since you didn't say why you voted it down, I'll guess. "Fastest" has two meanings: fastest execution and fastest to develop. I qualified my answer as the easiest to develop, in case that is what was meant by "fastest". It would be nice to know the reason if the down vote was for something else.
TofuBeer
+6  A: 

The easiest way would be to use BufferedReader and check which lines are empty. However, this is relatively slow, because it creates a String object for every line in the file. A faster way is to read the file into a char array with read() and then iterate through the array counting line breaks.

Here's the code for the two options; the second one took about half as long on my machine.

// Needs: java.io.BufferedReader, java.io.FileReader, java.io.IOException
public static void timeBufferedReader () throws IOException
{
    long bef = System.currentTimeMillis();

    // The reader buffer size is the same as the array size I use in the other function
    BufferedReader reader = new BufferedReader(new FileReader("test.txt"), 1024 * 10);
    int counter = 0;
    String line;

    // readLine() returns null at end of file, which is a more reliable
    // end-of-stream check than ready()
    while ((line = reader.readLine()) != null)
    {
        if (line.length() > 0)
            counter++;
    }
    reader.close();

    long after = System.currentTimeMillis() - bef;

    System.out.println("Time: " + after + " Result: " + counter);

}

public static void timeFileReader () throws IOException
{
    long bef = System.currentTimeMillis();

    FileReader reader = new FileReader("test.txt");
    char[] buf = new char[1024 * 10];
    boolean emptyLine = true;
    int     counter = 0;
    int     len;

    // read() returns -1 at end of file
    while ((len = reader.read(buf, 0, buf.length)) != -1)
    {
        for (int i = 0; i < len; i++)
        {
            if (buf[i] == '\r' || buf[i] == '\n')
            {
                // Only count a line break that ends a non-empty line
                if (!emptyLine)
                {
                    counter += 1;
                    emptyLine = true;
                }
            }
            else emptyLine = false;
        }
    }
    reader.close();

    // Note: a final non-empty line with no trailing line break is not counted

    long after = System.currentTimeMillis() - bef;

    System.out.println("Time: " + after + " Result: " + counter);

}
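
A minimal driver for the two methods above, assuming they sit in the same class and that test.txt exists; repeating the calls gives the JIT a chance to warm up, so the later iterations are the more representative timings:

public static void main (String[] args) throws IOException
{
    // Run each benchmark several times; the first iterations include JIT warm-up.
    for (int i = 0; i < 3; i++)
    {
        timeBufferedReader();
        timeFileReader();
    }
}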
Daphna Shezaf
+2  A: 

If it really must be as fast as possible, you should look into NIO. And then test your code on your target platform to see whether NIO is actually faster there. I was able to get an order-of-magnitude improvement in some code I was playing with for the Netflix Prize: it involved parsing thousands of files into a more compact, quick-loading binary format, and NIO was a big help on my (slow) development laptop.
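
For illustration, here is a minimal sketch of one NIO variant: reading the file through a FileChannel into a direct ByteBuffer and counting line breaks byte by byte. It assumes a single-byte-per-character encoding such as ASCII (so '\r' and '\n' map to single bytes) and uses test.txt as a placeholder file name:

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;


public class NioLineCount
{
    public static void main(String[] args) throws IOException
    {
        FileChannel channel = new FileInputStream("test.txt").getChannel();
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 64); // 64KB direct buffer
        boolean emptyLine = true;
        int counter = 0;

        // read() returns -1 at end of file
        while (channel.read(buffer) != -1)
        {
            buffer.flip(); // switch the buffer from writing to reading

            while (buffer.hasRemaining())
            {
                byte b = buffer.get();

                if (b == '\r' || b == '\n')
                {
                    if (!emptyLine)
                    {
                        counter++;
                        emptyLine = true;
                    }
                }
                else
                {
                    emptyLine = false;
                }
            }

            buffer.clear(); // make the buffer writable again for the next read
        }

        channel.close();
        System.out.println("non-empty lines = " + counter);
    }
}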

Limbic System
+5  A: 

I am with Limbic System on the NIO recommendation. I've added an NIO method to Daphna's test code and benchmarked it against the other two methods:

// Needs: java.io.File, java.io.FileInputStream, java.io.IOException,
// java.nio.MappedByteBuffer, java.nio.channels.FileChannel,
// and java.nio.channels.FileChannel.MapMode
public static void timeNioReader () throws IOException {
    long bef = System.currentTimeMillis();

    File file = new File("/Users/stu/test.txt");
    FileChannel fc = (new FileInputStream(file)).getChannel(); 
    MappedByteBuffer buf = fc.map(MapMode.READ_ONLY, 0, file.length());
    boolean emptyLine = true;
    int     counter = 0;

    while (buf.hasRemaining())
    {
        byte element = buf.get();

        if (element == '\r' || element == '\n') {
            if (!emptyLine) {
                counter += 1;
                emptyLine = true;
            }
        } else 
            emptyLine = false;

    }
    fc.close();

    long after = System.currentTimeMillis() - bef;

    System.out.println("timeNioReader      Time: " + after + " Result: " + counter);

}

Here are the warmed-up results for an 89MB file:

timeBufferedReader Time: 947 Result: 747656
timeFileReader     Time: 670 Result: 747656
timeNioReader      Time: 251 Result: 747656

NIO is over 2.5x faster than the FileReader version and almost 4x faster than the BufferedReader version!

With a 6.4MB file the results are even better, although the warm-up time is much longer.

//jvm start, warming up
timeBufferedReader Time: 121 Result: 53404
timeFileReader     Time: 65 Result: 53404
timeNioReader      Time: 40 Result: 53404

//still warming up
timeBufferedReader Time: 107 Result: 53404
timeFileReader     Time: 60 Result: 53404
timeNioReader      Time: 20 Result: 53404

//ripping along
timeBufferedReader Time: 79 Result: 53404
timeFileReader     Time: 56 Result: 53404
timeNioReader      Time: 16 Result: 53404

Make of it what you will.

Stu Thompson