I have a big file. It contains approximately 3,000 to 20,000 lines. How can I get the total number of lines in the file using Java?

+4  A: 

Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.
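
A minimal sketch of the Scanner approach, assuming the file is in the platform default charset. The class name ScannerLineCount and the method countLines are made up for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper class, not part of any library.
final class ScannerLineCount {
    // Counts lines by asking Scanner for one line at a time.
    static int countLines(File file) throws IOException {
        int lines = 0;
        // try-with-resources closes the Scanner and the underlying file.
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
        }
        return lines;
    }
}
```

Called as ScannerLineCount.countLines(new File("file.txt")). A last line without a trailing newline is still counted as a line.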

Esko Luontola
+13  A: 
int lines = 0;
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
    while (reader.readLine() != null) lines++;
}

Update: To answer the performance question raised here, I made a measurement. First, 20,000 lines are too few to keep the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java without parameters like -server or -XX options) needed around 11 seconds on my box. The same with wc -l (the UNIX command-line tool for counting lines) also took 11 seconds. The solution that reads every single character and looks for '\n' needed 104 seconds, 9 to 10 times as long.

Mnementh
What efficiency do you mean? Performance? In that case there is no better way: because lines can have different lengths, you have to read the complete file to count the lines (wc does it too). If you mean programming efficiency, then I'm sure you can put it in a utility method (or some common library has done it already).
Mnementh
@Firstthumb. Not efficient maybe, but who cares. He's only counting 20k lines which is pretty small. This code gets my vote for being the simplest.
Chris Dail
how about the efficiency of LineNumberReader since it extends BufferedReader?
Narayan
Nobody says this is better than the LineNumberReader, at least I don't do it.
Mnementh
next question? why don't you do it :D
Narayan
I was fairly sure that the BufferedReader would be at least as fast as a FileReader inspecting every single character. I confirmed that by measuring the time (and actually showed that inspecting every char is far slower). But I think the LineNumberReader solution will work as well as the one with the BufferedReader. That's why I upvoted that answer.
Mnementh
A: 

Read the file line by line and increment a counter for each line until you have read the entire file.

Ken Liu
+6  A: 

Use LineNumberReader, something like:

static public int getLines(File aFile) throws IOException {
    LineNumberReader reader = null;
    try {
        reader = new LineNumberReader(new FileReader(aFile));
        // Read to the end; the reader tracks the line number itself.
        while (reader.readLine() != null);
        return reader.getLineNumber();
    } catch (Exception ex) {
        return -1;
    } finally {
        if (reader != null)
            reader.close();
    }
}
Narayan
You'd probably also need to close() the reader.
crosstalk
yup;done thanks :D
Narayan
You'd probably want to check for reader != null in the finally block.
dfa
@dfa thanks, fixed
Narayan
A: 

The buffered reader is overkill

Reader r = new FileReader("f.txt");

int count = 0;
int nextchar = 0;
while (nextchar != -1) {
    nextchar = r.read();
    // Compare with the character itself: Character.getNumericValue('\n') returns -1, not 10.
    if (nextchar == '\n') {
        count++;
    }
}
r.close();

My search for a simple example has created one that's actually quite poor. Calling read() repeatedly for a single character is less than optimal. See here for examples and measurements.

NSherwin
The BufferedReader handles different line endings well. Your solution ignores Mac line endings ('\r'). That may be OK. Anyway, your solution doesn't actually read from the file at the moment. I think you forgot a line.
Mnementh
What's going to change nextchar here? If you're going to call read() on every iteration, I strongly suspect that a BufferedReader approach will be *much* faster...
Jon Skeet
that was the idea ;-/ I wanted to write the simplest possible example. I wonder what the speed difference would be?
NSherwin
BufferedReader is not overkill here. The code in this answer will be hideously slow - FileReader.read() will pull one character at a time from the file.
skaffman
And the answer is "dramatic". Examples are given here: http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/
NSherwin
I measured it on my box, Jon Skeet is right, the difference is big. I added the measurements in my answer.
Mnementh
+2  A: 

All previous answers suggest reading through the whole file and counting the newlines you find along the way. You commented on some as "not effective", but that's the only way to do it. A line break is nothing more than a character in the file, and to count that character you must look at every single character in the file.

I'm sorry, but you have no choice. :-)

Malax
+2  A: 

If the already posted answers aren't fast enough, you'll probably have to look for a solution specific to your particular problem.

For example, if these text files are logs that are only appended to, and you regularly need to know the number of lines in them, you could create an index. This index would contain the number of lines in the file, when the file was last modified, and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and reading only the new lines.
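
A rough sketch of that idea, simplified to an in-memory index that remembers only the byte offset already scanned and the running count (no last-modified check, nothing persisted to disk). The class IncrementalLineCounter is hypothetical, and it assumes lines end in '\n' in an ASCII-compatible encoding:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: remembers how far into an append-only file it has counted.
final class IncrementalLineCounter {
    private long offset = 0; // bytes already scanned
    private long lines = 0;  // '\n' bytes seen within those bytes

    // Scans only the bytes appended since the previous call and returns the running total.
    long update(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < offset) {
                // The file shrank (truncated or rotated), so the cached state is stale.
                offset = 0;
                lines = 0;
            }
            file.seek(offset);
            byte[] buffer = new byte[8192];
            int n;
            while ((n = file.read(buffer)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') lines++;
                }
                offset += n;
            }
        }
        return lines;
    }
}
```

Keep one counter object per log file and call update(path) whenever you need the current count; each call reads only what was appended since the last one.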

blackNBUK
+1 this might be a suitable online algorithm.
zeroin23
A: 

Probably the fastest solution in pure Java would be to read the file as bytes using an NIO Channel into a large ByteBuffer. Then, using your knowledge of the file encoding scheme(s), count the encoded CR and/or NL bytes, per the relevant line separator convention.

The keys to maximising throughput will be:

  • make sure that you read the file in large chunks,
  • avoid copying the bytes from one buffer to another,
  • avoid copying / converting bytes into characters, and
  • avoid allocating objects to represent the file lines.

The actual code is too complicated for me to write on the fly. Besides, the OP is not asking for the fastest solution.
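
For what it's worth, a simplified sketch of the approach might look like the following. It is not tuned or measured. It assumes an encoding in which the byte 0x0A always means newline (ASCII, UTF-8, ISO-8859-1) and counts only '\n'. The class name NioLineCount and the 1 MiB buffer size are arbitrary choices:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of any library.
final class NioLineCount {
    // Counts '\n' bytes without decoding bytes into characters or building line objects.
    static long countNewlines(Path path) throws IOException {
        long count = 0;
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MiB chunk, arbitrary size
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch the buffer from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') count++;
                }
                buffer.clear(); // reuse the same buffer for the next chunk
            }
        }
        return count;
    }
}
```

Called as NioLineCount.countNewlines(Paths.get("file.txt")), with java.nio.file.Paths imported.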

Stephen C
+1  A: 

Try the Unix "wc" command. I don't mean use it; I mean download the source and see how they do it. It's probably in C, but you can easily port the behavior to Java. The difficulty with writing your own is accounting for the ending CR/LF problem.
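
One reading of that ending problem: wc -l counts newline characters, so a file whose last line has no trailing newline reports one line fewer than you might expect. The sketch below is not a port of the wc source. It counts '\n' bytes and adds one for an unterminated final line. The class name WcStyleCount is made up, and an ASCII-compatible encoding is assumed:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of any library.
final class WcStyleCount {
    // Returns the number of lines, including a final line that lacks a trailing '\n'.
    static long countLines(String path) throws IOException {
        long newlines = 0;
        int last = -1; // last byte seen as 0..255, or -1 if the file is empty
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') newlines++;
                }
                last = buffer[n - 1] & 0xFF; // mask so byte 0xFF is not confused with -1
            }
        }
        boolean unterminated = last != -1 && last != '\n';
        return unterminated ? newlines + 1 : newlines;
    }
}
```

Files using "\r\n" are counted correctly because each line still ends in '\n'. Files using a bare '\r' as the separator are not handled.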

Daniel
A: 

This is about as efficient as it can get: a buffered binary read with no string conversion.

int count = 0;
byte[] buffer = new byte[8192];
// try-with-resources closes the stream even if read() throws
try (FileInputStream stream = new FileInputStream("/tmp/test.txt")) {
    int n;
    while ((n = stream.read(buffer)) > 0) {
        for (int i = 0; i < n; i++) {
            if (buffer[i] == '\n') count++; // count raw newline bytes
        }
    }
}
System.out.println("Number of lines: " + count);
ZZ Coder
A: 

Quick and dirty, but it does the job:

import java.io.*;

public class Counter {

    public final static void main(String[] args) throws IOException {
        if (args.length > 0) {
            File file = new File(args[0]);
            System.out.println(countLines(file));
        }
    }

    public final static int countLines(File file) throws IOException {
        ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
        Process process = builder.start();
        InputStream in = process.getInputStream();
        LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
        String line = reader.readLine();
        if (line != null) {
            return Integer.parseInt(line.trim().split(" ")[0]);
        } else {
            return -1;
        }
    }

}
Wilfred Springer