I have a big file with approximately 3,000 to 20,000 lines. How can I get the total number of lines in the file using Java?
Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.
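For illustration, here is a minimal sketch of the Scanner approach mentioned above. The class name ScannerLineCount and the placeholder path file.txt are made up for the example, and the file is read with the platform default charset.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

class ScannerLineCount {
    // Counts lines by asking the Scanner for one line at a time.
    static int countLines(Path file) throws IOException {
        int lines = 0;
        // try-with-resources closes the underlying file when done
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine(); // the content is discarded, only the count matters
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder path (assumption)
        Path file = Paths.get(args.length > 0 ? args[0] : "file.txt");
        System.out.println(countLines(file));
    }
}
```

Scanner is convenient rather than fast, so for very large files one of the buffered approaches is likely the better choice.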
int lines = 0;
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
    while (reader.readLine() != null) lines++;
}
Update: To answer the performance question raised here, I made a measurement. First of all, 20,000 lines are too few to keep the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java without parameters like -server or -XX options) needed around 11 seconds on my box. The same with wc -l (the UNIX command-line tool for counting lines): 11 seconds. The solution reading every single character and looking for '\n' needed 104 seconds, 9-10 times as much.
Read the file line by line and increment a counter for each line until you have read the entire file.
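On Java 8 or later, one way to sketch this is with Files.lines, which streams the file line by line. The class name StreamLineCount and the placeholder path are assumptions for the example. Note that Files.lines decodes the file as UTF-8 by default, so a file in another encoding may fail with an exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class StreamLineCount {
    // Streams the file line by line and counts the lines.
    static long countLines(Path file) throws IOException {
        // The stream keeps the file open, so close it with try-with-resources
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder path (assumption)
        Path file = Paths.get(args.length > 0 ? args[0] : "file.txt");
        System.out.println(countLines(file));
    }
}
```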
Use LineNumberReader, something like:
public static int getLines(File aFile) throws IOException {
    LineNumberReader reader = null;
    try {
        reader = new LineNumberReader(new FileReader(aFile));
        // Read and discard every line; the reader keeps track of the line number
        while (reader.readLine() != null);
        return reader.getLineNumber();
    } catch (Exception ex) {
        // Signal failure to the caller with -1
        return -1;
    } finally {
        if (reader != null)
            reader.close();
    }
}
The buffered reader is overkill
Reader r = new FileReader("f.txt");
int count = 0;
int nextchar;
// read() returns -1 at the end of the file
while ((nextchar = r.read()) != -1) {
    // Compare the character read directly with '\n'
    if (nextchar == '\n') {
        count++;
    }
}
r.close();
My search for a simple example has created one that's actually quite poor. Calling read() repeatedly for a single character is less than optimal. See here for examples and measurements.
All previous answers suggest reading through the whole file and counting the newlines you find along the way. You commented on some as "not effective", but that's the only way to do it. A line break is nothing more than a simple character inside the file, and to count that character you must look at every single character in the file.
I'm sorry, but you have no choice. :-)
If the already posted answers aren't fast enough you'll probably have to look for a solution specific to your particular problem.
For example if these text files are logs that are only appended to and you regularly need to know the number of lines in them you could create an index. This index would contain the number of lines in the file, when the file was last modified and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and just reading the new lines.
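A rough sketch of such an index, kept in memory only, might look like the following. The class LineCountIndex and its method update are hypothetical names. The sketch assumes the file is only ever appended to and that lines end with a '\n' byte (true for ASCII, UTF-8 and similar encodings). It tracks only the size already counted, not the modification time, and does not persist the index between runs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class LineCountIndex {
    private long indexedBytes = 0; // how much of the file has been counted so far
    private long lineCount = 0;    // number of '\n' bytes seen in that range

    // Counts only the bytes appended since the previous call and returns the total.
    long update(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < indexedBytes) {
                // The file shrank (for example after log rotation): start over
                indexedBytes = 0;
                lineCount = 0;
            }
            file.seek(indexedBytes); // skip everything already counted
            byte[] buffer = new byte[8192];
            int n;
            while ((n = file.read(buffer)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') lineCount++;
                }
                indexedBytes += n;
            }
        }
        return lineCount;
    }

    public static void main(String[] args) throws IOException {
        LineCountIndex index = new LineCountIndex();
        // "app.log" is a placeholder path (assumption)
        System.out.println(index.update(args.length > 0 ? args[0] : "app.log"));
    }
}
```

Calling update again after the log has grown reads only the new tail of the file, which is where the saving comes from.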
Probably the fastest solution in pure Java would be to read the file as bytes using a NIO Channel into a large ByteBuffer. Then, using your knowledge of the file encoding scheme(s), count the encoded CR and/or NL bytes, per the relevant line separator convention.
The keys to maximising throughput will be:
- make sure that you read the file in large chunks,
- avoid copying the bytes from one buffer to another,
- avoid copying / converting bytes into characters, and
- avoid allocating objects to represent the file lines.
The actual code is too complicated for me to write on the fly. Besides, the OP is not asking for the fastest solution.
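That said, a simplified sketch of the idea is possible if one assumes that every line ends with a single '\n' byte (as in ASCII, UTF-8 or ISO-8859-1 files with Unix line endings). The class name NioLineCount and the 1 MiB buffer size are arbitrary choices for the example, not tuned values.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class NioLineCount {
    // Counts '\n' bytes without decoding characters or creating line objects.
    static long countNewlines(Path file) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // One large buffer, reused for every chunk
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling the buffer to draining it
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') count++;
                }
                buffer.clear(); // make the whole buffer available for the next read
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder path (assumption)
        Path file = Paths.get(args.length > 0 ? args[0] : "file.txt");
        System.out.println(countNewlines(file));
    }
}
```

Other encodings or line separator conventions (UTF-16, CR-only files) would need a different byte pattern than the single '\n' checked here.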
Try the Unix "wc" command. I don't mean use it, I mean download the source and see how they do it. It's probably in C, but you can easily port the behavior to Java. The tricky part of writing your own is accounting for the ending CR/LF problem.
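If the ending problem refers to a last line that has no terminator, one possible way to handle it is sketched below. wc -l itself counts newline characters, so it does not count such a final line; the sketch adds one in that case. The class name TrailingLineCount is made up, and the code assumes lines end with a '\n' byte (a CR before it does not change the count).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class TrailingLineCount {
    // Counts '\n' bytes and adds one if the input ends with an unterminated line.
    static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long count = 0;
        boolean empty = true;            // no bytes read so far
        boolean endsWithNewline = false; // whether the last byte read was '\n'
        int n;
        while ((n = in.read(buffer)) > 0) {
            empty = false;
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') count++;
            }
            endsWithNewline = buffer[n - 1] == '\n';
        }
        if (!empty && !endsWithNewline) count++; // final line without a terminator
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder path (assumption)
        try (InputStream in = new FileInputStream(args.length > 0 ? args[0] : "file.txt")) {
            System.out.println(countLines(in));
        }
    }
}
```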
This is about as efficient as it can get: a buffered binary read with no string conversion.
FileInputStream stream = new FileInputStream("/tmp/test.txt");
byte[] buffer = new byte[8192];
int count = 0;
int n;
while ((n = stream.read(buffer)) > 0) {
    for (int i = 0; i < n; i++) {
        if (buffer[i] == '\n') count++;
    }
}
stream.close();
System.out.println("Number of lines: " + count);
Quick and dirty, but it does the job:
import java.io.*;
public class Counter {
    public final static void main(String[] args) throws IOException {
        if (args.length > 0) {
            File file = new File(args[0]);
            System.out.println(countLines(file));
        }
    }
    // Delegates to the external "wc" tool, so it only works where wc is on the PATH
    public final static int countLines(File file) throws IOException {
        ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
        Process process = builder.start();
        InputStream in = process.getInputStream();
        // try-with-resources closes the process output stream when done
        try (LineNumberReader reader = new LineNumberReader(new InputStreamReader(in))) {
            String line = reader.readLine();
            if (line != null) {
                // The count is the first whitespace-separated field of the output
                return Integer.parseInt(line.trim().split("\\s+")[0]);
            } else {
                return -1;
            }
        }
    }
}