I have a text file, and I would like to retrieve the content between two given lines. For example, the file may have 200K lines, and I want to read the content from line 78 to line 2735. Since the file may be very large, I do not want to read the whole file into memory.
A:
Simply read the file line by line, counting the lines as you go, and start collecting content once you reach the first line you want.
khmarbaise
2010-04-26 14:52:33
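For comparison, the same read-and-count idea can be written with Java 8 streams (which postdate this thread). `Files.lines` reads the file lazily, `skip` discards the lines before `from`, and `limit` stops pulling lines after `to`, so the rest of the file is never read. This is a minimal sketch assuming UTF-8 text and 1-based, inclusive line numbers; the class name `StreamLineRange` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamLineRange {
    // Returns lines from..to (1-based, inclusive). Assumes UTF-8 text.
    static List<String> linesFromTo(Path file, int from, int to) throws IOException {
        if (from < 1 || to < from) {
            throw new IllegalArgumentException("need 1 <= from <= to");
        }
        // try-with-resources closes the file handle held by the lazy stream
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.skip(from - 1)          // discard lines 1..from-1
                        .limit(to - from + 1)    // keep lines from..to, then stop reading
                        .collect(Collectors.toList());
        }
    }
}
```

If the file has fewer than `to` lines, this version simply returns fewer lines rather than throwing.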
A:
Use BufferedReader.readLine() and count the lines. Only the reader's buffer and the current line are kept in memory.
And no, it's not possible to get to line 3412 without reading the whole file up to that point (unless your lines all have a fixed size).
Michael Borgwardt
2010-04-26 14:52:35
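A minimal sketch of the `BufferedReader.readLine()` approach, assuming UTF-8 text and 1-based, inclusive line numbers; `ReaderLineRange` is a hypothetical name. The loop checks the counter before calling `readLine()`, so nothing after line `to` is read beyond what the reader has already buffered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReaderLineRange {
    // Collects lines from..to (1-based, inclusive) by counting readLine() calls.
    static List<String> linesFromTo(Path file, int from, int to) throws IOException {
        if (from < 1 || to < from) {
            throw new IllegalArgumentException("need 1 <= from <= to");
        }
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int lineNumber = 0;
            // readLine() returns null at end of file; stop once line 'to' has been read
            while (lineNumber < to && (line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber >= from) {
                    result.add(line);   // only lines inside the range are kept
                }
            }
        }
        return result;
    }
}
```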
A:
Here's a start of a possible solution:
```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineUtils { // enclosing class name is hypothetical

    public static List<String> linesFromTo(int from, int to, String fileName)
            throws FileNotFoundException, IllegalArgumentException {
        return linesFromTo(from, to, fileName, "UTF-8");
    }

    public static List<String> linesFromTo(int from, int to, String fileName, String charsetName)
            throws FileNotFoundException, IllegalArgumentException {
        if (from > to) {
            throw new IllegalArgumentException("'from' > 'to'");
        }
        if (from < 1 || to < 1) {
            throw new IllegalArgumentException("'from' or 'to' is less than 1");
        }
        List<String> lines = new ArrayList<String>();
        // try-with-resources closes the Scanner and the underlying file
        try (Scanner scan = new Scanner(new File(fileName), charsetName)) {
            int lineNumber = 0;
            // stop reading as soon as line 'to' has been consumed
            while (scan.hasNextLine() && lineNumber < to) {
                lineNumber++;
                String line = scan.nextLine();
                if (lineNumber < from) continue; // skip lines before 'from'
                lines.add(line);
            }
            if (lineNumber != to) {
                throw new IllegalArgumentException(fileName + " does not have " + to + " lines");
            }
        }
        return lines;
    }
}
```
Bart Kiers
2010-04-26 15:03:28
Replace the `//assume` with `assert` and add a `;` at the end ;-)
Joachim Sauer
2010-04-26 15:05:03
Also, is there a reason to use a Scanner instead of the (arguably) simpler `BufferedReader`? And you don't specify the character encoding used to read the file, so you're leaving that part to luck.
Joachim Sauer
2010-04-26 15:06:06
@Joachim, good points, I edited my answer! About the `BufferedReader`, I assume(d) it doesn't matter that much: am I mistaken?
Bart Kiers
2010-04-26 15:18:43
@Bart: you're right: `BufferedReader`/`Scanner` doesn't matter much (as long as the charset is handled correctly). I just think that `Scanner` has more features than necessary for this use case.
Joachim Sauer
2010-04-26 15:36:24
Yes. Scanner is based on regular expressions, and it fits my use case well. While searching for a solution, I found that Apache Commons IO has a `LineIterator` class, which can be used to iterate over the lines.
frank wang
2010-04-28 02:55:00
A:
I would suggest using a RandomAccessFile. This class lets you jump to a specific location in a file, so if you want to read the last line of the file, you don't have to read all of the previous lines; you can jump straight to that line.
flopex
2010-04-26 15:27:52
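`RandomAccessFile.seek()` jumps to a byte offset, not a line number, so it can only replace line counting when a line's byte offset can be computed, for example when every line has the same length. Below is a sketch under exactly that assumption: each line occupies a fixed `recordLength` bytes including its terminator, and the text uses a single-byte encoding such as ASCII. `FixedLengthLines` is a hypothetical name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class FixedLengthLines {
    // ASSUMPTION: every line is exactly recordLength bytes long (terminator included)
    // and the file uses a single-byte encoding. Without that, offsets do not map to lines.
    static String readLine(String fileName, int lineNumber, int recordLength) throws IOException {
        if (lineNumber < 1 || recordLength < 1) {
            throw new IllegalArgumentException("lineNumber and recordLength must be >= 1");
        }
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long offset = (long) (lineNumber - 1) * recordLength; // first byte of the line
            if (offset >= raf.length()) {
                throw new IllegalArgumentException(fileName + " has fewer than " + lineNumber + " lines");
            }
            raf.seek(offset);        // jump directly; earlier lines are never read
            return raf.readLine();   // reads up to the next \n, \r or \r\n, one byte per char
        }
    }
}
```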
That doesn't help, because there is no fixed relation between byte index and line numbers. A line can be as long or as short as you wish, so you'll have to read the data to know when one ends. Reading the last line can be improved this way (unless the file actually contains a single huge line), but generally it won't help.
Joachim Sauer
2010-04-26 15:41:31
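Reading the last line is the case where `RandomAccessFile` does help: seek to the end and scan backwards for the previous newline. A minimal sketch, assuming `\n` or `\r\n` terminators and UTF-8 text (a `\n` byte never occurs inside a multi-byte UTF-8 character, so scanning raw bytes is safe); trailing empty lines are skipped. `LastLine` is a hypothetical name, and it reads one byte per seek, which keeps it simple but is slow if the last line is very long.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class LastLine {
    // Returns the last non-empty line without reading the bytes before it.
    static String readLastLine(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long pos = raf.length() - 1;
            // skip trailing terminators (and thus trailing empty lines): "a\nb\n" -> "b"
            while (pos >= 0) {
                raf.seek(pos);
                int b = raf.read();
                if (b != '\n' && b != '\r') break;
                pos--;
            }
            long end = pos + 1; // exclusive end of the last line's content
            // walk back to the byte after the previous '\n', or to the start of the file
            while (pos >= 0) {
                raf.seek(pos);
                if (raf.read() == '\n') break;
                pos--;
            }
            long start = pos + 1;
            byte[] bytes = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```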