I currently have two BufferedReaders initialized on the same text file. When I'm done reading the text file with the first BufferedReader, I use the second one to make another pass through the file from the top. Multiple passes through the same file are necessary.

I know about reset(), but it needs a previous call to mark() and mark() needs to know the size of the file, something I don't think I should have to bother with.

Ideas? Packages? Libs? Code?

Thanks TJ

A: 

Why read the file twice? Why not read the file once using BufferedReader into some data structure (a String[], for example?), and then process that data structure?
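A minimal sketch of this read-once approach, assuming UTF-8 text that fits comfortably in memory. The class `TwoPassInMemory` and the example task (count the lines longer than the average line length, which genuinely needs two passes) are hypothetical and chosen only for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TwoPassInMemory {
    // Hypothetical two-pass task: count lines longer than the average line length.
    static int countLongerThanAverage(Path file) throws IOException {
        // Read the file exactly once; every line is now held in memory.
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            return 0;
        }
        // First pass over the in-memory list: compute the average length.
        long total = 0;
        for (String line : lines) {
            total += line.length();
        }
        double average = (double) total / lines.size();
        // Second pass over the same list; the file is not touched again.
        int count = 0;
        for (String line : lines) {
            if (line.length() > average) {
                count++;
            }
        }
        return count;
    }
}
```

The trade-off is the one raised in the comment below: the whole file sits in memory for the duration of both passes.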

matt b
The usual reason why you don't want to do that is there might be memory issues
Davide
+4  A: 

What's the disadvantage of just creating a new BufferedReader to read from the top? I'd expect the operating system to cache the file if it's small enough.

If you're concerned about performance, have you proved it to be a bottleneck? I'd just do the simplest thing and not worry about it until you have a specific reason to. I mean, you could just read the whole thing into memory and then do the two passes on the result, but again that's going to be more complicated than just reading from the start again with a new reader.
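A rough sketch of the "just open a new reader" approach, assuming UTF-8 text. The class `TwoPassReopen` and its task (count lines longer than the average line length, streamed instead of stored) are hypothetical. Each pass opens a fresh BufferedReader at the top of the file, so nothing beyond the current line is kept in memory:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TwoPassReopen {
    // Hypothetical two-pass task: count lines longer than the average line length.
    static int countLongerThanAverage(Path file) throws IOException {
        long total = 0;
        long lineCount = 0;
        // Pass 1: a fresh reader positioned at the start of the file.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += line.length();
                lineCount++;
            }
        }
        if (lineCount == 0) {
            return 0;
        }
        double average = (double) total / lineCount;
        int count = 0;
        // Pass 2: "rewinding" is simply opening another reader; the OS may
        // serve this second read from its file cache if the file is small enough.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.length() > average) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

try-with-resources closes each reader as soon as its pass ends, so only one file handle is open at a time.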

Jon Skeet
A: 

MattB:

Huge text file, would rather not store it.

But I can't argue with the simplicity of your answer. It's certainly worth a try.

Thanks

So if you're not going to store it in memory, why would you particularly want to do anything other than create another reader?
Jon Skeet
What I was trying to get at with my answer wasn't so much "read the whole thing into memory" but to have you take a look at your algorithm to see if you really need to read it twice.
matt b
(continued) If you're having problems with BufferedReader but you _really_ don't need to use it that way, you can save time by not solving problems you don't have to.
matt b
A: 

Jon Skeet:

That's the way I currently have it implemented. It works fine, performance is certainly NOT an issue.

I just felt a tinge of shame from not having a more elegant approach.

Thanks

(It's generally a good idea to hit "Add comment" on the answer you're replying to, rather than adding a new answer.) You're doing exactly what you need to, with no over-engineering. Seems elegant enough to me :)
Jon Skeet
He can't add a comment until he's got 50 rep
Dave
Yep. If you really, truly can't do it all in one pass, I don't think that you should feel bad about leveraging the file system and whatever caching the OS is giving you. Sometimes a format is poorly designed for one-pass processing, and you can't change it (X.509 CRL structures, for example).
erickson
A: 

I agree with Jon Skeet. Why is it so bad to have two BufferedReaders? I guess if the file is large you could gain something by keeping it in memory, but in that case you must choose between memory consumption and some gain in performance.

bruno conde
+7  A: 

BufferedReaders are meant to read a file sequentially. What you are looking for is java.io.RandomAccessFile; with it you can use seek() to move to wherever you want in the file.

A RandomAccessFile is used like so:

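import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

String fileName = "c:/myraffile.txt";
// "rw" opens for reading and writing (and creates the file if missing); "r" is enough for read-only passes
try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "rw")) {
     String line;
     while ((line = raf.readLine()) != null) {
          // first pass: process line (readLine() maps each byte to one char, so this suits ASCII/Latin-1 text)
     }
     raf.seek(0); // jump back to the top of the file
     while ((line = raf.readLine()) != null) {
          // second pass: process line again
     }
} catch (FileNotFoundException e) {
     e.printStackTrace();
} catch (IOException e) {
     e.printStackTrace();
}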

The "rw" argument is the access mode, which is detailed here: http://java.sun.com/j2se/1.4.2/docs/api/java/io/RandomAccessFile.html#mode

The sequential-access readers are set up this way so that they can implement their buffers without things changing beneath their feet. For example, the FileReader given to a BufferedReader should only be operated on by that BufferedReader. If some other code could also affect it, you would get inconsistent behavior: one reader advances its position in the shared FileReader while the other expects it to stay put, so when you next use the other reader it is at an undetermined location.

Ryan P
+2  A: 

The best way to proceed is to change your algorithm so that you do NOT need the second pass. I have used this approach a couple of times when I had to deal with huge (but not terrible, i.e. a few GBs) files that didn't fit in the available memory.

It might be hard, but the performance gain is usually worth the effort.

Davide
A: 

Ryan P:

I didn't know about RandomAccessFile, thanks.

It seems (I'm biased, I admit) that BufferedReader should have a method such as "topOfFile" or "startOfStream" that does what RandomAccessFile.seek(0) does.

The whole business about mark() and reset() in BufferedReader smacks of poor design.

This may be the last time I ever use BufferedReader.

Thanks again.

A: 

Hi TJ,

"The whole business about mark() and reset() in BufferedReader smacks of poor design."

Why don't you extend this class, have it call mark() in the constructor, and then call reset() (BufferedReader's equivalent of seek(0)) in a topOfFile() method?
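A minimal sketch of that suggestion; the class name `RewindableReader` and the constructor's `readAheadLimit` parameter are hypothetical. Note the catch: because BufferedReader implements reset() by keeping the marked data in its own buffer, the limit has to cover everything read before topOfFile() is called, which for a whole file means buffering the whole file in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical subclass: mark() in the constructor, reset() in topOfFile().
class RewindableReader extends BufferedReader {
    RewindableReader(Reader in, int readAheadLimit) throws IOException {
        super(in);
        // Remember the start of the stream. A limit larger than the internal
        // buffer makes BufferedReader allocate a buffer at least that large.
        mark(readAheadLimit);
    }

    // Go back to the position marked in the constructor. This only works while
    // no more than readAheadLimit chars have been read; otherwise reset() may throw.
    void topOfFile() throws IOException {
        reset();
    }
}
```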

BR,
~A

anjanb
A: 

About mark/reset:

The mark method in BufferedReader takes a readAheadLimit parameter which limits how far you can read after a mark before reset becomes impossible. Resetting doesn't actually mean a file system seek(0); it just seeks inside the buffer. To quote the Javadoc:

readAheadLimit - Limit on the number of characters that may be read while still preserving the mark. After reading this many characters, attempting to reset the stream may fail. A limit value larger than the size of the input buffer will cause a new buffer to be allocated whose size is no smaller than limit. Therefore large values should be used with care.
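A small sketch of that behavior; `MarkResetDemo` and `rereadAfterMark` are hypothetical names. The mark is placed in the middle of the input, and reset() returns to it by moving back within BufferedReader's buffer rather than seeking in the underlying source:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class MarkResetDemo {
    // Reads line 1, marks, reads two more lines, resets, and returns the next line.
    static String rereadAfterMark(String text, int readAheadLimit) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            reader.readLine();            // consume line 1
            reader.mark(readAheadLimit);  // remember the start of line 2
            reader.readLine();            // line 2
            reader.readLine();            // line 3
            // Valid only if at most readAheadLimit chars were read since mark();
            // past that limit, reset() may throw an IOException.
            reader.reset();
            return reader.readLine();     // line 2 again
        }
    }
}
```

For example, `rereadAfterMark("one\ntwo\nthree\n", 100)` returns `"two"`.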

Zarkonnen
A: 

@anjan b:

Or I could just post non-constructive answers to other people's questions. Ryan P's suggestion to use RandomAccessFile makes your post moot at best.

@Zarkonnen:

I UNDERSTAND and DISLIKE the mark/reset paradigm. Your post implies I dislike it because I don't get it. Incorrect. I don't believe I should have to write code that's aware of the structure and length of the file it's buffering in order to simply go to an arbitrary point in it.

I should be able to call mark() before I read the nth line/char/String and go back there whenever I please, not if and only if I haven't passed some arbitrary number.
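With java.io.RandomAccessFile, that kind of arbitrary "mark" is just a byte offset: record getFilePointer() before reading a line and seek() back to it whenever you like, with no read-ahead limit. A sketch, where `LineOffsets`, `lineStarts` and `readLineAt` are hypothetical names and the text is assumed to be ASCII/Latin-1 (RandomAccessFile.readLine() maps each byte to one char):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class LineOffsets {
    // Scan the file once, recording the byte offset at which each line starts.
    static List<Long> lineStarts(RandomAccessFile raf) throws IOException {
        List<Long> starts = new ArrayList<>();
        raf.seek(0);
        long pos = raf.getFilePointer();
        while (raf.readLine() != null) {
            starts.add(pos);               // offset of the line just read
            pos = raf.getFilePointer();    // offset of the next line
        }
        return starts;
    }

    // Jump back to line n (0-based) at any time and read it again.
    static String readLineAt(RandomAccessFile raf, List<Long> starts, int n) throws IOException {
        raf.seek(starts.get(n));
        return raf.readLine();
    }
}
```

The offsets could just as well be recorded only for the lines you care about; storing all of them is just the simplest illustration.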

What's worse is the behavior you get if you incorrectly compute/guess/estimate the readAheadLimit.

Suffice it to say, anyone who likes cookie dough will be in for a treat, because mark()/reset() is definitely half-baked.

Thanks again to everyone who posted. I enjoyed thinking about and discussing the issue from all angles.

A: 

RandomAccessFile is an extremely slow class for reading text: it is unbuffered, so readLine() reads the file one byte at a time. Don't use it unless performance isn't an issue.

Tertium Organum