It seems dirty to use an exception to indicate the end of a file has been reached. Every file we read has an end, so it doesn't seem exceptional or unexpected. Furthermore, I don't like using an exception for the non-exceptional flow of my program.

I'm talking about using java.io.EOFException to signal the end of a data input stream:

Imagine a file consisting of the following messages...

----------------- ------------------
- 2-byte LENGTH - - N-byte PAYLOAD - , where N = LENGTH;
----------------- ------------------

...and reading this file with DataInputStream:

DataInputStream in = new DataInputStream(...);

...

try {
    while (true) {
        short length = in.readShort();
        byte[] b = new byte[length];
        in.readFully(b);
    }
} catch (EOFException e) { }

...

In this example, an EOFException is thrown by the call to in.readShort(). Should I instead determine the file's size up front, read exactly that many bytes (decrementing a running total until it reaches zero), and exit the while loop without an exception? I'm looking for the best practice.

Should I do something like this?

long total = file.length();
while (total > 0) {
    short length = in.readShort();
    total -= 2 + length; // count the 2-byte length prefix as well as the payload
    byte[] b = new byte[length];
    in.readFully(b);
}

The API specification says that EOFException signals that an end of file or end of stream has been reached unexpectedly during input. Yet data input streams also use it to signal the ordinary end of a stream.

What do I do when the exception is expected?

A: 

Refer to the API specification for the DataInput.readFully method:

 This method blocks until one of the following conditions occurs:

    * b.length bytes of input data are available, in which case a normal return is made.
    * End of file is detected, in which case an EOFException is thrown.
    * An I/O error occurs, in which case an IOException other than EOFException is thrown.

So the idea is that it either reads b.length bytes of data or throws an exception, because of an I/O error or because the end of the file is reached before b.length bytes can be read.

So you are expected to know how many bytes you want to read before calling DataInput.readFully. Reading past the end of the file is considered abnormal behavior, which is why you get an exception.
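To make the contract quoted above concrete, here is a minimal sketch of how readFully behaves when fewer bytes remain than the buffer asks for. The class name ReadFullyDemo and the helper canReadFully are made up for illustration, and an in-memory ByteArrayInputStream stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadFullyDemo {
    /**
     * Hypothetical helper: returns true if readFully could fill a buffer of
     * the wanted size from the source, false if it hit end of stream first.
     */
    public static boolean canReadFully(byte[] source, int wanted) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(source))) {
            byte[] b = new byte[wanted];
            in.readFully(b); // throws EOFException if fewer than b.length bytes remain
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] threeBytes = {1, 2, 3};
        System.out.println(canReadFully(threeBytes, 3)); // true
        System.out.println(canReadFully(threeBytes, 4)); // false
    }
}
```

Asking for exactly the bytes that exist returns normally; asking for one more than exists raises EOFException, which matches the "end of file is detected" case in the specification.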

dcp
In my example, the EOFException is thrown by `in.readShort();`. Should I do something else, like figure out the file's size and read exactly that number of bytes?
Mr. White
readShort works the same way: http://java.sun.com/j2se/1.4.2/docs/api/java/io/DataInput.html#readShort%28%29 That is, you are expected to know that a short is available to read from the file; if you try to read one and it isn't there, you get an exception. To avoid these types of issues, you could always use readAllBytes (http://openjdk.java.net/projects/nio/javadoc/java/io/Inputs.html#readAllBytes%28java.nio.file.FileRef%29). You could also use readAllLines if you want the file data as strings.
dcp
Since I don't expect to readShort on the last call (when the file is completely read), what should I do?
Mr. White
If you are so opposed to using the exception, you could check the length of the file in advance, keep track of how many bytes you've read, and use that as the exit condition for your while loop. But of course you'd still need to catch the EOFException in case readFully() throws it (if the input file is somehow corrupted), so what would be the point?
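As a rough sketch of the byte-counting approach described here: the loop below stops once the known total has been consumed, so no exception is involved in the normal case. The names CountedRecordReader and readRecords are hypothetical, and the sketch assumes the 2-byte LENGTH is unsigned and big-endian (hence readUnsignedShort rather than the question's readShort). For a real file you would pass file.length() as totalBytes. An EOFException can still escape if the data is shorter than the total claims, which is the corruption case mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class CountedRecordReader {
    /**
     * Reads length-prefixed records until totalBytes have been consumed.
     * EOFException is not caught here: if it is thrown, the input was
     * shorter than totalBytes promised, i.e. truncated or corrupted.
     */
    public static List<byte[]> readRecords(InputStream source, long totalBytes) throws IOException {
        List<byte[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(source);
        long remaining = totalBytes;
        while (remaining > 0) {
            int length = in.readUnsignedShort(); // assumption: unsigned 2-byte prefix
            byte[] payload = new byte[length];
            in.readFully(payload);
            remaining -= 2 + length;             // prefix plus payload
            records.add(payload);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Build two sample records in memory: [3 bytes] and [1 byte].
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeShort(3);
        out.write(new byte[] {1, 2, 3});
        out.writeShort(1);
        out.write(new byte[] {9});
        byte[] data = buffer.toByteArray();

        List<byte[]> records = readRecords(new ByteArrayInputStream(data), data.length);
        System.out.println(records.size()); // 2
    }
}
```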
David Gelhar
Without knowing your file structure it's hard for me to say. Are you reading a fixed format file (i.e. field1 is 10 bytes, field2 is 20 bytes, field3 is 14 bytes, etc)? Are you reading a CSV file? It really depends on your input. If you are just reading a file full of numbers and you don't know how many there are, then it's going to be difficult to avoid the exception unless you read all the file text first and then work off of the strings.
dcp
@dcp The file format is described in the question: records consisting of a 2-byte length followed by N bytes of data.
David Gelhar
@David Gelhar - Yes, that wasn't there originally (I think). Thanks for pointing it out.
dcp
A: 

As its name suggests, readFully is meant for when you expect to read a complete block of bytes of known length. If EOF is reached before that, it is unexpected, so throwing an exception is conceptually right. If you want read semantics, where end of stream is an ordinary return value, use the read method instead.
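One possible way to apply this to the length-prefixed format in the question, as a sketch rather than a definitive answer: use read() for the first byte of each record, since it returns -1 at end of stream instead of throwing, and keep readFully for the payload, where running out of data really is unexpected. The names RecordReader, readRecord and readAll are invented for this example, and the LENGTH field is assumed to be unsigned and big-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class RecordReader {
    /** Returns the next payload, or null if the stream ended cleanly at a record boundary. */
    public static byte[] readRecord(DataInputStream in) throws IOException {
        int high = in.read();            // -1 at end of stream, no exception
        if (high < 0) {
            return null;                 // clean end: no more records
        }
        int low = in.read();
        if (low < 0) {
            throw new EOFException("stream ended inside a length prefix");
        }
        int length = (high << 8) | low;  // big-endian, assumed unsigned
        byte[] payload = new byte[length];
        in.readFully(payload);           // EOFException here means a truncated record
        return payload;
    }

    /** Reads every record from the source until a clean end of stream. */
    public static List<byte[]> readAll(InputStream source) throws IOException {
        List<byte[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(source);
        byte[] record;
        while ((record = readRecord(in)) != null) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Two records: length 3 with bytes 1,2,3 and length 1 with byte 9.
        byte[] data = {0, 3, 1, 2, 3, 0, 1, 9};
        System.out.println(readAll(new ByteArrayInputStream(data)).size()); // 2
    }
}
```

With this split, the loop ends through an ordinary return value when the file stops at a record boundary, and EOFException is reserved for a file that stops partway through a record.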

leonbloy
Then what should I do in my example?
Mr. White