Hello!

I've got a big file on which I'm opening a FileInputStream. The file contains several embedded files, each with an offset from the beginning and a size. Furthermore I've got a parser that should evaluate such a contained file.

File file = ...;    // the big file
long offset = 1734; // a contained file's offset
long size = 256;    // a contained file's size
FileInputStream fis = new FileInputStream(file);
fis.skip(offset);
parse(fis, size);

public void parse(InputStream is, long size) {
   // parse stream data and ensure we don't read more than size bytes
   is.close();
}

I feel like this is not good practice. Is there a better way to do this, maybe using buffering?

Furthermore, I feel like the skip() method slows the reading process down a lot.

Thanks in advance! :-)

+2  A: 

This sounds like a typical nested file aka "zip" file problem.

A common way to handle this is to have a separate InputStream instance for each nested logical stream. These would perform the necessary operations on the underlying physical stream, and buffering can be applied both to the underlying stream and the logical stream, depending on which suits best. This means the logical stream encapsulates all the information about placement in the underlying stream.

You could, for instance, have a kind of factory method with a signature like this:

List<InputStream> getStreams(File inputFile)

You could do the same with OutputStreams.

There are some details to work out, but this may be enough for you?
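A minimal sketch of the factory idea, assuming a hypothetical Entry descriptor (offset and size of each nested file) and assuming each entry is small enough to buffer fully in memory; none of these names come from the answer:

```java
import java.io.*;
import java.util.*;

// Sketch: build one logical InputStream per nested file. Reading each entry
// fully into memory is an assumption that only suits reasonably small entries.
class StreamFactory {
    static final class Entry {
        final long offset;
        final int size;
        Entry(long offset, int size) { this.offset = offset; this.size = size; }
    }

    static List<InputStream> getStreams(File inputFile, List<Entry> entries) throws IOException {
        List<InputStream> streams = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(inputFile, "r")) {
            for (Entry e : entries) {
                byte[] buf = new byte[e.size];
                raf.seek(e.offset);      // jump straight to the entry, no skip() loop
                raf.readFully(buf);
                streams.add(new ByteArrayInputStream(buf));
            }
        }
        return streams;
    }
}
```

Each returned stream then encapsulates its own placement, and the big file is closed as soon as the factory returns.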

krosenvold
A: 

You could use a wrapper class on a RandomAccessFile - try this

You could also try wrapping that in a BufferedInputStream and see if the performance improves.
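A rough sketch of such a wrapper, assuming a hypothetical RafRegionInputStream class (not a real library class); it uses seek() for O(1) positioning instead of skip():

```java
import java.io.*;

// Sketch: expose a region of a RandomAccessFile as an InputStream.
// Class and field names are illustrative.
class RafRegionInputStream extends InputStream {
    private final RandomAccessFile raf;
    private long remaining;

    RafRegionInputStream(RandomAccessFile raf, long offset, long size) throws IOException {
        this.raf = raf;
        raf.seek(offset);                // position directly, no skip() loop
        this.remaining = size;
    }

    @Override public int read() throws IOException {
        if (remaining <= 0) return -1;   // report EOF at the region boundary
        int b = raf.read();
        if (b >= 0) remaining--;
        return b;
    }
}
```

As the answer suggests, wrapping this in a BufferedInputStream avoids the cost of the byte-at-a-time reads.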

frankodwyer
+4  A: 

It sounds like what you really want is a sort of "partial" input stream - one a bit like the ZipInputStream, where you've got a stream within a stream.

You could write this yourself, proxying all InputStream methods to the original input stream making suitable adjustments for offset and checking for reading past the end of the subfile.

Is that the sort of thing you're talking about?

Jon Skeet
I tried subclassing FileInputStream to create a custom InputStream for my case. It seems like the FIS looks for an EOF symbol which actually isn't there. I have checked it into SVN: http://code.google.com/p/mtmx/source/browse/code/core/trunk/mtmx.file/src/mtmx/file/internal/SubFileInputStream.java
Stefan Teitge
I wouldn't subclass FileInputStream - I'd subclass just InputStream, so you can create a partial stream from *any* input stream.
Jon Skeet
Changed it: I proxied the FileInputStream and now it works. I checked it in if you want to have a look. Thanks to all of you.
Stefan Teitge
Why restrict it to FileInputStream at all though? Proxy InputStream itself. Also your read(byte[] b) is incorrect at the moment - available -= b.length; should be available -= read;
Jon Skeet
Subclass java.io.FilterInputStream, not InputStream.
erickson
@erickson: Good call.
Jon Skeet
+3  A: 

First, FileInputStream.skip() has a bug which may make the underlying file skip beyond its EOF marker, so be wary of that one.

I've personally found working with Input/OutputStreams to be a pain compared to using FileReader and FileWriter, and you're showing the main issue I have with them: the need to close the streams after use. One of the issues is that you can never be sure you've closed all the resources properly unless you make the code a bit over-cautious, like this:

public void parse(File in, long size) throws IOException {
    FileInputStream fis = new FileInputStream(in);
    try {
     // do file content handling here
    } finally {
     fis.close();
    }
    // do parsing here
}

This is of course bad in the sense that this would lead to creating new objects all the time which may end up eating a lot of resources. The good side of this is of course that the stream will get closed even if the file handling code throws an exception.

Esko
I don't understand your comment about FileInputStream vs FileReader. In both cases you need to close the resource after finishing with it. The only difference is that a stream deals with binary data and a Reader deals with text.
Jon Skeet
FileReader/FileWriter are just convenience layers over FileInputStream/FileOutputStream where the conversion from octets (bytes) to characters is done for you. I don't even think you get to choose the type of conversion!
Adrian Pronk
Jon: It's a small convenience but it's still a lot better than messing around with streams directly. As I said, there are several issues I don't like in Java's stream IO, like the system-dependent line separator, which fortunately can be countered with BufferedWriter.newLine() and BufferedReader.readLine().
Esko
A: 

In general, the code that opens the file should close the file -- the parse() function should not close the input stream, since it is of the utmost arrogance for it to assume that the rest of the program won't want to continue reading other files contained in the big one.

You should decide whether the interface to parse() should be just stream and length (with the function able to assume that the file is correctly positioned) or whether the interface should include the offset (so the function first positions and then reads). Both designs are feasible. I'd be inclined to let the parse() do the positioning, but it is not a clear-cut decision.
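A sketch of the second option, where parse() receives the offset and does its own positioning but never closes the stream; the checksum "parsing" is a stand-in for real work, and all names are illustrative:

```java
import java.io.*;

// Sketch: parse() positions itself and respects the size limit,
// but the caller that opened the stream owns closing it.
class Parser {
    static long parse(InputStream is, long offset, long size) throws IOException {
        long toSkip = offset;
        while (toSkip > 0) {             // skip() may skip fewer bytes than asked
            long skipped = is.skip(toSkip);
            if (skipped <= 0) throw new EOFException("offset past end of stream");
            toSkip -= skipped;
        }
        long checksum = 0;
        for (long read = 0; read < size; read++) {
            int b = is.read();
            if (b < 0) break;            // never read past size or actual EOF
            checksum += b;
        }
        return checksum;                 // note: no is.close() here
    }
}
```

With this interface the caller can open the big file once, call parse() for several contained files, and close the stream itself afterwards.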

Jonathan Leffler