I have an InputStream of a file and I use Apache POI components to read from it like this:

POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);

The problem is that I need to use the same stream multiple times, and POIFSFileSystem closes the stream after use.

What is the best way to cache the data from the input stream and then serve more input streams to different POIFSFileSystem instances?

EDIT 1:

By cache I meant store for later use, not a way to speed up the application. Also, is it better to just read the input stream into an array or string and then create input streams for each use?

EDIT 2:

Sorry to reopen the question, but the conditions are somewhat different in a desktop application and in a web application. First of all, the InputStream I get from the org.apache.commons.fileupload.FileItem in my Tomcat web app doesn't support marking and thus cannot be reset.

Second, I'd like to be able to keep the file in memory for faster access and fewer I/O problems when dealing with files.

A: 

If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.

If the file is big, then you shouldn't care, since the OS will do the caching for you as best as it can.

[EDIT] Use Apache commons-io to read the file into a byte array efficiently. Do not use int read(), since it reads the file byte by byte, which is very slow!

If you want to do it yourself, use a File object to get the length, create the array, and then write a loop which reads bytes from the file. You must loop since read(byte[], int offset, int len) can read fewer than len bytes (and usually does).
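A minimal sketch of the "read once, serve many" idea, assuming the data fits in memory. The class name CachedStream and its methods are hypothetical, not part of POI or commons-io. Because the upload stream in the question has no File behind it, the sketch collects chunks in a ByteArrayOutputStream instead of pre-sizing the array from a file length; IOUtils.toByteArray(InputStream) from commons-io could replace the loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a stream once and hands out independent copies.
class CachedStream {

    private final byte[] data;

    CachedStream(InputStream source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read(byte[]) may return fewer bytes than requested, so loop until -1.
        while ((n = source.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        data = out.toByteArray();
    }

    // Every call returns a new stream positioned at the start of the data.
    InputStream newStream() {
        return new ByteArrayInputStream(data);
    }

    int size() {
        return data.length;
    }

    public static void main(String[] args) throws IOException {
        InputStream upload =
                new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8));
        CachedStream cache = new CachedStream(upload);
        // Two independent streams over the same cached bytes.
        System.out.println(cache.newStream().read()); // 115, the byte value of 's'
        System.out.println(cache.newStream().read()); // 115 again
    }
}
```

Each consumer would then get its own stream, for example new POIFSFileSystem(cache.newStream()); closing a ByteArrayInputStream has no effect on the cached array.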

Aaron Digulla
The read() method returns an int; how do I split the bytes: little or big endian?
Azder
read() always returns 0-255 or -1. Check for -1 (end of stream) first, and then you can safely cast it to byte.
adrian.tarau
+1  A: 

What exactly do you mean by "cache"? Do you want the different POIFSFileSystem instances to start at the beginning of the stream? If so, there's absolutely no point caching anything in your Java code; it will be done by the OS, just open a new stream.

Or do you want to continue reading at the point where the first POIFSFileSystem stopped? That's not caching, and it's very difficult to do. The only way I can think of, if you can't avoid the stream getting closed, would be to write a thin wrapper that counts how many bytes have been read and then open a new stream and skip that many bytes. But that could fail when POIFSFileSystem internally uses something like a BufferedInputStream.
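A rough sketch of such a counting wrapper, assuming the source can be reopened (a file, say). The names ByteCountingInputStream and skipFully are made up for illustration. As noted above, the count only says how far the underlying stream was read, which can be further than the consumer actually parsed if it buffers internally; mark/reset is not accounted for either.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper that counts the bytes consumed from the wrapped stream.
class ByteCountingInputStream extends FilterInputStream {

    private long count;

    ByteCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates here, so nothing is counted twice.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    long getCount() {
        return count;
    }

    // skip() may skip fewer bytes than requested, so keep going until done.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException("stream ended before " + n + " bytes were skipped");
            } else {
                remaining--; // the single read() consumed one byte
            }
        }
    }
}
```

After the first consumer is done, a freshly opened stream can be advanced with ByteCountingInputStream.skipFully(newStream, first.getCount()).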

Michael Borgwardt
Indeed, it is not very wise to presume that the input stream is resettable.
adrian.tarau
+4  A: 

You can decorate the InputStream passed to POIFSFileSystem with a version that responds to close() by calling reset():

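import java.io.IOException;
import java.io.InputStream;

// Decorator that rewinds the wrapped stream instead of closing it.
class ResetOnCloseInputStream extends InputStream {

    private final InputStream decorated;

    public ResetOnCloseInputStream(InputStream anInputStream) {
        if (!anInputStream.markSupported()) {
            throw new IllegalArgumentException("marking not supported");
        }

        // Magic constant, BEWARE: the read limit is 1 << 24 bytes (16 MiB);
        // reset() may fail once more than that has been read.
        anInputStream.mark(1 << 24);
        decorated = anInputStream;
    }

    // Rewind to the mark instead of closing, so the stream can be read again.
    @Override
    public void close() throws IOException {
        decorated.reset();
    }

    @Override
    public int read() throws IOException {
        return decorated.read();
    }

    // Delegate bulk reads too; the inherited version calls read() once per byte.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return decorated.read(b, off, len);
    }
}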

Test case:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ResetOnCloseTest {

    // Prints every byte of the stream, then closes it.
    static void closeAfterInputStreamIsConsumed(InputStream is)
            throws IOException {
        int r;

        while ((r = is.read()) != -1) {
            System.out.println(r);
        }

        is.close();
        System.out.println("=========");
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream("sample".getBytes());
        ResetOnCloseInputStream decoratedIs = new ResetOnCloseInputStream(is);
        // close() on the decorator resets the stream, so all three calls print the same bytes.
        closeAfterInputStreamIsConsumed(decoratedIs);
        closeAfterInputStreamIsConsumed(decoratedIs);
        closeAfterInputStreamIsConsumed(is);
    }
}

EDIT 2

You can read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.

dfa
How big a file can it handle with the magic constant in anInputStream.mark(1 << 24)?
Azder
Forget about it; you can make it a parameter.
dfa
I just put Integer.MAX_VALUE; anyway, thanks, it worked like a charm.
Azder
A: 

This is how I would implement it, so that it can be used safely with any InputStream:

  • write your own InputStream wrapper that creates a temporary file to mirror the original stream's content
  • dump everything read from the original input stream into this temporary file
  • once the stream has been read completely, all the data is mirrored in the temporary file
  • use InputStream.reset to switch (initialize) the internal stream to a FileInputStream(mirrored_content_file)
  • from now on you can drop the reference to the original stream (so it can be garbage collected)
  • add a new method release() which removes the temporary file and releases any open stream
  • you can even call release() from finalize to make sure the temporary file is released in case you forget to call release() (most of the time you should avoid using finalize; always call a method to release object resources). See "Why would you ever implement finalize()?"
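A simplified sketch of the temporary-file idea. It is not the wrapper described in the list: it copies the whole stream to the file up front instead of mirroring bytes as they are read, it hands out new streams instead of switching on reset(), and it leaves out finalize. The class name TempFileStreamCache and the newStream() method are hypothetical; release() follows the naming in the list.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: spools a stream to a temporary file so it can be re-read.
class TempFileStreamCache {

    private final Path tempFile;

    TempFileStreamCache(InputStream source) throws IOException {
        tempFile = Files.createTempFile("stream-cache", ".tmp");
        // The original stream is consumed and closed here; it is not needed afterwards.
        try (InputStream in = source) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Each call opens a new stream at the start of the mirrored content.
    InputStream newStream() throws IOException {
        return Files.newInputStream(tempFile);
    }

    // Removes the temporary file; call this when the data is no longer needed.
    void release() throws IOException {
        Files.deleteIfExists(tempFile);
    }
}
```

This keeps memory use flat for large uploads at the cost of disk I/O, which is the opposite trade-off from caching the bytes in an array.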
adrian.tarau
+1  A: 

Try BufferedInputStream, which adds mark and reset functionality to another input stream, and just override its close method:

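import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class UnclosableBufferedInputStream extends BufferedInputStream {

    public UnclosableBufferedInputStream(InputStream in) {
        super(in);
        // Mark the start; the buffer keeps everything read so reset() can rewind.
        super.mark(Integer.MAX_VALUE);
    }

    // Rewind to the start instead of closing the underlying stream.
    @Override
    public void close() throws IOException {
        super.reset();
    }
}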

So:

UnclosableBufferedInputStream bis = new UnclosableBufferedInputStream(inputStream);

and use bis wherever inputStream was used before.
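A small sketch to check that the marking comes from BufferedInputStream itself and not from the wrapped stream. The class MarkDemo and the withoutMark helper are hypothetical; the helper only simulates a source that refuses mark/reset, such as an upload stream. Keep in mind that everything read after the mark stays in the buffer, so memory use grows with the amount of data read.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkDemo {

    // Hypothetical stand-in for a stream without mark support.
    static InputStream withoutMark(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }

            @Override
            public void mark(int readlimit) {
                // ignored, as a non-marking stream would do
            }

            @Override
            public void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    // Reads the stream to its end and returns the bytes as text.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Reads the same content twice through one BufferedInputStream.
    static String[] readTwice(InputStream raw) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(raw);
        bis.mark(Integer.MAX_VALUE);
        String first = readAll(bis);
        bis.reset(); // rewinds inside the buffer; the raw stream is never reset
        String second = readAll(bis);
        return new String[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = withoutMark("sample".getBytes(StandardCharsets.US_ASCII));
        System.out.println(raw.markSupported()); // false
        String[] result = readTwice(raw);
        System.out.println(result[0]); // sample
        System.out.println(result[1]); // sample
    }
}
```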

Taveren
Please check EDIT 2 of the question: "... the InputStream i get ... doesn't support markings thus cannot reset..."
Azder
It doesn't matter whether your InputStream supports it or not. BufferedInputStream wraps around another stream, buffers the input, and supports marking on its own. The overridden close method will also conveniently reset it whenever it's consumed.
Taveren
good point, thanks
Azder