A user uploads a large file to my website and I want to gzip the file and store it in a blob. So I have an uncompressed InputStream, and the blob wants an InputStream. I know how to compress an InputStream to an OutputStream using GZIPOutputStream, but how do I go from the gzip'ed OutputStream back to the InputStream needed by the blob?

The only way I could find involves writing to a ByteArrayOutputStream and then creating a new InputStream from toByteArray(). But that means I have an entire copy of the file in memory. And it wouldn't surprise me if the JDBC driver implementation also converted the stream to a byte[], so I'd have two copies in memory.

+2  A: 

If you are on Java 1.6 you can use java.util.zip.DeflaterInputStream. As far as I can tell, this does exactly what you want. If you can't use 1.6, you should be able to reimplement DeflaterInputStream using java.util.zip.Deflater. When reading the data back from the BLOB, use an InflaterInputStream as a filter to get the original data back.
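A minimal sketch of that round trip, assuming in-memory streams stand in for the upload and the BLOB. The class name `DeflateRoundTrip` and its helper methods are made up for illustration. Note that DeflaterInputStream with its default Deflater emits zlib-wrapped deflate data rather than the gzip format, so the stored bytes must be read back with InflaterInputStream, not GZIPInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterInputStream;
import java.util.zip.InflaterInputStream;

public class DeflateRoundTrip {
    // Wraps an uncompressed stream; reads from the result yield compressed bytes.
    public static InputStream compress(InputStream raw) {
        return new DeflaterInputStream(raw);
    }

    // Wraps a compressed stream; reads from the result yield the original bytes.
    public static InputStream decompress(InputStream compressed) {
        return new InflaterInputStream(compressed);
    }

    // Helper for the demo only: drains a stream into a byte array.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        // Stand-in for "what the blob would store".
        byte[] stored = readAll(compress(new ByteArrayInputStream(original)));
        // Stand-in for "reading the blob back".
        byte[] restored = readAll(decompress(new ByteArrayInputStream(stored)));
        System.out.println(Arrays.equals(original, restored));
    }
}
```

In real use, the stream returned by `compress(...)` would be handed to the blob API directly, so no full copy of the file is held in memory; `readAll` is here only to make the demo self-checking.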

Geoff Reedy
I wasn't aware of that class. That looks like the right solution. Unfortunately, the Blob implementation uses the length, and DeflaterInputStream's available() always returns 0 or 1. I think the fact that I need the length means I'm not going to be able to compress and stream the data directly into the blob no matter what, since the length can't be known until compression is complete.
Brian Deterling
@Brian So you need to pass a length along with the input stream when creating the blob? There's no length method on InputStream, only an available() method, which means something totally different from the stream length.
Geoff Reedy
available() does seem to return the correct length on the original input stream (which is coming from an http post). Maybe it's based on the content length or maybe it's actually reading the entire stream somewhere upstream before I get it. But that doesn't help once I compress it, because I won't know the compressed size until I've already processed the entire stream, at which point it's in memory so I may as well convert it to a byte[].
Brian Deterling
At this point you're dealing with a time/space tradeoff. You can bite the bullet and compress to a byte array, using more memory but taking less time. The other option is to create a deflate stream and skip over the whole thing to find out how many bytes the compressed version is, then recreate the deflate stream and pass it to the blob, using less memory but taking more time.
Geoff Reedy
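A rough sketch of the second (two-pass) option, under one big assumption: the uncompressed data can be opened twice, for example because the upload was first spooled to a temp file. A raw HTTP request stream cannot be re-read. `TwoPassDeflate` and `StreamSource` are hypothetical names, not part of any library. The first pass here reads and discards the compressed bytes instead of calling skip(), which amounts to the same work.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterInputStream;

public class TwoPassDeflate {
    // Hypothetical abstraction: something that can open the same data repeatedly.
    public interface StreamSource {
        InputStream open() throws IOException;
    }

    // Pass 1: compress and throw the output away, keeping only the byte count.
    public static long compressedLength(StreamSource source) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new DeflaterInputStream(source.open())) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Pass 2: a fresh compressing stream to hand to the blob together with the length.
    public static InputStream openCompressed(StreamSource source) throws IOException {
        return new DeflaterInputStream(source.open());
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100000]; // stand-in for the uploaded file
        StreamSource source = () -> new ByteArrayInputStream(data);

        long length = compressedLength(source);
        System.out.println("compressed length: " + length);

        try (InputStream compressed = openCompressed(source)) {
            // In real code this stream and the length would go to the blob,
            // e.g. via PreparedStatement.setBinaryStream(index, compressed, length).
            long seen = 0;
            while (compressed.read() != -1) {
                seen++;
            }
            System.out.println("second pass matches: " + (seen == length));
        }
    }
}
```

Both passes use the default Deflater settings on identical input, so the second pass should produce the same number of bytes that the first pass counted. The cost is compressing the data twice, which is the time side of the tradeoff described above.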