I need to write blocks of data (characters) and I don't care about the sequence of those blocks. I wonder what kind of OutputStream I should use to achieve high performance?

A: 

You can make a wrapper class - i.e. extend ZipOutputStream and make methods synchronized by overriding them. For example:

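import java.io.IOException;
import java.io.OutputStream;

class MyZip extends java.util.zip.ZipOutputStream {

  MyZip(OutputStream out) { super(out); }  // ZipOutputStream has no no-arg constructor

  // Serialize concurrent writers on this stream's monitor.
  @Override
  public synchronized void write(byte[] b, int off, int len) throws IOException {
    super.write(b, off, len);
  }

}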

Alternatively, you can use an existing solution: de.schlichtherle.util.zip.ZipOutputStream.

Update: I assume one thread will open the stream and entry, then many will write, and then one thread will close it.

Viliam
@Viliam: your first solution won't work. You need to do the putNextEntry/write(s)/closeEntry in a single synchronized call.
Stephen C
@Stephen C: No. Romam needs to write chunks of characters asynchronously. Not multiple files. Putting putNextEntry/write(s)/closeEntry into a single sync block would ruin that completely.
Viliam
@Viliam: OK ... so that is a plausible interpretation of the question. But I really don't think that is what he/she means. Why would he/she be using a ZIP stream rather than (say) a plain stream or a stream with a better compression algorithm?
Stephen C
@Stephen C: That's because the question has been changed! :) The first version asked explicitly about ZipOutputStream; this one (edited after I posted my answer) is only about OutputStream. That's life, I guess...
Viliam
+3  A: 

Simply calling a vanilla ZipOutputStream from multiple threads would not work. The ZipOutputStream API has a model where you write entries one at a time as follows:

ZipOutputStream zos = ...

while (...) {
    zos.putNextEntry(...);
    while (...) {
       zos.write(...);
    }
    zos.closeEntry();
}

This model is inherently non-thread-safe.

In order to do this in a thread-safe fashion, you'd need to wrap the ZipOutputStream in a class that does the put/write/close operations in one synchronized method call. And that means that you are essentially doing your Zip output operations serially, which largely defeats your purpose for doing this.
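A minimal sketch of such a wrapper follows, assuming each thread already holds the complete contents of its entry in memory. The class name `SynchronizedZipWriter` and the method `writeEntry` are hypothetical names chosen for illustration; only the `ZipOutputStream` calls come from the JDK.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical wrapper: each call writes one complete entry under a single lock. */
class SynchronizedZipWriter implements AutoCloseable {
    private final ZipOutputStream zos;

    SynchronizedZipWriter(OutputStream out) {
        this.zos = new ZipOutputStream(out);
    }

    /** put/write/close run as one unit, so entries from different threads never interleave. */
    public synchronized void writeEntry(String name, byte[] data) throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        zos.write(data, 0, data.length);
        zos.closeEntry();
    }

    /** Finishes the archive and closes the underlying stream. */
    @Override
    public synchronized void close() throws IOException {
        zos.close();
    }
}
```

Because the whole entry is written while the lock is held, compression still happens one entry at a time; the wrapper only makes concurrent callers safe, it does not make them faster.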

Stephen C
You would probably have some sort of class in front of the `ZipOutputStream` which would either synchronise or, for potentially better performance, queue file entries on to a dedicated (per-zip) thread.
Tom Hawtin - tackline
@Tom: that's what I said, didn't I? (I can see that a dedicated thread might give better throughput ... if the application is I/O bound, but you'd get the same effect by increasing the number of worker threads creating the stuff to be written.)
Stephen C
@Stephen: No. You said one synchronized method call. Tom pointed out it is possible to queue file entries. OP doesn't ask for high-performance, just for concurrency-compatibility.
Jason S
That solution has a terrible performance impact. As I understand it, Romam wants to write chunks of characters into one stream, not multiple entries...
Viliam
@Viliam: it is not possible to know what the OP really wants. I suppose it is possible that he/she wants to write multiple chunks into a single ZIP file entry. But then the answer is the same as for any output stream or writer ... and why would he/she bother with a ZIP stream rather than a plain stream or a stream with better compression?
Stephen C
@Jason: I don't understand your point. Tom's queue-based proposal is intended to improve performance ... he says exactly that!
Stephen C
+2  A: 

I came across a similar situation when implementing parallel image compression algorithms. You can create many in-memory chunks as output streams, save the zipped data to them, and concatenate them later. Other parallel compression algorithms such as ECW do the same but save the compressed chunks to files instead; at the end of the compression, a collation task joins all the chunks together.
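The chunk-then-collate idea can be sketched roughly as below. This is an illustration under assumptions, not the code from the image compression work: it uses GZIP rather than ZIP, because a gzip stream may consist of several concatenated members, and the names `ParallelChunkCompressor`, `compress`, and `compressAll` are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

class ParallelChunkCompressor {

    /** Compresses one chunk into its own in-memory gzip member. */
    static byte[] compress(String chunk) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(chunk.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Compresses chunks on a thread pool, then one thread appends the results to out. */
    static void compressAll(List<String> chunks, OutputStream out, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (String chunk : chunks) {
                Callable<byte[]> task = () -> compress(chunk);
                pending.add(pool.submit(task));
            }
            // Collation step: only this thread touches the shared output stream.
            for (Future<byte[]> result : pending) {
                out.write(result.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The shared stream is written by a single thread, so it needs no locking; the expensive compression work is what runs in parallel. This sketch collates in submission order, which is stricter than the question requires.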

whatnick
+1  A: 

java.io.BufferedOutputStream with a big buffer is likely your best bet for most situations, but that operates on bytes (use BufferedWriter if you want to write characters/Strings).

Note: the absolute performance of this will depend on heap size, OS, Garbage Collector, Buffer Size, phase of the moon, etc... but will generally be better than writing everything byte by byte.
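As a rough sketch of the buffered approach for character data, assuming UTF-8 output and a single writing thread: the class name `BufferedBlockWriter` and the 64K-character buffer size are arbitrary choices for illustration, not recommendations.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

class BufferedBlockWriter {
    // Hypothetical buffer size (in chars); the right value depends on measurement.
    static final int BUFFER_CHARS = 1 << 16;

    /** Writes all blocks through one large buffer, then flushes and closes out. */
    static void writeBlocks(OutputStream out, List<String> blocks) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8), BUFFER_CHARS)) {
            for (String block : blocks) {
                writer.write(block);
            }
        }
    }
}
```

Note that try-with-resources closes the underlying stream as well, so the caller should not write to `out` afterwards.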

As you say you don't care about the sequence, I'm probably misunderstanding what you really want to do, as these obviously operate on things sequentially.

Mainguy