I need to write blocks of data (characters) and I don't care about the sequence of those blocks. I wonder what kind of OutputStream I should use to achieve high performance?
You can make a wrapper class - i.e. extend ZipOutputStream and make methods synchronized by overriding them. For example:
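import java.io.IOException;
import java.io.OutputStream;

class MyZip extends java.util.zip.ZipOutputStream {
    MyZip(OutputStream out) { super(out); } // ZipOutputStream has no no-arg constructor
    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len); // only one thread at a time writes into the current entry
    }
}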
Alternatively, you can use an existing solution: de.schlichtherle.util.zip.ZipOutputStream.
Update: I assume one thread will open the stream and entry, then many will write, and then one thread will close it.
Simply calling a vanilla ZipOutputStream from multiple threads would not work. The ZipOutputStream API has a model where you write entries one at a time as follows:
ZipOutputStream zos = ...
while (...) {
    zos.putNextEntry(...);
    while (...) {
        zos.write(...);
    }
    zos.closeEntry();
}
This model is inherently non-thread-safe.
In order to do this in a thread-safe fashion, you'd need to wrap the ZipOutputStream in a class that does the put/write/close operations in one synchronized method call. And that means that you are essentially doing your Zip output operations serially, which largely defeats your purpose for doing this.
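A minimal sketch of such a wrapper, assuming each thread hands over one complete block at a time. The class name SynchronizedZipWriter and the method writeEntry are made up for illustration; only the java.util.zip calls are real API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical wrapper (not part of the JDK): one lock guards the whole
// put/write/close sequence, so entries from different threads never interleave.
class SynchronizedZipWriter implements AutoCloseable {
    private final ZipOutputStream zos;

    SynchronizedZipWriter(OutputStream out) {
        this.zos = new ZipOutputStream(out);
    }

    // Writes one complete entry; callers block while another entry is in progress.
    synchronized void writeEntry(String name, byte[] data) throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        zos.write(data, 0, data.length);
        zos.closeEntry();
    }

    @Override
    public synchronized void close() throws IOException {
        zos.close();
    }
}
```

This is safe, but as noted above the compression itself still runs under the lock, one entry at a time.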
I came across a similar situation when working on parallel image compression. You can give each worker its own in-memory output stream, save the zipped data to these, and concatenate them later. Other parallel compression schemes, such as ECW, do the same but save the compressed chunks to files instead; at the end of the compression, a collation task joins all the chunks together.
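Here is a rough sketch of the chunk-and-collate idea. Note the swap: it uses gzip rather than zip entries, because concatenated gzip members form a valid gzip stream, which keeps the collation step to a plain append. The class and method names (ParallelChunkCompressor, gzip, compressAll) and the thread count are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: each block is compressed into its own buffer in parallel.
class ParallelChunkCompressor {
    // Compress one block into a private in-memory buffer; no shared state, no locking.
    static byte[] gzip(byte[] block) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(block);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // gz is closed here, so the gzip trailer is written
    }

    // Compress all blocks on a thread pool, then collate the chunks in one thread.
    static byte[] compressAll(List<byte[]> blocks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (byte[] block : blocks) {
                Callable<byte[]> task = () -> gzip(block);
                pending.add(pool.submit(task));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> f : pending) {
                out.write(f.get()); // collation step: append each finished chunk
            }
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }
}
```

The chunks are appended in submission order here only because that is the simplest loop; since the order of blocks doesn't matter, they could just as well be appended as they finish.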
java.io.BufferedOutputStream with a big buffer is likely your best bet for most situations, but it operates on bytes (use BufferedWriter if you want to write characters or Strings).
Note: the absolute performance of this will depend on heap size, OS, Garbage Collector, Buffer Size, phase of the moon, etc... but will generally be better than writing everything byte by byte.
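For the character case, something like the following should do. It is only a sketch: the class name BufferedBlockWriter, the 64 KiB buffer size and the UTF-8 encoding are assumptions, so measure before settling on a size.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: writes blocks of characters through one large buffer.
class BufferedBlockWriter {
    // Arbitrary illustrative size (in chars); tune it by measuring.
    static final int BUFFER_SIZE = 64 * 1024;

    static void writeBlocks(Path target, List<String> blocks) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8),
                BUFFER_SIZE)) {
            for (String block : blocks) {
                out.write(block); // collected in memory, flushed in large batches
            }
        } // closing the writer flushes whatever is still buffered
    }
}
```

For raw bytes, new BufferedOutputStream(Files.newOutputStream(target), BUFFER_SIZE) plays the same role.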
As you say you don't care about the sequence, I'm probably misunderstanding what you really want to do, as these streams obviously operate on things sequentially.