tags:

views: 1933

answers: 5

I'm updating some old code to grab some binary data from a URL instead of from a database (the data is about to be moved out of the database and will be accessible by HTTP instead). The database API seemed to provide the data as a raw byte array directly, and the code in question wrote this array to a file using a BufferedOutputStream.

I'm not at all familiar with Java, but a bit of googling led me to this code:

URL u = new URL("my-url-string");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while((bytesRead = in.read(buffer)) > -1) {
    out.write(buffer, 0, bytesRead);
}
in.close();
fileBytes = out.toByteArray();

That seems to work most of the time, but I have a problem when the data being copied is large - I'm getting an OutOfMemoryError for data items that worked fine with the old code.

I'm guessing that's because this version of the code has multiple copies of the data in memory at the same time, whereas the original code didn't.

Is there a simple way to grab binary data from a URL and save it in a file without incurring the cost of multiple copies in memory?

+6  A: 

Instead of writing the data to a byte array and then dumping it to a file, you can directly write it to a file by replacing the following:

ByteArrayOutputStream out = new ByteArrayOutputStream();

With:

FileOutputStream out = new FileOutputStream("filename");

If you do so, there is no need for the call to out.toByteArray() at the end. Just make sure you close the FileOutputStream object when done, like this:

out.close();

See the documentation of FileOutputStream for more details.
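Putting the answer together, a minimal sketch of the streaming copy (the copy helper and class name are illustrative, not from the original code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies in to out in fixed-size chunks, so only one small buffer
    // lives in memory regardless of the total data size. Point `out`
    // at a FileOutputStream to write a download straight to disk.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }
}
```

With `new FileOutputStream("filename")` as the destination, the data never has to fit in memory at once.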

Ayman Hourieh
Yes, d'oh. I now realise that I've asked entirely the wrong question here. The only reason (which I completely forgot) I copied the data to an array was to find its length. It's a long story, but the subsequent file-writing code needs the data length before it can create the file. Anyway, accepting your answer ... it does what I asked :)
Luke Halliwell
@Luke: Then I say fix the file writing code - it sounds impaired.
Software Monkey
You might want to try using URLConnection.getContentLength() to find out the data length instead of buffering.
laz
@Software Monkey - yes, it's a bit impaired, but I'd prefer to touch as little code as possible here! :)
Luke Halliwell
A: 

Subclassing ByteArrayOutputStream gives you access to the buffer and the number of bytes in it.

But of course, if all you want to do is to store the data in a file, you are better off using a FileOutputStream.
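A sketch of that subclassing idea (the class and method names are illustrative): ByteArrayOutputStream keeps its data in the protected fields `buf` and `count`, so a small subclass can expose them without the extra copy that toByteArray() makes.

```java
import java.io.ByteArrayOutputStream;

public class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    // The internal buffer; note it may be larger than the valid length.
    public byte[] rawBuffer() {
        return buf;
    }

    // Number of valid bytes currently in the buffer.
    public int length() {
        return count;
    }
}
```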

Maurice Perry
A: 

I don't know what you mean by "large" data, but try using the JVM parameter

java -Xmx256m ...

which sets the maximum heap size to 256 MB (or any value you like).

micro
Probably not a good strategy, what if he is trying to transfer 1 terabyte?
Nash0
+1  A: 

If you need the Content-Length and your web server is somewhat standards-conforming, it should provide a "Content-Length" header.

URLConnection#getContentLength() should give you that information upfront so that you are able to create your file. (Be aware that if your HTTP server is misconfigured or under the control of an evil entity, that header may not match the number of bytes received. In that case, why don't you stream to a temp file first and copy that file later?)

In addition to that: a ByteArrayOutputStream is a horrible memory allocator. It always doubles the buffer size, so if you read a 32MB + 1 byte file, you end up with a 64MB buffer. It might be better to implement your own, smarter byte-array stream, like this one:

http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java
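A hedged sketch of using an announced length (e.g. from URLConnection.getContentLength(), which returns -1 when unknown) to size the array exactly, with a fallback to buffering; the helper and its names are assumptions for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class KnownLengthReader {
    // Reads a stream whose length was announced upfront into an
    // exactly-sized array, avoiding ByteArrayOutputStream's doubling.
    // A negative contentLength means "unknown" and falls back to buffering.
    static byte[] readFully(InputStream in, int contentLength) throws IOException {
        if (contentLength < 0) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = out != null && (n = in.read(chunk)) != -1 ? n : -1) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
        byte[] data = new byte[contentLength];
        int off = 0;
        while (off < contentLength) {
            int n = in.read(data, off, contentLength - off);
            if (n == -1) {
                throw new IOException("stream ended before Content-Length bytes");
            }
            off += n;
        }
        return data;
    }
}
```

As the answer warns, a mismatched header will surface here as a short read (or trailing unread bytes), so validating against the actual byte count is still worthwhile.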

Thomas Morgner
A: 

Thanks very much ^^

camovie