
Hi, I've tried the code below on both Windows (64-bit) and Linux (32-bit).

I was sure that, without a BufferedOutputStream, the code was bound to throw an OutOfMemoryError, yet it didn't.

Why is that? Who is doing the {caching / buffering / streaming} to disk there?

Can you please describe, if relevant to the answer, the full flow (Java API -> system call)?

Does this code use NIO?

/Me confused.

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteHugeFileToDisk {
    private static final int BYTE = 1;
    private static final int KILOBYTE = BYTE * 1024;
    private static final int MEGABYTE = KILOBYTE * 1024;
    private static final int GIGABYTE = MEGABYTE * 1024;
    private static final long TERABYTE = GIGABYTE * 1024L;

    public static void main(String[] args) throws IOException {
        FileOutputStream fileOutputStream = new FileOutputStream(args[0]);
        DataOutputStream dataOutputStream = new DataOutputStream(fileOutputStream);
        try {
            // One 1MB array, allocated once and reused for every write
            byte[] buffer = new byte[MEGABYTE];
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = (byte) i;
            }

            // Write the same 1MB array 4000 times (roughly 4GB in total)
            for (long l = 0; l < 4000; l++) {
                dataOutputStream.write(buffer);
            }
        } finally {
            // Closing the DataOutputStream also closes the FileOutputStream
            dataOutputStream.close();
        }
    }
}

I ran this code with Java 6, using the following invocations:

Windows:

java WriteHugeFileToDisk %TEMP%\hi.txt

Linux:

java WriteHugeFileToDisk /mnt/hi.info

Please note: the code creates a 4GB file, filled with a repeating byte pattern, just for the test.

+5  A: 

Why would it throw an OutOfMemoryError? It's just writing to disk. I wouldn't be surprised if FileOutputStream and DataOutputStream had some buffering (I haven't checked), but they're certainly not required to buffer everything you write.
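One way to check whether DataOutputStream holds anything back is to wrap a stream that simply counts what reaches it. This is a minimal sketch; `CountingOutputStream` is a hypothetical helper written for the test, not a JDK class. If DataOutputStream buffered, the counts would lag behind until a flush; in practice the bytes arrive as soon as each write call returns.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DataOutputStreamPassThrough {
    // Hypothetical helper: discards data but counts bytes and write calls
    static class CountingOutputStream extends OutputStream {
        long bytes = 0;
        int calls = 0;

        @Override
        public void write(int b) {
            bytes++;
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bytes += len;
            calls++;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        DataOutputStream out = new DataOutputStream(sink);

        out.writeInt(42);
        // No flush() yet, but all 4 bytes have already reached the sink
        System.out.println("after writeInt: " + sink.bytes + " bytes");

        out.write(new byte[1024 * 1024]);
        // The whole 1MB array was forwarded in the same call
        System.out.println("after write(byte[]): " + sink.bytes + " bytes");
    }
}
```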

This code isn't using NIO directly, although I wouldn't be surprised if some of the internal stuff did. As for which system calls are involved and when, that will be implementation-specific, but the important thing is to realise that neither DataOutputStream nor FileOutputStream is meant to buffer everything. You write some data to them, and some of that data may get written to disk. If you flush or close the stream, all the data you've written so far should be handed to the operating system (which may itself cache it for a while before it reaches the disk). If you don't flush or close the stream, I'd expect only a reasonably small amount (again, implementation-specific) to be cached, if any.
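For contrast, an explicitly NIO-based version of the same loop might look roughly like this, using `FileChannel` and `ByteBuffer`. It's a sketch only: `NioWriteSketch` and `writeChunks` are made-up names, and the chunk size and count are parameters so it can be tried with far less than 4GB. `FileOutputStream.getChannel()` has been available since Java 1.4, so this should also compile on Java 6.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class NioWriteSketch {
    // Writes `count` copies of a `chunkSize`-byte chunk; returns the resulting file size
    static long writeChunks(File file, int chunkSize, int count) throws IOException {
        byte[] data = new byte[chunkSize];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // wrap() does not copy: the ByteBuffer is a view over the same array
        ByteBuffer buffer = ByteBuffer.wrap(data);

        FileOutputStream fos = new FileOutputStream(file);
        try {
            FileChannel channel = fos.getChannel();
            for (int n = 0; n < count; n++) {
                buffer.clear(); // reset position to 0 and limit to capacity
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
            return channel.size();
        } finally {
            fos.close(); // closing the stream also closes its channel
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("nio-sketch", ".bin");
        file.deleteOnExit();
        System.out.println(writeChunks(file, 1024 * 1024, 16) + " bytes written");
    }
}
```

As in the original program, memory use stays at one chunk no matter how many times it is written.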

Note that BufferedOutputStream does introduce caching, but only as much as you ask for (or a default size). Again, it wouldn't buffer everything unless you asked for a buffer as large as all the data you write.
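A small sketch of that bound, again with a hypothetical counting sink: with an explicit 16-byte buffer, a 10-byte write is held back, while a 32-byte write (larger than the buffer) first flushes what was held and then goes straight through. The pass-through for large writes is described in the `BufferedOutputStream.write(byte[], int, int)` Javadoc, which is also why buffering a 1MB write gains nothing.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedBypassSketch {
    // Hypothetical helper: discards data, counts bytes and write calls that reach it
    static class CountingOutputStream extends OutputStream {
        long bytes = 0;
        int calls = 0;

        @Override
        public void write(int b) {
            bytes++;
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bytes += len;
            calls++;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        // Explicit 16-byte buffer: it never holds more than this
        BufferedOutputStream out = new BufferedOutputStream(sink, 16);

        out.write(new byte[10]); // fits in the buffer: nothing reaches the sink yet
        System.out.println(sink.bytes + " bytes, " + sink.calls + " calls");

        out.write(new byte[32]); // larger than the buffer: the 10 held bytes are flushed,
                                 // then the 32 bytes are written straight through
        System.out.println(sink.bytes + " bytes, " + sink.calls + " calls");
    }
}
```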

Jon Skeet
Good answer, and I'd just like to add that the point of buffering is performance. If every byte you write ends up as a write operation to disk, that will be slow. If you buffer the equivalent of one block on disk, that will be a huge improvement. If you started buffering gigabytes of data, you wouldn't see much improvement for each additional megabyte of buffer size any more.
Mattias Nilsson
@Mattias: True. And of course if a single write call is *already* writing a megabyte, there's not a lot of point in buffering any of that.
Jon Skeet
+1  A: 

A buffered stream is a stream wrapper that (quite obviously) buffers data in memory before passing it to the underlying stream. This gives you better performance when used in conjunction with a file stream, because there's a lot of overhead involved in reading from or writing to a hard drive. Buffering lets you significantly reduce the number of reads/writes by collapsing many small, inefficient reads or writes into a single, bigger, efficient one. However, it is not critical to the correct behavior of your application. It just helps you make fewer accesses to the physical devices.
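The "collapsing" can be made visible by counting how many write calls actually reach the underlying stream. A sketch under stated assumptions: `CountingOutputStream` and `countCalls` are hypothetical, and the buffer size is passed explicitly as 8192 bytes. Writing 10,000 single bytes directly costs 10,000 calls; through the buffer it comes down to a couple of calls.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CallCountSketch {
    // Hypothetical helper: discards data, counts the write calls that reach it
    static class CountingOutputStream extends OutputStream {
        int calls = 0;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // Writes n single bytes, optionally through a BufferedOutputStream,
    // and returns how many write calls reached the sink
    static int countCalls(int n, boolean buffered) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink;
        for (int i = 0; i < n; i++) {
            out.write(i);
        }
        out.flush(); // push out whatever is still sitting in the buffer
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + countCalls(10000, false) + " calls");
        System.out.println("buffered:   " + countCalls(10000, true) + " calls");
    }
}
```

On a real FileOutputStream each of those calls would be a native write, which is where the performance difference comes from.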

Java doesn't have more direct access to your computer's devices than other languages do. Between your program and the bits on your hard disk, there are still several layers that are entitled to buffer or cache whatever Java desperately tries to get from or to the disk. As far as I know, the OS can (and usually will) cache or buffer data, and some hardware will do it too.
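Those OS-level layers are why `flush()` alone doesn't mean "on the disk". A minimal sketch, with `SyncSketch` and `writeAndSync` as illustrative names: for a plain FileOutputStream, `flush()` does nothing (there is no Java-side buffer), and `getFD().sync()` is the standard call that asks the OS to push its cached data for the file out to the device.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncSketch {
    // Writes data to file, then asks the OS to sync it; returns the file length
    static long writeAndSync(File file, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(data);    // handed to the OS, which may keep it in its cache
            out.flush();        // no-op here: FileOutputStream keeps no Java-side buffer
            out.getFD().sync(); // ask the OS to force its buffers for this file to the device
        } finally {
            out.close();
        }
        return file.length();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sync-sketch", ".bin");
        file.deleteOnExit();
        System.out.println(writeAndSync(file, new byte[4096]) + " bytes synced");
    }
}
```

`FileChannel.force(boolean)` is the NIO counterpart if you are already working with a channel.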

Buffering, in the Java meaning of the operation, has nothing to do with the success or failure of reads or writes to devices, or for that matter, to any stream.

zneak
A: 

These two statements open a file handle and consume almost no memory:

FileOutputStream fileOutputStream = new FileOutputStream(args[0]);
DataOutputStream dataOutputStream = new DataOutputStream(fileOutputStream);

These lines allocate a 1MB byte array in memory and fill it with data:

byte[] buffer = new byte[MEGABYTE];
for(int i = 0; i < buffer.length; i++) {
    buffer[i] = (byte)i;
}

This loop writes the same 1MB of data to the output file 4000 times:

for(long l = 0; l < 4000; l++) {
    dataOutputStream.write(buffer);
}

Conclusion: about 1MB of memory is consumed while 4GB of data is written to the file. So unless you have very little memory, this cannot throw an OutOfMemoryError.
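The arithmetic can be checked without creating a 4GB file by pointing the same loop at a sink that throws the data away. A sketch; `DiscardingOutputStream` and `writeRepeatedly` are hypothetical names. The running total reaches 4000 x 1MB while the only large allocation is the single array that is reused for every write.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ConstantMemorySketch {
    // Hypothetical sink: discards the data and only keeps a running byte total
    static class DiscardingOutputStream extends OutputStream {
        long total = 0;

        @Override
        public void write(int b) {
            total++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            total += len;
        }
    }

    // Writes the same chunkSize-byte array `times` times; returns total bytes written
    static long writeRepeatedly(int chunkSize, int times) throws IOException {
        DiscardingOutputStream sink = new DiscardingOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        byte[] buffer = new byte[chunkSize]; // the only large allocation
        for (int i = 0; i < times; i++) {
            out.write(buffer); // the same array is reused on every iteration
        }
        out.flush();
        return sink.total;
    }

    public static void main(String[] args) throws IOException {
        // 4000 writes of 1MB, as in the question, without touching the disk
        System.out.println(writeRepeatedly(1024 * 1024, 4000) + " bytes written");
    }
}
```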

Darin Dimitrov
+1  A: 

Who is doing the {caching / buffering / streaming} to disk there?

Nobody on the Java side. Each write goes straight through to the operating system and on to the disk. No incremental memory usage whatsoever.

EJP