views:

1743

answers:

2

Is there a way to read a ByteBuffer with a BufferedReader without having to turn it into a String first? I want to read through a fairly large ByteBuffer as lines of text and for performance reasons I want to avoid writing it to the disk. Calling toString on the ByteBuffer doesn't work because the resulting String is too large (it throws java.lang.OutOfMemoryError: Java heap space). I would have thought there would be something in the API to wrap a ByteBuffer in a suitable reader, but I can't seem to find anything suitable.

Here's an abbreviated code sample that illustrates what I am doing:

// input stream is from Process getInputStream()
public String read(InputStream istream) throws IOException
{
  ReadableByteChannel source = Channels.newChannel(istream);
  ByteArrayOutputStream ostream = new ByteArrayOutputStream(bufferSize);
  WritableByteChannel destination = Channels.newChannel(ostream);
  ByteBuffer buffer = ByteBuffer.allocateDirect(writeBufferSize);

  while (source.read(buffer) != -1)
  {
    buffer.flip();
    while (buffer.hasRemaining())
    {
      destination.write(buffer);
    }
    buffer.clear();
  }

  // this data can be up to 150 MB.. won't fit in a String.
  String result = ostream.toString();
  source.close();
  destination.close();
  return result;
}

// after the process is run, we call this method with the String
public void readLines(String text) throws IOException
{
  BufferedReader reader = new BufferedReader(new StringReader(text));
  String line;

  while ((line = reader.readLine()) != null)
  {
    // do stuff with line
  }
}
+4  A: 

It's not clear why you're using a byte buffer to start with. If you've got an InputStream and you want to read lines from it, why don't you just use an InputStreamReader wrapped in a BufferedReader? What's the benefit in getting NIO involved?
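A minimal sketch of that suggestion, reading lines straight from the stream. The class and method names are hypothetical, and UTF-8 is an assumption about the process output; substitute whatever encoding the process really emits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DirectLineReader
{
    // Reads the stream line by line and returns how many lines were seen.
    // Replace the counter with the real per-line work.
    static int countLines(InputStream istream) throws IOException
    {
        int count = 0;
        // Assumption: the bytes are UTF-8 encoded text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(istream, StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                count++; // do stuff with line
            }
        }
        return count;
    }
}
```

Only one line is held in memory at a time here, so the total size of the output should not matter.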

Calling toString() on a ByteArrayOutputStream sounds like a bad idea to me even if you had the space for it: better to get it as a byte array and wrap it in a ByteArrayInputStream and then an InputStreamReader, if you really have to have a ByteArrayOutputStream. If you really want to call toString(), at least use the overload which takes the name of the character encoding to use - otherwise it'll use the system default, which probably isn't what you want.
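To make the two options concrete, here is a rough sketch. The class and method names are made up for illustration, and UTF-8 is an assumed encoding. Note that toByteArray() copies the data, which matters for very large buffers.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingSketch
{
    // Decodes everything into one String using a named encoding
    // instead of the platform default. Still needs room for the whole String.
    static String decodeAll(ByteArrayOutputStream baos) throws UnsupportedEncodingException
    {
        return baos.toString("UTF-8"); // assumption: data is UTF-8
    }

    // Wraps a copy of the bytes in a reader so lines can be decoded one at a time.
    static BufferedReader openReader(ByteArrayOutputStream baos)
    {
        byte[] data = baos.toByteArray(); // this makes a copy of the buffer
        return new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```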

EDIT: Okay, so you really want to use NIO. You're still writing to a ByteArrayOutputStream eventually, so you'll end up with a BAOS with the data in it. If you want to avoid making a copy of that data, you'll need to derive from ByteArrayOutputStream, for instance like this:

public class ReadableByteArrayOutputStream extends ByteArrayOutputStream
{
    /**
     * Converts the data in the current stream into a ByteArrayInputStream.
     * The resulting stream wraps the existing byte array directly;
     * further writes to this output stream will result in unpredictable
     * behavior.
     */
    public InputStream toInputStream()
    {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

Then you can create the input stream, wrap it in an InputStreamReader, wrap that in a BufferedReader, and you're away.
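Put together, the whole flow might look roughly like the sketch below. The subclass is repeated so the example compiles on its own; BufferedProcessOutput, drain, forEachLine, the 8192-byte chunk size, and the UTF-8 encoding are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class ReadableByteArrayOutputStream extends ByteArrayOutputStream
{
    // buf and count are protected fields of ByteArrayOutputStream,
    // so the input stream shares the backing array without copying it.
    synchronized InputStream toInputStream()
    {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

// Hypothetical helper class, for illustration only.
class BufferedProcessOutput
{
    // Copies the whole stream into memory, e.g. while the process is still running.
    static ReadableByteArrayOutputStream drain(InputStream istream) throws IOException
    {
        ReadableByteArrayOutputStream baos = new ReadableByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // arbitrary size; experiment with this value
        int bytesRead;
        while ((bytesRead = istream.read(chunk)) != -1)
        {
            baos.write(chunk, 0, bytesRead);
        }
        return baos;
    }

    // Later, once the process has finished: hand each buffered line to the handler.
    static void forEachLine(ReadableByteArrayOutputStream baos, Consumer<String> handler)
            throws IOException
    {
        // Assumption: the buffered bytes are UTF-8 encoded text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(baos.toInputStream(), StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                handler.accept(line);
            }
        }
    }
}
```

Usage would be something like BufferedProcessOutput.forEachLine(baos, line -> { /* do stuff with line */ }). Don't write to the output stream again once toInputStream() has been called.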

Jon Skeet
Good question - I would agree and that's what I would do if I had the option. The reason I can't in this case is that I can't do anything with the output from the process (i.e. the InputStream) until the process is finished, so I need to put it in a buffer to read later.
Rob
So put it into a byte array with ByteArrayOutputStream. Once you've got it as a byte array, you're fine. That's effectively what NIO is going to be doing anyway, it's just more straightforward with BAOS. If it's going to be enormous, you *might* want to derive your own ByteArrayOutputStream which gives you direct access to the byte array, so you don't need to worry about creating a copy with toByteArray(). It's a shame ByteArrayOutputStream doesn't have a "toByteArrayInputStream" to let you read directly from it...
Jon Skeet
As for why I'm using NIO: partly because I'm a masochist and am determined to get NIO figured out once and for all (if that is, in fact, humanly possible), and partly because I want to read the InputStream as fast as possible and NIO seems quicker for this sort of thing.
Rob
Okay, if you really, really want to use NIO - editing answer.
Jon Skeet
+2  A: 

You can use NIO, but there's no real need here. As Jon Skeet suggested:

public byte[] read(InputStream istream) throws IOException
{
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  byte[] buffer = new byte[1024]; // Experiment with this value
  int bytesRead;

  while ((bytesRead = istream.read(buffer)) != -1)
  {
    baos.write(buffer, 0, bytesRead);
  }

  return baos.toByteArray();
}


// after the process is run, we call this method with the byte array
public void readLines(byte[] data) throws IOException
{
  BufferedReader reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data)));
  String line;

  while ((line = reader.readLine()) != null)
  {
    // do stuff with line
  }
}
Matthew Flaschen
Although this is not the answer I accepted (since I wanted to try using NIO), using standard IO like this turned out to be faster than the NIO approach. Still, it was a good learning experience to try NIO.
Rob