I have a String that I want to use as an InputStream. In Java 1.0, you could use java.io.StringBufferInputStream, but that has been @Deprecated (with good reason--you cannot specify the character set encoding):

This class does not properly convert characters into bytes. As of JDK 1.1, the preferred way to create a stream from a string is via the StringReader class.

You can create a java.io.Reader with java.io.StringReader, but there are no adapters to take a Reader and create an InputStream.

I found an ancient bug asking for a suitable replacement, but no such thing exists--as far as I can tell.

The oft-suggested workaround is to use java.lang.String.getBytes() as input to java.io.ByteArrayInputStream:

public InputStream createInputStream(String s, String charset)
    throws java.io.UnsupportedEncodingException {

    return new ByteArrayInputStream(s.getBytes(charset));
}

but that means materializing the entire String in memory as an array of bytes, and defeats the purpose of a stream. In most cases this is not a big deal, but I was looking for something that would preserve the intent of a stream--that as little of the data as possible is (re)materialized in memory.

+1  A: 
Michael Myers
Interesting... of course, with this solution I believe that you would either materialize the whole string in memory, or suffer starvation on the reading thread. Still hoping that there's a real implementation somewhere.
Jared Oberhaus
You have to be careful with Piped(Input|Output)Stream. As per the docs: "...Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread..." http://java.sun.com/j2se/1.4.2/docs/api/java/io/PipedInputStream.html
Bryan Kyle
+1  A: 

A solution is to roll your own, creating an InputStream implementation that likely would use java.nio.charset.CharsetEncoder to encode each char or chunk of chars to an array of bytes for the InputStream as necessary.
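
As a rough illustration of the roll-your-own idea, here is a minimal sketch that wraps the String in a CharBuffer (no copy) and lets a CharsetEncoder fill a small byte buffer on demand. The class name EncodingStringInputStream, the bufferSize parameter, and the 16-byte minimum are assumptions made for this example, and replacing malformed or unmappable characters (rather than throwing) is a choice made here to mirror String.getBytes; none of this is an existing library API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

/** Hypothetical sketch: encodes a String lazily, one small byte buffer at a time. */
final class EncodingStringInputStream extends InputStream {
  // Assumed lower bound so that one encoded character (plus a BOM) always fits.
  private static final int MIN_BUFFER = 16;

  private final CharBuffer in;          // read-only view of the string, not a copy
  private final CharsetEncoder encoder;
  private final ByteBuffer out;         // kept in "read mode" between refills
  private boolean inputEncoded;         // true once every char has been consumed
  private boolean done;                 // true once the encoder has been flushed

  EncodingStringInputStream(String text, Charset charset, int bufferSize) {
    in = CharBuffer.wrap(text);
    encoder = charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
    out = ByteBuffer.allocate(Math.max(bufferSize, MIN_BUFFER));
    out.flip(); // start empty
  }

  /** Refills the byte buffer if it is empty; returns false at end of stream. */
  private boolean fill() throws IOException {
    while (!out.hasRemaining()) {
      if (done) {
        return false;
      }
      out.clear();
      if (!inputEncoded) {
        // All input is already available, so endOfInput is always true.
        CoderResult result = encoder.encode(in, out, true);
        if (result.isError()) {
          result.throwException();
        }
        inputEncoded = result.isUnderflow();
      }
      if (inputEncoded) {
        // flush() may need more room; it is retried on the next refill.
        done = encoder.flush(out).isUnderflow();
      }
      out.flip();
    }
    return true;
  }

  @Override
  public int read() throws IOException {
    return fill() ? (out.get() & 0xFF) : -1;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    if (!fill()) {
      return -1;
    }
    int n = Math.min(len, out.remaining());
    out.get(b, off, n);
    return n;
  }
}
```

With this sketch, new EncodingStringInputStream(s, charset, 1024) never holds more than about 1 KB of encoded bytes at a time, and the bulk read method avoids the per-character cost raised in the comments below.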

Jared Oberhaus
Doing things one character at a time is expensive. That's why we have "chunked iterators" like InputStream that allow us to read a buffer at a time.
Tom Hawtin - tackline
I agree with Tom -- you **really** don't want to do this one character at a time.
Eddie
+2  A: 

See the answer to the StackOverflow question:

How to convert a Reader to InputStream and a Writer to OutputStream?

To make it easier for folks -- only one link to click on -- the applicable link from the selected answer for that question is:

Eddie
That points to http://www.koders.com/java/fid0A51E45C950B2B8BD9365C19F2626DE35EC09090.aspx Perfect!
Jared Oberhaus
FYI: that code has a bug in the way it reads bytes (it will not work for all encodings). Proof: http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html#javaencoding_stringclass There is an open bug: https://issues.apache.org/bugzilla/show_bug.cgi?id=40455
McDowell
+1  A: 

To my mind, the easiest way to do this is by pushing the data through a Writer:

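import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class StringEmitter {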
  public static void main(String[] args) throws IOException {
    class DataHandler extends OutputStream {
      @Override
      public void write(final int b) throws IOException {
        write(new byte[] { (byte) b });
      }
      @Override
      public void write(byte[] b) throws IOException {
        write(b, 0, b.length);
      }
      @Override
      public void write(byte[] b, int off, int len)
          throws IOException {
        System.out.println("bytecount=" + len);
      }
    }

    StringBuilder sample = new StringBuilder();
    while (sample.length() < 100 * 1000) {
      sample.append("sample");
    }

    Writer writer = new OutputStreamWriter(
        new DataHandler(), "UTF-16");
    writer.write(sample.toString());
    writer.close();
  }
}

The JVM implementation I'm using pushed data through in 8K chunks, but you could have some effect on the buffer size by reducing the number of characters written at one time and calling flush.
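
To make the point about chunk sizes concrete, here is a small sketch that writes the string in fixed-size slices and flushes after each one. The names SlicedEmitter, ChunkRecorder, and emit are made up for this example, and the exact chunk sizes a given JVM hands to the underlying stream are an implementation detail, so treat the behaviour as an observation rather than a guarantee.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SlicedEmitter {

  /** Hypothetical helper: records the size of every chunk handed to the stream. */
  static final class ChunkRecorder extends OutputStream {
    final List<Integer> sizes = new ArrayList<>();

    @Override
    public void write(int b) {
      sizes.add(1);
    }

    @Override
    public void write(byte[] b, int off, int len) {
      if (len > 0) {
        sizes.add(len);
      }
    }
  }

  /** Writes text in slices of sliceChars characters, flushing after each slice. */
  static List<Integer> emit(String text, int sliceChars) throws IOException {
    ChunkRecorder recorder = new ChunkRecorder();
    try (Writer writer = new OutputStreamWriter(recorder, StandardCharsets.UTF_8)) {
      for (int i = 0; i < text.length(); i += sliceChars) {
        int end = Math.min(text.length(), i + sliceChars);
        writer.write(text, i, end - i);
        // Forces the encoded bytes for this slice out to the recorder.
        writer.flush();
      }
    }
    return recorder.sizes;
  }
}
```

Calling SlicedEmitter.emit(sample.toString(), 100) on the ASCII sample string should produce many small chunks instead of a few 8K ones, at the cost of more calls into the encoder.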


An alternative to writing your own CharsetEncoder wrapper is to use a Writer to encode the data, though it is something of a pain to do right. This should be a reliable (if inefficient) implementation:

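import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.LinkedList;
import java.util.Queue;

/** Inefficient string stream implementation */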
public class StringInputStream extends InputStream {

  /* # of characters to buffer - must be >=2 to handle surrogate pairs */
  private static final int CHAR_CAP = 8;

  private final Queue<Byte> buffer = new LinkedList<Byte>();
  private final Writer encoder;
  private final String data;
  private int index;

  public StringInputStream(String sequence, Charset charset) {
    data = sequence;
    encoder = new OutputStreamWriter(
        new OutputStreamBuffer(), charset);
  }

  private int buffer() throws IOException {
    if (index >= data.length()) {
      return -1;
    }
    int rlen = index + CHAR_CAP;
    if (rlen > data.length()) {
      rlen = data.length();
    }
    for (; index < rlen; index++) {
      char ch = data.charAt(index);
      encoder.append(ch);
      // ensure data enters buffer
      encoder.flush();
    }
    if (index >= data.length()) {
      encoder.close();
    }
    return buffer.size();
  }

  @Override
  public int read() throws IOException {
    if (buffer.size() == 0) {
      int r = buffer();
      if (r == -1) {
        return -1;
      }
    }
    return 0xFF & buffer.remove();
  }

  private class OutputStreamBuffer extends OutputStream {

    @Override
    public void write(int i) throws IOException {
      byte b = (byte) i;
      buffer.add(b);
    }

  }

}
McDowell