ansaurus

Question

problem using base64 encoder and InputStreamReader

Answer 1

A:

"For top efficiency, consider wrapping an InputStreamReader within a BufferedReader. For example:"

BufferedReader in = new BufferedReader(new InputStreamReader(b64is));

Addendum: As Base64 is padded to a multiple of 4 characters, verify that the source isn't truncated. A flush() may be required.
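
To make the padding point concrete, here is a minimal sketch. It assumes java.util.Base64 from Java 8 or later rather than the Commons Codec classes used in the question, and the class and method names are made up for illustration. It prints the encoded length for a few input sizes and checks that text cut off one character past a complete 4-character group does not decode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of the question's code.
class Base64PaddingDemo {
    /** Number of characters in the padded Base64 encoding of n zero bytes. */
    static int encodedLength(int n) {
        return Base64.getEncoder().encodeToString(new byte[n]).length();
    }

    /** True if the text decodes cleanly with the basic decoder. */
    static boolean decodes(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // malformed or truncated input
        }
    }

    public static void main(String[] args) {
        // Every group of up to 3 input bytes becomes exactly 4 output characters.
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " bytes -> " + encodedLength(n) + " characters");
        }
        String complete = Base64.getEncoder()
                .encodeToString("hello".getBytes(StandardCharsets.US_ASCII));
        // Cut the text one character past the first complete group.
        String truncated = complete.substring(0, 5);
        System.out.println(complete + " decodes: " + decodes(complete));
        System.out.println(truncated + " decodes: " + decodes(truncated));
    }
}
```

A single dangling character carries only 6 bits, which is not enough to rebuild a byte, so a stream truncated at such a point cannot be decoded to the end.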

trashgod 2010-05-30 01:41:58

Perhaps it is more efficient, but it doesn't solve the problem

karoberts 2010-05-30 02:08:49

Any chance your stream is truncated? IIRC, `base64` is framed.

trashgod 2010-05-30 02:25:23

Question updated. Can you elaborate on what you mean by "base64 is framed"? The stream comes directly from the file.

karoberts 2010-05-30 02:31:44

The encoded stream must be padded to "an integer multiple of 4 characters" in order to decode the last byte; this would be a problem if the stream were truncated. Reference cited above.

trashgod 2010-05-30 02:45:43

@trashgod - *"Reference cited above."*. WHERE?

Stephen C 2010-05-30 04:52:55

@Stephen C: "an integer multiple of 4 characters"—Base64 http://en.wikipedia.org/wiki/Base64

trashgod 2010-05-30 16:14:17

Answer 2

+6 A:

This appears to be a bug in Base64InputStream. You're calling it correctly.

You should report this to the Apache Commons Codec project.

Simple test case:

import java.io.*;
import org.apache.commons.codec.binary.Base64InputStream;

class tmp {
  public static void main(String[] args) throws IOException {
    // Base64-encode the file named on the command line (doEncode = true).
    try (FileInputStream fis = new FileInputStream(args[0]);
         Base64InputStream b64is = new Base64InputStream(fis, true, -1, null)) {
      byte[] c = new byte[1024];
      while (true) {
        int n = b64is.read(c);
        if (n < 0) break;  // end of stream
        // A read into a non-empty buffer must never report 0 bytes.
        if (n == 0) throw new IOException("returned 0!");
        for (int i = 0; i < n; i++) {
          System.out.print((char)c[i]);
        }
      }
    }
  }
}

The read(byte[]) call of InputStream is not allowed to return 0 unless the buffer itself has length zero. Here it returns 0 for any file whose length is a multiple of 3 bytes.
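
To show why a zero return matters to callers, here is a small sketch with a hypothetical stub stream (ZeroStream, not a real library class) that always reports 0 bytes read. On the JDK versions I am aware of, InputStreamReader rejects such a stream with an IOException, which matches the failure described in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ZeroStream only imitates the faulty behaviour.
class ZeroReadDemo {
    /** Stub that claims to have read 0 bytes into any buffer. */
    static class ZeroStream extends InputStream {
        @Override
        public int read() {
            return -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            return 0; // violates the contract whenever len > 0
        }
    }

    /** True if reading one character through an InputStreamReader fails. */
    static boolean readerFails(InputStream in) {
        try (InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            reader.read();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        InputStream good = new ByteArrayInputStream("ok".getBytes(StandardCharsets.UTF_8));
        System.out.println("well-behaved stream fails: " + readerFails(good));
        System.out.println("zero-returning stream fails: " + readerFails(new ZeroStream()));
    }
}
```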

Keith Randall 2010-05-30 03:34:54

Yes, you're right. This is a bug in Base64InputStream. +1 for the testcase which confirms this.

BalusC 2010-05-30 03:46:32

Reported btw: https://issues.apache.org/jira/browse/CODEC-101 That said, I'm still wondering about the coincidence that my test file was indeed a multiple of 3 bytes long :o)

BalusC 2010-05-30 04:08:48

Wow, thanks for confirming that, I must say I'm surprised that I found such a bug (however inadvertently).

karoberts 2010-05-30 05:27:27

Answer 3

+3 A:

Interesting. I ran some tests here, and it does indeed throw that exception when you read the Base64InputStream through an InputStreamReader, regardless of the source of the stream, but it works flawlessly when you read it as a binary stream. As trashgod mentioned, Base64 encoding is framed. The InputStreamReader should in fact have invoked flush() on the Base64InputStream once more to check whether it returns any more data.

~~I don't see other ways to fix this than implementing your own Base64InputStreamReader or Base64Reader~~. This is actually a bug, see Keith's answer.

As a workaround, you can store it in a BLOB rather than a CLOB in the DB and use PreparedStatement#setBinaryStream(). It doesn't matter whether the content is binary data or not. You wouldn't want such large Base64 data to be indexable or searchable anyway.

Update: since that's not an option, and getting the Apache Commons Codec guys to fix the Base64InputStream bug (which I reported as CODEC-101) might take some time, you may consider using another 3rd party Base64 API. I've found one here (public domain, so you can do whatever you want with it, even place it in your own package). I've tested it here and it works fine.

InputStream base64 = new Base64.InputStream(input, Base64.ENCODE);

Update 2: the Commons Codec guy fixed it pretty quickly.

Index: src/java/org/apache/commons/codec/binary/Base64InputStream.java
===================================================================
--- src/java/org/apache/commons/codec/binary/Base64InputStream.java (revision 950817)
+++ src/java/org/apache/commons/codec/binary/Base64InputStream.java (working copy)
@@ -145,21 +145,41 @@
         } else if (len == 0) {
             return 0;
         } else {
-            if (!base64.hasData()) {
-                byte[] buf = new byte[doEncode ? 4096 : 8192];
-                int c = in.read(buf);
-                // A little optimization to avoid System.arraycopy()
-                // when possible.
-                if (c > 0 && b.length == len) {
-                    base64.setInitialBuffer(b, offset, len);
+            int readLen = 0;
+            /*
+             Rationale for while-loop on (readLen == 0):
+             -----
+             Base64.readResults() usually returns > 0 or EOF (-1).  In the
+             rare case where it returns 0, we just keep trying.
+
+             This is essentially an undocumented contract for InputStream
+             implementors that want their code to work properly with
+             java.io.InputStreamReader, since the latter hates it when
+             InputStream.read(byte[]) returns a zero.  Unfortunately our
+             readResults() call must return 0 if a large amount of the data
+             being decoded was non-base64, so this while-loop enables proper
+             interop with InputStreamReader for that scenario.
+             -----
+             This is a fix for CODEC-101
+            */
+            while (readLen == 0) {
+                if (!base64.hasData()) {
+                    byte[] buf = new byte[doEncode ? 4096 : 8192];
+                    int c = in.read(buf);
+                    // A little optimization to avoid System.arraycopy()
+                    // when possible.
+                    if (c > 0 && b.length == len) {
+                        base64.setInitialBuffer(b, offset, len);
+                    }
+                    if (doEncode) {
+                        base64.encode(buf, 0, c);
+                    } else {
+                        base64.decode(buf, 0, c);
+                    }
                 }
-                if (doEncode) {
-                    base64.encode(buf, 0, c);
-                } else {
-                    base64.decode(buf, 0, c);
-                }
+                readLen = base64.readResults(b, offset, len);
             }
-            return base64.readResults(b, offset, len);
+            return readLen;
         }
     }

I tried it here and it works fine.
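
If you are stuck on an unpatched version, the same retry idea can be applied from the outside. The following is only a sketch under that assumption: NonZeroInputStream and FlakyStream are hypothetical names, and FlakyStream merely imitates a stream that reports 0 bytes once before delivering data.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; none of these types come from Commons Codec.
class NonZeroReadDemo {
    /** Wrapper that retries while the wrapped stream reports 0 bytes. */
    static class NonZeroInputStream extends FilterInputStream {
        NonZeroInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0; // the only case where 0 is a legitimate result
            }
            int n;
            do {
                // Spins forever if the wrapped stream never stops returning 0.
                n = in.read(b, off, len);
            } while (n == 0);
            return n;
        }
    }

    /** Stub that reports 0 bytes on the first read, then serves real data. */
    static class FlakyStream extends FilterInputStream {
        private boolean first = true;

        FlakyStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (first) {
                first = false;
                return 0;
            }
            return in.read(b, off, len);
        }
    }

    /** Result of the first read(byte[]) call, with or without the wrapper. */
    static int firstRead(boolean wrap) {
        InputStream in = new FlakyStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        if (wrap) {
            in = new NonZeroInputStream(in);
        }
        try {
            return in.read(new byte[16]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("raw first read: " + firstRead(false));
        System.out.println("wrapped first read: " + firstRead(true));
    }
}
```

An InputStreamReader built on top of such a wrapper only ever sees a positive count or -1, which is the behaviour the patch restores inside Base64InputStream itself.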

BalusC 2010-05-30 03:43:09

+1 Good workaround.

trashgod 2010-05-30 03:59:45

Unfortunately, I cannot use a BLOB because sometimes the data in there will be text.

karoberts 2010-05-30 05:33:48

+1 Thanks, that class will work nicely.

karoberts 2010-05-30 16:19:31
