tags:
views: 7952
answers: 8

I'm trying to figure out why this particular snippet of code isn't working for me. I've got an applet which is supposed to read a .pdf and display it with the pdf-renderer library, but for some reason the .pdf files I read from my server end up corrupt. I've tested this by writing the files back out again.

I've tried viewing the applet in both IE and Firefox, and the files come out corrupt in both. Funny thing is, when I try viewing the applet in Safari (for Windows), the file is actually fine! I understand the JVM might be different, but I am still lost. I've compiled in Java 1.5; the JVMs are 1.6. The snippet which reads the file is below.

public static ByteBuffer getAsByteArray(URL url) throws IOException {
    ByteArrayOutputStream tmpOut = new ByteArrayOutputStream();

    URLConnection connection = url.openConnection();
    int contentLength = connection.getContentLength();
    InputStream in = url.openStream();
    byte[] buf = new byte[512];
    int len;
    while (true) {
        len = in.read(buf);
        if (len == -1) {
            break;
        }
        tmpOut.write(buf, 0, len);
    }
    tmpOut.close();
    ByteBuffer bb = ByteBuffer.wrap(tmpOut.toByteArray(), 0, tmpOut.size());
    // Lines below used to test if file is corrupt
    //FileOutputStream fos = new FileOutputStream("C:\\abc.pdf");
    //fos.write(tmpOut.toByteArray());
    return bb;
}

I must be missing something, and I've been banging my head trying to figure it out. Any help is greatly appreciated. Thanks.


Edit: To clarify my situation further: the files I write out after reading with the snippet are significantly smaller than the originals, and when opened they are not recognized as .pdf files. There are no exceptions being thrown that I'm ignoring, and I have tried flushing to no avail.

The snippet works in Safari, meaning the files are read in their entirety, with no difference in size, and can be opened with any .pdf reader. In IE and Firefox the files always end up corrupted, consistently at the same smaller size.

I monitored the len variable (when reading a 59kb file), hoping to see how many bytes get read at each iteration of the loop. In IE and Firefox, at 18kb, in.read(buf) returns -1 as if the file has ended. Safari does not do this.

I'll keep at it, and I appreciate all the suggestions so far.
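For anyone who wants to reproduce the check I'm doing, here is a sketch of it (the class and method names are just illustrative): it totals the bytes actually read and prints them next to the Content-Length the server reported.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class ReadDiagnostic {
    // Reads the stream to the end and reports how many bytes were
    // actually received versus what the server promised.
    public static long diagnose(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        int contentLength = connection.getContentLength();
        long total = 0;
        InputStream in = connection.getInputStream();
        try {
            byte[] buf = new byte[512];
            int len;
            while ((len = in.read(buf)) != -1) {
                total += len;
            }
        } finally {
            in.close();
        }
        System.out.println("Content-Length: " + contentLength
                + ", actually read: " + total);
        return total;
    }
}
```

If the two numbers differ (e.g. 59kb promised, 18kb read), the stream is being cut short before the snippet ever touches the bytes.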

A: 

Eddie, I had to put my reply to your comments here, since I can't comment. When I say it's corrupt, I mean it cannot be opened by Adobe Acrobat, and my pdf-renderer library doesn't recognize it as a pdf file. The file size is quite different as well, with the corrupt file being smaller.

Pol
A: 

Just in case these small changes make a difference, try this:

public static ByteBuffer getAsByteArray(URL url) throws IOException {
    URLConnection connection = url.openConnection();
    // Since you get a URLConnection, use it to get the InputStream
    InputStream in = connection.getInputStream();
    // Now that the InputStream is open, get the content length
    int contentLength = connection.getContentLength();

    // To avoid having to resize the array over and over and over as
    // bytes are written to the array, provide an accurate estimate of
    // the ultimate size of the byte array
    ByteArrayOutputStream tmpOut;
    if (contentLength != -1) {
        tmpOut = new ByteArrayOutputStream(contentLength);
    } else {
        tmpOut = new ByteArrayOutputStream(16384); // Pick some appropriate size
    }

    byte[] buf = new byte[512];
    while (true) {
        int len = in.read(buf);
        if (len == -1) {
            break;
        }
        tmpOut.write(buf, 0, len);
    }
    in.close();
    tmpOut.close(); // No effect, but good to do anyway to keep the metaphor alive

    byte[] array = tmpOut.toByteArray();

    //Lines below used to test if file is corrupt
    //FileOutputStream fos = new FileOutputStream("C:\\abc.pdf");
    //fos.write(array);
    //fos.close();

    return ByteBuffer.wrap(array);
}

You forgot to close fos, which may result in that file being shorter if your application is still running or is abruptly terminated. Also, I added creating the ByteArrayOutputStream with an appropriate initial size. (Otherwise Java will have to repeatedly allocate a new array and copy, allocate a new array and copy, which is expensive.) Replace the value 16384 with a more appropriate value; 16k is probably small for a PDF, but I don't know what "average" size you expect to download.

Since you use toByteArray() twice (even though one is in diagnostic code), I assigned that to a variable. Finally, although it shouldn't make any difference, when you are wrapping the entire array in a ByteBuffer, you only need to supply the byte array itself. Supplying the offset 0 and the length is redundant.

Note that if you are downloading large PDF files this way, then ensure that your JVM is running with a large enough heap that you have enough room for several times the largest file size you expect to read. The method you're using keeps the whole file in memory, which is OK as long as you can afford that memory. :)
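Since you have contentLength anyway, you could also fail fast instead of returning a truncated buffer. A sketch of such a check (the exception message is just illustrative), to be called after the read loop:

```java
import java.io.IOException;

public class LengthCheck {
    // Compares what was read against what the server advertised.
    // A Content-Length of -1 means the server did not say, so no check
    // is possible in that case.
    public static void verifyComplete(int contentLength, int bytesRead)
            throws IOException {
        if (contentLength != -1 && bytesRead != contentLength) {
            throw new IOException("Expected " + contentLength
                    + " bytes but only read " + bytesRead);
        }
    }
}
```

That way a short read becomes a loud IOException rather than a silently corrupt PDF.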

Eddie
A: 

Thanks Eddie. Unfortunately the changes didn't change the outcome, but I always welcome picking up better coding practices.

For some reason both IE and Mozilla Firefox consistently read just a portion of the file (18kb of a 59kb file, according to a sum of the len variable), and then decide it's the end of the file, while Safari reads all 59kb. It honestly doesn't make sense to me, but I haven't been working with Java that long.

Any ideas as to why different browsers would handle this bit of code differently? Any other options I can try? Different methods perhaps?

As a side note, I know this isn't an answer per se to my question, but I honestly don't know how to reply to you if I don't have 50 rep to comment.

Pol
edit your question.
wds
I don't "own" the question yet. Posted before I registered. Have mailed the team to associate it with my account, then will edit.
Pol
A: 

Have you tried a flush() before you close the tmpOut stream to ensure all bytes written out?

close() does a flush()
jdigital
A: 

Are you absolutely positive this code is not throwing IOExceptions that you're not seeing because you ignore them from the caller of this method or some such? The code as is looks good to me.

wds
A: 

Try picking one of the answers from this thread

While the original question there was about reading a String from the contents of a file, the posted answers may be adapted to read binary contents.

Might help.

OscarRyz
A: 

Try running Fiddler (a free HTTP Debugging Proxy) and see if anything interesting shows up -- obviously you'll want to be sure that the server is sending the full stream, but you will also want to check content-length etc. You can use Fiddler with any browser but I'd use IE because the proxy will be automatically configured.

jdigital
A: 

I thought I had the same problem as you, but it turned out my problem was that I assumed each read() would return a full buffer until the end of the stream. Your loop doesn't make that assumption, though. The examples on the net (e.g. java2s/tutorial) use a BufferedInputStream, but that made no difference for me.

You could check whether you actually get the full file in your loop. Then the problem would be in the ByteArrayOutputStream.
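One way to rule the loop out entirely (a sketch; it only works when the server reports a Content-Length) is DataInputStream.readFully, which blocks until the whole array is filled or throws EOFException if the stream ends early:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class FullRead {
    // Reads exactly contentLength bytes, or fails with EOFException
    // if the stream ends early, instead of silently returning a short file.
    public static byte[] readExactly(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        int contentLength = connection.getContentLength();
        if (contentLength == -1) {
            throw new IOException("Server did not report a Content-Length");
        }
        byte[] data = new byte[contentLength];
        DataInputStream in = new DataInputStream(connection.getInputStream());
        try {
            in.readFully(data);
        } finally {
            in.close();
        }
        return data;
    }
}
```

If this also throws EOFException at 18kb in IE and Firefox, the truncation is happening below your code, in the connection itself.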

openCage