The code below merges PDF files and returns the combined PDF data. When I try to combine 100 files of roughly 500 KB each, I get an OutOfMemoryError at the line document.close();. This code runs in a web environment; is the memory available to the WebSphere server the problem? I read in an article that I should use the freeReader method, but I cannot work out how to use it in my scenario.

protected ByteArrayOutputStream joinPDFs(List<InputStream> pdfStreams,
        boolean paginate) {

    Document document = new Document();

    ByteArrayOutputStream mergedPdfStream = new ByteArrayOutputStream();

    try {
        //List<InputStream> pdfs = pdfStreams;
        List<PdfReader> readers = new ArrayList<PdfReader>();
        int totalPages = 0;
        //Iterator<InputStream> iteratorPDFs = pdfs.iterator();
        Iterator<InputStream> iteratorPDFs = pdfStreams.iterator();

        // Create Readers for the pdfs.
        while (iteratorPDFs.hasNext()) {
            InputStream pdf = iteratorPDFs.next();
            if (pdf == null)
                continue;
            PdfReader pdfReader = new PdfReader(pdf);
            readers.add(pdfReader);
            totalPages += pdfReader.getNumberOfPages();
        }

        //clear this
        pdfStreams = null;

        //WeakReference ref = new WeakReference(pdfs);
        //ref.clear();

        // Create a writer for the outputstream
        PdfWriter writer = PdfWriter.getInstance(document, mergedPdfStream);
        writer.setFullCompression();

        document.open();
        BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
                BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
        PdfContentByte cb = writer.getDirectContent(); // Holds the PDF
        // data

        PdfImportedPage page;
        int currentPageNumber = 0;
        int pageOfCurrentReaderPDF = 0;
        Iterator<PdfReader> iteratorPDFReader = readers.iterator();

        // Loop through the PDF files and add to the output.
        while (iteratorPDFReader.hasNext()) {
            PdfReader pdfReader = iteratorPDFReader.next();

            // Create a new page in the target for each source page.
            while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                pageOfCurrentReaderPDF++;
                document.setPageSize(pdfReader
                        .getPageSizeWithRotation(pageOfCurrentReaderPDF));
                document.newPage();
                // pageOfCurrentReaderPDF++;
                currentPageNumber++;
                page = writer.getImportedPage(pdfReader,
                        pageOfCurrentReaderPDF);
                cb.addTemplate(page, 0, 0);

                // Code for pagination.
                if (paginate) {
                    cb.beginText();
                    cb.setFontAndSize(bf, 9);
                    cb.showTextAligned(PdfContentByte.ALIGN_CENTER, ""
                            + currentPageNumber + " of " + totalPages, 520,
                            5, 0);
                    cb.endText();
                }
            }
            pageOfCurrentReaderPDF = 0;
            System.out.println("now the size is: "+pdfReader.getFileLength());
        }
        mergedPdfStream.flush();
        document.close();
        mergedPdfStream.close();
        return mergedPdfStream;
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (document.isOpen())
            document.close();
        try {
            if (mergedPdfStream != null)
                mergedPdfStream.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
    return mergedPdfStream;
}

Thanks V

A: 

100 files * 500 kB is something around 50 MB. If maximum heap size is 64 MB I'm pretty sure this code won't work in such conditions.
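
To see how the estimate above compares with the heap the server actually has, a quick check along these lines may help. This is only a sketch: the class and method names are made up for illustration, and the figure counts the raw input bytes only, not the parsed PDF objects or the output buffer, which add to it.

```java
final class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Rough size of the raw input files alone, in megabytes. */
    static long inputMegabytes(int fileCount, long bytesPerFile) {
        return fileCount * bytesPerFile / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        // 100 files of about 500 KB each, as in the question
        System.out.println("Raw input (MB): " + inputMegabytes(100, 500L * 1024));
    }
}
```

If the first number is not comfortably larger than the second, the in-memory approach is unlikely to work regardless of how the merge loop is written.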

hudolejev
+3  A: 

This code merges all the PDFs in an array in memory (the heap), so yes, memory usage will grow linearly with the number of files merged.

I don't know about the freeReader method, but maybe you could try writing the merged PDF to a temporary file instead of a byte array? mergedPdfStream would be a FileOutputStream instead of a ByteArrayOutputStream. You would then return, e.g., a File reference to the client code.

Or you could increase the amount of memory Java can use (the -Xmx JVM parameter), but if the number of files to merge eventually increases, you will run into the same problem.
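
A minimal sketch of the temporary-file idea, using only the JDK. The class name, the StreamWriter callback and the "merged-" prefix are hypothetical; the callback stands in for whatever code produces the merged PDF. Files.createTempFile picks a unique name on each call, so concurrent requests should not end up writing to the same file.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileTarget {
    /** Hypothetical callback standing in for the code that writes the merged PDF. */
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Writes the output to a uniquely named temp file and returns its path. */
    static Path writeToTempFile(StreamWriter writer) throws IOException {
        Path target = Files.createTempFile("merged-", ".pdf");
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            writer.writeTo(out);
        } catch (IOException | RuntimeException e) {
            // Do not leave a half-written file behind on failure.
            Files.deleteIfExists(target);
            throw e;
        }
        return target;
    }
}
```

In the question's code, the callback body would pass out to PdfWriter.getInstance(document, out) in place of mergedPdfStream, so the merged bytes go to disk instead of accumulating on the heap.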

Pierre Henry
Thanks, Pierre Henry. If I write to a FileOutputStream and return the file to the client, don't I risk exposing the file directly to the client? How do I get around this? Do I read the file again and write it to the stream? Any other suggestions?
Vijay
@Vijay - but then, why not write to the client right away? Only this way will you be able to serve PDF files that are larger than the memory allocated to your JVM.
Ingo
@Ingo: if the file is destined to be read by the client only, yes, but maybe it also needs to be stored. @Vijay: in my answer I meant "client code": the code that calls the merge, not the actual client of the application. So my idea was: do the merge and save it to a file. Then you have the file and can do whatever is needed with it: leave it on the server to be stored, or read it with a new input stream and write it to the client's response output stream to send the file.
Pierre Henry
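
The "read it back and write it to the client" step might look roughly like this. It is a sketch with hypothetical names; in a servlet, clientOut would presumably be the stream obtained from response.getOutputStream(). Files.copy streams the file through a small internal buffer, so the whole PDF is never held in memory, and the client only ever receives bytes, never a path on the server.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileSender {
    /** Streams the file to the given output, then deletes it; returns bytes sent. */
    static long sendAndDelete(Path file, OutputStream clientOut) throws IOException {
        try {
            long copied = Files.copy(file, clientOut);
            clientOut.flush();
            return copied;
        } finally {
            // Remove the temp file whether or not the transfer succeeded.
            Files.deleteIfExists(file);
        }
    }
}
```

If the merged file also needs to be stored, as mentioned above, the delete in the finally block would simply be dropped.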
A: 

First, why do you clutter your code with all that Iterator<> boilerplate? Have you ever heard of the for statement? I.e.:

for (PdfReader pdfReader : readers) {
      // code for each single PDF reader in readers
}

Second: consider closing each pdfReader as soon as you are done with it. This will hopefully flush some buffers and free the memory occupied by the original PDF.
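
Both points can be combined into one loop that opens a source, handles it, and closes it before touching the next one. The sketch below shows only that resource pattern with plain JDK streams; it does not merge PDFs by itself. The names are hypothetical, and SourceHandler stands in for the per-file step (for example, creating a PdfReader from the stream and importing its pages). It also assumes the method receives file paths rather than streams that are already open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class OneSourceAtATime {
    /** Hypothetical per-file step, e.g. importing the pages of one PDF. */
    interface SourceHandler {
        void handle(InputStream in) throws IOException;
    }

    /** Handles each source in turn; returns how many were processed. */
    static int processEach(List<Path> sources, SourceHandler handler) throws IOException {
        int processed = 0;
        for (Path source : sources) {
            if (source == null) {
                continue; // mirrors the null check in the question's code
            }
            // try-with-resources closes this stream before the next one is opened
            try (InputStream in = Files.newInputStream(source)) {
                handler.handle(in);
                processed++;
            }
        }
        return processed;
    }
}
```

Compared with the question's code, which creates every PdfReader up front and keeps them all in a list, this keeps at most one source open at a time.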

Ingo
+1  A: 

This is not the proper way to do file operations. You are merging the files using an ArrayList and an array in memory. You should instead use file I/O with buffering.

Do you want to show the final merged file at the end? Then you can open the file after all the merging is done.

  • Do not rely only on in-memory buffering as you have shown. Use file I/O with a buffer (a byte[], I mean).
  • Close each file after you read it and append it.

Java has only the limited memory you allocated at startup, so merging a large number of files at once like this will crash the application. You should run the merge in a separate thread using a thread pool, so that your application does not get stuck on it.
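
A rough sketch of the thread-pool suggestion, using the JDK's ExecutorService. The class name and the pool size of 2 are arbitrary choices for illustration. Note that moving the merge to another thread does not lower its memory use; a small fixed pool only limits how many merges run at the same time. In an application server, a container-managed executor may be preferable to creating threads directly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BackgroundMerge {
    // Small fixed pool: caps how many merges run concurrently.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2);

    /** Queues a merge task and returns a handle to its eventual result. */
    static <T> Future<T> submit(Callable<T> mergeTask) {
        return POOL.submit(mergeTask);
    }

    /** Stops accepting new tasks; call when the application shuts down. */
    static void shutdown() {
        POOL.shutdown();
    }
}
```

The caller would submit the whole merge as one task, since the merge itself is sequential, and later call get() on the returned Future to obtain the result.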

thanks.

Paarth
Dear Paarth, yes, you are correct. I would like to write the final merged file to the client. Do you want me to write to a file with a buffer rather than in memory? If I open the file at the end, after merging, to write it to the stream, can you please explain a little more? And as this module runs in a web environment, how do I implement this in a thread?
Vijay
Hey, I meant: make a new file and append all your files to it one by one. You cannot do it concurrently (meaning one thread for each PDF file you want to merge) because this is a sequential operation. What I meant is to create a new thread for the merging functionality, with the whole thing in that one thread. Please check the following link, which shows merging using iText: http://sanjaal.com/java/2010/02/04/merging-two-or-more-pdfs-using-lowagie-itext-api/
Paarth
FYI: the "List<InputStream> pdfStreams" contains FileInputStream objects, one per file to be merged.
Vijay
OK Paarth, I'll rewrite this to perform the merge in a disk file with a buffer rather than in memory. But can you please explain how I open the merged file, read it, and write it to the servlet stream? Buffering can be useful there, right?
Vijay
I have one more doubt: the idea is to create a file each time and delete it once it has been written to the client, but what about multiple requests? Could different requests clash over the file?
Vijay
Hey, I hope the List of streams you are maintaining does not live for long. Otherwise it can degrade performance, right? OK, do you want to send the file to the browser using the servlet OutputStream? You can do that by writing to the stream directly (try it; I hope it works for big files; set the proper content type). Or you can make an applet for file downloading, meaning the applet loads on the client side, downloads the files there, and then merges them :). That applet should create the final file wherever the client wants it. Got my point? Thanks.
Paarth
Wow, that's a different thought! I'll try the second option you mentioned: download an applet and merge on the client side. Thanks a lot for supporting me.
Vijay
Thanks mate, that's what we are all here for :) to help each other and learn new things. Thanks.
Paarth
Paarth, I tried the site you referred to for the PDF merge and got an idea: while the PDFs are being merged, I write the output immediately to the ServletOutputStream. That is, writer = new PdfCopy(document, outputStream); where I pass the ServletOutputStream instance as the outputStream object. Once the merge is done, I flush the stream and close it. Each merged PDF is written directly to the stream rather than stored. How about this?
Vijay