I've got a report that can generate over 30,000 records if given a large enough date range. On the HTML side, a result set this large is not a problem because I use a pagination system that shows only 100 results at a time.

My real problem occurs once the user presses the "Get PDF" button. When this happens, I re-run the portion of the report that prints the data (the results of the report itself are stored in a 'save' table, so there's no need to re-run the data-gathering logic) and store the output in a variable called $html. Keep in mind that this variable now contains 30,000 records of data plus the HTML needed to format it correctly in the PDF. Once the HTML string is built, I pass it to TCPDF to generate the PDF file for the user. However, instead of generating the PDF, it just craps out without an error message: the 'Generating PDF...' dialog disappears and the system acts as if you never asked it to do anything.

Through testing, I've found that the problem lies in the size of the $html variable being passed in. If the report has fewer than 3K records, it works fine. If it has more, the HTML version of the report prints but the PDF does not.

Helpful Info

  • PHP 5.3
  • TCPDF for PDF generation (also tried PS2PDF)
  • Script Memory Limit: 500 MB

How would you guys handle this scale of data when generating a PDF of this size?

A: 

I would break the PDF into parts, just like pagination.

1) Have a "Get PDF" button on every paginated HTML page that downloads only the records shown on that page.

2) Limit the maximum number of records that can go into one PDF. If the limit is reached, split the output and let the user download multiple PDFs.
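A minimal PHP sketch of suggestion 2, assuming the saved report rows can be loaded into an array. `splitIntoPdfBatches()`, `describeBatches()` and the 3,000-record cap are hypothetical; the cap only mirrors the asker's observation that reports under about 3K records render fine, and should be tuned to whatever your generator handles reliably.

```php
<?php
// Hypothetical cap on records per PDF, based on the "works under 3K" observation.
define('MAX_RECORDS_PER_PDF', 3000);

/**
 * Split a flat list of report rows into batches, one batch per PDF file.
 */
function splitIntoPdfBatches(array $records, $maxPerPdf = MAX_RECORDS_PER_PDF)
{
    if ($maxPerPdf < 1) {
        throw new InvalidArgumentException('maxPerPdf must be at least 1');
    }
    // array_chunk keeps the original order; the last batch may be smaller.
    return array_chunk($records, $maxPerPdf);
}

/**
 * Describe which record range each PDF covers, e.g. for download link labels.
 */
function describeBatches(array $batches)
{
    $labels = array();
    $start = 1;
    foreach ($batches as $i => $batch) {
        $end = $start + count($batch) - 1;
        $labels[] = sprintf('Part %d: records %d-%d', $i + 1, $start, $end);
        $start = $end + 1;
    }
    return $labels;
}
```

Each batch would then be rendered to its own PDF, and the labels can drive a list of download links. For 30,000 records and a 3,000 cap this gives ten parts.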

shamittomar
A: 

TCPDF seems to be a pure-PHP implementation of PDF generation. You may get better performance from a compiled library like PDFlib or a command-line app like htmldoc. The latter has the best chance of generating a large PDF.
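A rough sketch of the command-line route, assuming htmldoc is installed and on the PATH. The helper names are hypothetical; `--webpage`, `-t pdf` and `-f` are htmldoc options, but check `htmldoc --help` on your version. Writing the HTML to a file and letting a separate process do the layout moves the PDF work out of the PHP process, so it no longer counts against the script's memory limit.

```php
<?php
/**
 * Build an htmldoc command that converts one HTML file into a PDF.
 * --webpage treats the input as one continuous page that htmldoc paginates itself.
 */
function buildHtmldocCommand($htmlPath, $pdfPath)
{
    return sprintf(
        'htmldoc --webpage -t pdf -f %s %s',
        escapeshellarg($pdfPath),
        escapeshellarg($htmlPath)
    );
}

/**
 * Write the report HTML to a temp file and run htmldoc on it.
 * Throws with htmldoc's own messages instead of failing silently.
 */
function renderPdfWithHtmldoc($html, $pdfPath)
{
    // Give the temp copy an .html extension to be safe.
    $base = tempnam(sys_get_temp_dir(), 'report');
    $htmlPath = $base . '.html';
    file_put_contents($htmlPath, $html);

    $output = array();
    $status = 1;
    // 2>&1 captures htmldoc's error output into $output as well.
    exec(buildHtmldocCommand($htmlPath, $pdfPath) . ' 2>&1', $output, $status);

    // Clean up both the placeholder created by tempnam() and the HTML copy.
    unlink($htmlPath);
    unlink($base);

    // Treat a non-zero exit status or a missing output file as failure.
    if ($status !== 0 || !is_file($pdfPath)) {
        throw new RuntimeException("htmldoc failed:\n" . implode("\n", $output));
    }
    return $pdfPath;
}
```

In the "Get PDF" handler you would call `renderPdfWithHtmldoc($html, $pdfPath)` and then send the file with the usual download headers.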

Also, are you breaking the output PDF into multiple pages? I.e. does TCPDF know to take a single HTML document and cut it into multiple pages, or are you generating multiple HTML files for it to combine into a single PDF document? That may also help.
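If you stay with TCPDF, one thing worth trying (untested against the asker's data) is to feed it many small HTML tables instead of one giant string. The sketch below uses a hypothetical `buildHtmlFragments()` helper and an arbitrary 500 rows per fragment; it only builds the fragments, repeating the header row in each table so every chunk reads on its own.

```php
<?php
/**
 * Turn report rows into several small HTML tables instead of one giant string.
 * Each fragment is meant to be passed to its own PDF-writing call.
 */
function buildHtmlFragments(array $rows, array $headers, $rowsPerFragment = 500)
{
    // Build the header row once and reuse it in every fragment.
    $thead = '<tr>';
    foreach ($headers as $h) {
        $thead .= '<th>' . htmlspecialchars($h, ENT_QUOTES, 'UTF-8') . '</th>';
    }
    $thead .= '</tr>';

    $fragments = array();
    foreach (array_chunk($rows, $rowsPerFragment) as $chunk) {
        $html = '<table border="1" cellpadding="2"><thead>' . $thead . '</thead><tbody>';
        foreach ($chunk as $row) {
            $html .= '<tr>';
            foreach ($row as $cell) {
                // Escape cell values so stray < or & in the data can't break the markup.
                $html .= '<td>' . htmlspecialchars((string) $cell, ENT_QUOTES, 'UTF-8') . '</td>';
            }
            $html .= '</tr>';
        }
        $html .= '</tbody></table>';
        $fragments[] = $html;
    }
    return $fragments;
}
```

Each fragment would then go to its own call, e.g. `$pdf->writeHTML($fragment);` in a loop after `$pdf->AddPage()`, letting TCPDF break pages as it already does for the single string.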

Lèse majesté
I'm passing in one giant HTML string that TCPDF then splits into a multi-page PDF. I think the problem lies not with the PDF generator, but with the large size of the `$html` variable.
Levi Hackwith
@Levi Hackwith: That doesn't make any sense. PDFs can be as large as you want. If a PDF generator fails when its input HTML is too large, then that is a problem with the generator. Otherwise, just reduce the size of your HTML file and create many small PDFs like shamittomar suggests. Though most desktop programs have no problem creating PDFs with 100,000+ pages.
Lèse majesté
So is the general consensus that the problem lies within TCPDF, not PHP itself?
Levi Hackwith
@Levi Hackwith: Unless the script is exceeding the maximum script execution time or script memory limit, it's unlikely to be a problem with PHP. I also highly doubt you will need more than 500MB to generate a PDF unless each record itself is huge. But you can always check using `memory_get_peak_usage()`.
Lèse majesté
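Following the `memory_get_peak_usage()` suggestion, here is a small diagnostic sketch that logs the size of `$html` next to the peak memory and the configured limits, so you can see whether the PDF step is actually near the 500 MB ceiling. `iniBytes()` and `resourceReport()` are hypothetical helpers; the parsing covers the K/M/G shorthand that php.ini uses.

```php
<?php
/**
 * Convert a php.ini shorthand value such as "500M" into bytes.
 * Returns -1 for "-1" (no limit).
 */
function iniBytes($value)
{
    $value = trim((string) $value);
    if ($value === '-1') {
        return -1;
    }
    $unit = strtolower(substr($value, -1));
    $number = (int) $value; // "500M" casts to 500
    switch ($unit) {
        case 'g': return $number * 1024 * 1024 * 1024;
        case 'm': return $number * 1024 * 1024;
        case 'k': return $number * 1024;
        default:  return $number;
    }
}

/**
 * Snapshot of the resources the PDF step is using, for logging
 * before and after the call to the PDF generator.
 */
function resourceReport($html)
{
    return array(
        'html_bytes'       => strlen($html),
        'peak_memory'      => memory_get_peak_usage(true),
        'memory_limit'     => iniBytes(ini_get('memory_limit')),
        'max_exec_seconds' => (int) ini_get('max_execution_time'),
    );
}
```

Call `resourceReport($html)` right before and right after the PDF call and log it, e.g. with `error_log(print_r($report, true))`. If the second snapshot never appears, the script died inside the generator; a memory or time-limit fatal error should then show up in the PHP error log (if error logging is enabled) even though the browser shows nothing.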
@Lese gets the answer since he suggested the command-line approach, which is going to be the best way to handle this.
Levi Hackwith