Hi All

I would like your opinion on my case. We have a big table, and every month we run reports on it: we need to download up to 20,000 records as a PDF or Excel file and print them out. I am planning to generate the reports in real time rather than in advance. Is this a good way to solve my problem? If you have a better idea, I would like to hear it.

Thanks

A:

It depends on how often you are going to generate this PDF. If you generate it frequently, it is probably better to cache the last generated PDF for 15 to 30 minutes so you avoid constantly processing this "big table".
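A minimal Python sketch of the 15-to-30-minute cache, assuming you already have some function that queries the table and renders the PDF. `CachedReport` and the injected `generate` callable are hypothetical names, and the cache lives in memory inside a single process.

```python
import time
from typing import Callable, Optional


class CachedReport:
    """Keep the most recently generated report bytes for a fixed time-to-live.

    `generate` is a hypothetical callable that queries the big table and
    returns the rendered PDF as bytes.
    """

    def __init__(self, generate: Callable[[], bytes],
                 ttl_seconds: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._generate = generate
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data: Optional[bytes] = None
        self._created_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        # Regenerate only if nothing is cached yet or the cached copy has expired.
        if self._data is None or now - self._created_at >= self._ttl:
            self._data = self._generate()
            self._created_at = now
        return self._data
```

In a multi-threaded web server you would also want a lock around `get()`, so two requests arriving right after expiry don't both hit the big table.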

Fetching all that data will take a little while, but if you don't mind the delay, not generating the PDF in advance can be a good solution.

If a lot of people access the PDF, you don't want a delay, and the data doesn't change very rapidly, you should probably generate it in advance. Your generation interval should match how quickly your data goes stale: if the data changes about once per day, a daily update is normally sufficient; if it changes a lot, you may need to regenerate every 30 minutes or so.

So it depends on who is going to be accessing the PDF and how often.

Kekoa
A:

I'm not entirely sure I understand your question, but when I need to do real-time reporting on one or more really big tables, I pre-calculate any totals I want.

So instead of queries like:

    select count(*), sum(items * price), datefield, type
    from bigtable bt
    join reallybigtable rbt on bt.id = rbt.rbtid
    where datefield between '2009-01-01' and '2009-01-31'
    group by type, datefield

We have a stored procedure calculate the daily totals into a second table every night; then it's just a simple matter of adding up 30 days' worth of pre-calculated totals, rather than counting, joining, and summing a ba-zillion records.
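A small sketch of the same idea in Python with SQLite, assuming a single hypothetical `bigtable(datefield, type, items, price)` instead of the two joined tables above, plus a hypothetical summary table `daily_totals`. `refresh_daily_totals` plays the role of the nightly stored procedure, and `monthly_report` only reads the pre-calculated rows.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS bigtable (
            datefield TEXT NOT NULL,   -- ISO date, e.g. '2009-01-15'
            type      TEXT NOT NULL,
            items     INTEGER NOT NULL,
            price     REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS daily_totals (
            datefield TEXT NOT NULL,
            type      TEXT NOT NULL,
            row_count INTEGER NOT NULL,
            amount    REAL NOT NULL,
            PRIMARY KEY (datefield, type)
        );
    """)


def refresh_daily_totals(conn: sqlite3.Connection, day: str) -> None:
    """Nightly job: recompute one day's totals from the raw table."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM daily_totals WHERE datefield = ?", (day,))
        conn.execute("""
            INSERT INTO daily_totals (datefield, type, row_count, amount)
            SELECT datefield, type, COUNT(*), SUM(items * price)
            FROM bigtable
            WHERE datefield = ?
            GROUP BY datefield, type
        """, (day,))


def monthly_report(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Report query: add up the pre-calculated daily rows only."""
    return conn.execute("""
        SELECT type, SUM(row_count), SUM(amount)
        FROM daily_totals
        WHERE datefield BETWEEN ? AND ?
        GROUP BY type
        ORDER BY type
    """, (start, end)).fetchall()
```

Deleting and reinserting a day's rows makes the nightly job safe to re-run if it fails halfway through.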

Kyle Hodgson
A:

If you're generating a report that big, it's hard to imagine that users would expect frequent updates. In general, I find that people place a pretty high value on being able to download consistent, repeatable views, refreshed at most daily. In fact, that consistency and repeatability is usually a good reason to refresh the reports no more often than daily and then store the results.

If a given report is unlikely to be needed daily or more often, you could generate it lazily: store a copy the first time it's created on a given day, and use a file naming scheme such as "RepABC_05032009.xls" to mark that day's report.
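The lazy, once-per-day approach can be sketched like this in Python. `get_daily_report` and the `generate` callable are hypothetical, and the date part of the file name assumes month-day-year order, mirroring the example above.

```python
import datetime
from pathlib import Path
from typing import Callable


def get_daily_report(report_dir: Path, name: str, day: datetime.date,
                     generate: Callable[[], bytes]) -> Path:
    """Return the stored report for `day`, generating it only on first request."""
    # Naming scheme modelled on "RepABC_05032009.xls": report name + MMDDYYYY
    # (the month-first order is an assumption).
    path = report_dir / f"{name}_{day:%m%d%Y}.xls"
    if not path.exists():
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(generate())
        tmp.replace(path)  # rename so readers never see a half-written file
    return path
```

Writing to a temporary file and then renaming it means a second request never picks up a partially written report.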

le dorfier
A: 

20,000 records is really not that big, so generating the report on the fly will almost certainly work fine (unless the query that retrieves those records is complex or slow).

I recommend Excel because it is much easier to implement: just output CSV data (PHP has ready-made functions for this, such as fputcsv) and send the appropriate Content-Type header in the response.
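For illustration, here is the same CSV-download idea in Python rather than PHP, using the standard `csv` module. `csv_response` is a hypothetical helper that returns the response headers and body, which you would hand to whatever web framework you use.

```python
import csv
import io
from typing import Dict, Iterable, Sequence, Tuple


def csv_response(rows: Iterable[Sequence], header: Sequence[str],
                 filename: str = "report.csv") -> Tuple[Dict[str, str], bytes]:
    """Build the headers and body of an HTTP download response containing CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas, quotes, newlines
    writer.writerow(header)
    writer.writerows(rows)  # 20,000 rows fit comfortably in memory
    body = buf.getvalue().encode("utf-8-sig")  # BOM helps Excel detect UTF-8
    headers = {
        "Content-Type": "text/csv; charset=utf-8",
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    return headers, body
```

The `attachment` disposition makes the browser offer the file as a download, which Excel can then open directly.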

Another reason to prefer Excel over PDF is that users can make minor tweaks before printing (switching between landscape and portrait layout, adding line numbers, adding a custom memo, etc.).

Milan Babuškov