Hi there
I'm using the PHPExcel library, which seems to do a great job, but with my approach it doesn't seem to be very fast.

I need to generate an Excel file with a lot of sheets, data, formulas and styles (bold, color, border), and it takes a lot of computing resources and time.
I think my approach is not very good. What is the right order of operations when generating this kind of Excel file using PHP code and data from a MySQL DB?

1) generate all the sheets with no formatting/styles and no formulas, just the data, then go back and add the formulas and styles
2) generate sheet by sheet, adding the data, formulas and styles as I go
3)....

Is it better to have all the data I need from the DB in a multidimensional array, rather than querying the DB each time I want to write to a new sheet?
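
To show what I mean, something like this (the table, columns and one-sheet-per-region layout are just an invented example, not my real schema):

    <?php
    require_once 'PHPExcel.php';

    mysql_connect('localhost', 'user', 'pass');
    mysql_select_db('reports');

    // one query up front instead of one per sheet
    $data = array();
    $result = mysql_query("SELECT region, product, amount FROM sales ORDER BY region");
    while ($row = mysql_fetch_assoc($result)) {
        $data[$row['region']][] = array($row['product'], $row['amount']);
    }

    $excel = new PHPExcel();
    $i = 0;
    foreach ($data as $region => $rows) {
        $sheet = ($i++ == 0) ? $excel->getActiveSheet() : $excel->createSheet();
        $sheet->setTitle($region);
        $sheet->setCellValue('A1', 'Product');
        $sheet->setCellValue('B1', 'Amount');
        $sheet->fromArray($rows, null, 'A2'); // all the data in one call
    }

    PHPExcel_IOFactory::createWriter($excel, 'Excel2007')->save('report.xlsx');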

I'm using a dual-core machine with 4GB of RAM, and it takes about 1 minute if Apache doesn't crash.

+1  A: 

Database queries, even ones that return like 30,000 rows, generally only take a fraction of a second. With a proc and RAM like that, it's probably slowing down at IO. I'd be willing to bet that most of the time is being spent on writing the file(s).

That being said, I would suggest having all the data up front. At least then, you're reading from memory. I'm not sure if MySQL loads everything into memory with a pointer to the current row, or if it only loads the current row.

Collin Klopfenstein
A normal, buffered query will load the whole resultset into memory. This is because cursoring through the data is done by the C client library, which is what the PHP library calls.
staticsan
+1  A: 

With a minute of processing time, your dataset doesn't sound large enough to require partitioning of the result set.

So with that assumed, loading it all into memory would be fine.

However, MySQL caches the query anyway, so successive duplicate queries shouldn't be a big issue either.

It does sound like the IO of writing the file to disk is your slow point. I don't know much about how styles and formatting work for xls files, but I'm pretty sure you could cut a lot of pointless writes by reducing the formatting, or organizing it so that you are not doing any in-line formatting, if that's even possible.
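
For example, something like this (just a sketch: it assumes PHPExcel, a worksheet already in $sheet, and a made-up header range and colours) applies one style array to a whole range instead of touching every cell:

    // assume $sheet is a PHPExcel_Worksheet you already created;
    // the range A1:F1 and the colours are invented for illustration
    $headerStyle = array(
        'font'    => array('bold' => true, 'color' => array('rgb' => 'FFFFFF')),
        'fill'    => array('type'  => PHPExcel_Style_Fill::FILL_SOLID,
                           'color' => array('rgb' => '4F81BD')),
        'borders' => array('allborders' => array(
                           'style' => PHPExcel_Style_Border::BORDER_THIN)),
    );

    // one call for the whole header row...
    $sheet->getStyle('A1:F1')->applyFromArray($headerStyle);

    // ...instead of setting bold/fill/border on A1, B1, C1, ... one by one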

The main thing I would advise is optimizing your query. That is usually the first place I look with this type of issue. If you could edit your post and include the SQL, I could have a look. (Feel free to snip & strip sensitive information.)

jim
+2  A: 

I would suggest profiling your PHP code to find out where the most time is spent. You can find a step-by-step tutorial about using XDebug to profile PHP code on my blog. Then try to optimize the parts which consume the most time or are called most frequently. One of the best optimizations you can make is to replace hand-written PHP code with built-in functions where possible, because they are orders of magnitude faster.
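
As a small illustration of the built-in-function point (the profiler settings in the comment assume Xdebug 2; the data is invented):

    // php.ini, assuming Xdebug 2:
    //   xdebug.profiler_enable = 1
    //   xdebug.profiler_output_dir = /tmp
    // then open the generated cachegrind file in KCacheGrind / WinCacheGrind

    $amounts = array(12.5, 7.0, 3.25, 40.0); // invented data

    // hand-written loop, interpreted opcode by opcode
    $total = 0;
    foreach ($amounts as $a) {
        $total += $a;
    }

    // built-in: the loop runs inside the C implementation
    $total = array_sum($amounts);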

Cd-MaN
A: 

MySQL is going to be able to perform any calculations/transformations on the data faster than your PHP tier, so I would write one query that does all of the work to get the data for one spreadsheet.

Then do this once for each spreadsheet.
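
Something along these lines (a sketch only; the table, columns and per-period sheets are invented):

    // one aggregate query per sheet, MySQL does the math
    $periods = array('2009-Q1', '2009-Q2');

    foreach ($periods as $period) {
        $sql = sprintf(
            "SELECT product, SUM(amount) AS total, COUNT(*) AS orders
               FROM sales
              WHERE period = '%s'
              GROUP BY product",
            mysql_real_escape_string($period)
        );
        $result = mysql_query($sql);

        while ($row = mysql_fetch_assoc($result)) {
            // write $row straight into the sheet for $period --
            // no per-row math left to do in PHP
        }
    }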

Justin
+2  A: 
  1. Have MySQL do the calculations: I once optimized a report where the original developer used loops to calculate sums rather than the SQL SUM() function; it was so badly written it exceeded PHP's time limit. (Quick before/after below the list.)
  2. Make sure your calculation SQL is optimized.
  3. Calculate first, then generate the content.
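
A quick before/after for point 1 (table and column names are invented):

    // before: fetch every row and sum in PHP
    $total = 0;
    $result = mysql_query("SELECT amount FROM orders WHERE customer_id = 42");
    while ($row = mysql_fetch_assoc($result)) {
        $total += $row['amount'];
    }

    // after: one row comes back, MySQL does the work
    $row = mysql_fetch_assoc(
        mysql_query("SELECT SUM(amount) AS total FROM orders WHERE customer_id = 42")
    );
    $total = $row['total'];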
Moutaz
+1  A: 

It can be faster to do more, simpler queries than fewer, more complex queries. But you need to benchmark them, because it depends a lot on the data you're querying.

That said, you may be running into memory issues in PHP. What I would recommend is to see if you can re-arrange the data processing algorithm. What I would be working toward is to get only one large dataset at a time, but already sorted by MySQL in the order you need to process it. Then you can use an unbuffered query.

An unbuffered query means the client library won't store the whole resultset before handing it to your code. That means it can be larger than the library would normally handle, but for that to work, your processing loop must fetch a row from the API on each cycle -- i.e. this means a while() loop calling mysql_fetch_assoc() each time.

The other advantage of working through a resultset that is already in the order you need is that you can ask for more than you need and filter it out as you go. This can end up faster because asking for more rows means the database does less filtering work.
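
Roughly this pattern (table and column names invented; mysqli has the same thing via MYSQLI_USE_RESULT):

    // stream the rows with an unbuffered query, already sorted by MySQL
    // into the order the sheets need
    $result = mysql_unbuffered_query(
        "SELECT region, product, amount
           FROM sales
          ORDER BY region, product"
    );

    $currentRegion = null;
    while ($row = mysql_fetch_assoc($result)) {
        // filter as you go instead of making the WHERE clause do everything
        if ($row['amount'] == 0) {
            continue;
        }

        if ($row['region'] !== $currentRegion) {
            $currentRegion = $row['region'];
            // start a new sheet for this region
        }

        // write $row to the current sheet
    }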

staticsan