I'm trying to create an XML sitemap using CakePHP, from a table which has more than 50,000 records at the moment, each record equivalent to a URI in the sitemap. Now the problem I'm facing is CakePHP is running me out of memory while generating it, for two reasons:

  1. A find('all') is building a huge associative array of the entire set of 50,000 URIs.
  2. Since I don't want to output HTML from the controller itself, I'm transferring the associative array containing URI, priority, change frequency etc., to the view with a $this->set() call -- which again is huge, containing 50,000 indices.

Is it possible at all, to do this while following MVC and CakePHP guidelines?


Have you tried unBindModel (if you have relations)...

Whenever I have to do huge queries in cakephp i just use the "regular" mysql-funtions like mysql_query, mysql_fetch_assoc etc. Much faster, and no lack of memory...

This is a single table SELECT. Looks like old-school MySQL is the efficient and _only_ way to go, but I'm way out of CakePHP by then. And I still have the problem of how to render the calculated associated array.
Alex J
+3  A: 

I had a similar problem this week, and stumbled across the Containable Behavior. This allows you to cut down any relationship related queries (if you have any).

The best solution would be to programmatically use LIMIT and OFFSET, and loop through the recordset small chunks at a time. This saves you from stuffing 50K records into memory at once.

Thanks for the reply. I don't have any related table and the query is more or less a simple `SELECT` operation. Firing multiple queries is something I wanted to avoid. Plus, even if I do that, how do I still transfer the data to the view?
Alex J
Would you be displaying all 50k records in the same view? If not, most pagination patterns work well with the limit/offset query.If you do have to display them all at once, maybe dig into your php.ini (if you have admin access to your server) and change memory_limit to a higher value. That might fix your memory problems with a find('all).If your table has a lot of fields, use the 'fields' value to narrow down what is necessary (as mentioned in another comment).
+1  A: 

find('all') is way too greedy, you'll have to be more specific if you don't want to run out of memory.

As stated above, use the Containable behavior. If you only need results from your table, (without associated tables) and for only a couple of fields, a more explicit query like this should be better :

$results = $this->YourModel->find('all', array(
    'contain' => false,
    'fields' => array('', 'YourModel.url')

You should also consider adding an html cache mechanism (cakePHP has a builtin or using the one suggested by Matt Curry).

Of course it will be a cached version, and won't be perfectly up to date to your list. If you want more control you can always save the result in cake cache (using Cache::write), using the afterSave/afterDelete callbacks of your model to update the cached value and recreate the cached xml file from here.

Specifically, look at the "fields" argument to find().
Travis Leleu
+2  A: 

Are you sure you have to run out of memory on 50.000 records? Even if a row is 1K in size (pretty huge), you would have to deal with ~ 50 MB of data? My P1 had enough RAM to handle that. Set memory_limit in php.ini higher than the default. (Consider also tweaking max_execution_time.)

On the other hand, if you consider the data set as too huge and processing it as too resource intensive, you should not serve that page dynamically, it is the perfect DDoS bait. (At least I would cache it heavily.) You could schedule a cron job to re-generate the page every X hours by a server side script free from the MVC penalty of serving all data at once to the view, it could work on the rows sequentially.

The sitemap works absolutely fine on my local dev box. Once I deploy to my shared hosting where I have very limited memory, it explodes. This is exactly what I was thinking of, just wanted to double check and make sure I'm going that way because there are no CakePHP/MVC options left. Thanks!
Alex J

I suggest you have a look of "TreeBehavior"

Henry Jin