views: 249

answers: 4
Is it possible to loop through a query so that if (for example) 500,000 rows are found, it returns results for the first 10,000 and then reruns the query for the next batch?

So, what I want to do is run a query and build an array, like this:

$result = pg_query("SELECT * FROM myTable");

$myArray = array(); // collect every row into one big array
$i = 0;
while($row = pg_fetch_array($result) ) {
  $myArray[$i]['id'] = $row['id'];
  $myArray[$i]['name'] = $row['name'];
  $i++;
}

But, I know that there will be several hundred thousand rows, so I wanted to do it in batches of 10,000... rows 1 - 9,999, then 10,000 - 19,999, etc... The reason is that I keep getting this error:

Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 3 bytes)

Which, incidentally, I don't understand: how could allocating 3 bytes exhaust 512M? So, if that's something I can just change, that'd be great, although it might still be better to do this in batches?

A: 

You can use `LIMIT x` and `OFFSET y`.
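For example, a minimal sketch of batching that way (the $conn handle, the batch size, and the ORDER BY id are assumptions; the ORDER BY is needed so successive batches line up deterministically):

$batchSize = 10000;
$offset = 0;
do {
    $result = pg_query_params($conn,
        'SELECT id, name FROM myTable ORDER BY id LIMIT $1 OFFSET $2',
        array($batchSize, $offset));
    $fetched = 0;
    while ($row = pg_fetch_assoc($result)) {
        // process each row here instead of keeping all of them around
        $fetched++;
    }
    $offset += $batchSize;
} while ($fetched == $batchSize); // fewer rows than requested means we are done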

sriehl
I looked into this and thought I could get a count of the total and then make a for loop, but so far, it doesn't seem to be working...
CaffeineIV
@Caffeine, you totally do *not* need to know the total number of rows, nor do you need a `for` loop. Execute the statement inside a `while` loop, and break the loop as soon as the query returns less than the `x` rows you requested, meaning you have reached the end.
vladr
@Caffeine, did you understand the negative implications of using `LIMIT`/`OFFSET` as opposed to cursors?
vladr
A: 

The query results are cached in full until you actually retrieve them, so adding them all to an array in a loop like that will exhaust memory no matter what. Either process the results one row at a time, or check the length of the array, process the rows pulled so far, and then purge the array.
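For instance, a sketch of the one-row-at-a-time variant (processRow() here is a hypothetical stand-in for whatever work you need to do per row):

$result = pg_query("SELECT id, name FROM myTable");
while ($row = pg_fetch_assoc($result)) {
    // handle the row immediately instead of storing it
    processRow($row['id'], $row['name']);
}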

Ignacio Vazquez-Abrams
I'm a little confused by this... Isn't that what I'm doing? Running the query and then processing the results? Sorry... I'm not following...
CaffeineIV
It's a bit difficult to understand how adding the rows to an array can be seen as processing them.
Ignacio Vazquez-Abrams
A: 

What the error means is that PHP tried to allocate 3 more bytes, but less than 3 bytes of that 512MB limit remained available.

Even if you do it in batches, depending on the size of the resulting array you could still exhaust the available memory.

Perhaps you don't really need to get all the records?

Alex - Aotea Studios
No, I really do...
CaffeineIV
+1  A: 

Those last 3 bytes were the straw that broke the camel's back. Probably an allocation attempt in a long string of allocations leading to the failure.

Unfortunately libpq will try to fully cache result sets in memory before relinquishing control to the application. This is in addition to whatever memory you are using up in $myArray.

It has been suggested to use LIMIT ... OFFSET ... to reduce the memory envelope; this will work, but it is inefficient, as it could needlessly duplicate server-side sorting effort every time the query is reissued with a different offset (e.g. in order to answer LIMIT 10 OFFSET 10000, Postgres will still have to sort the entire result set, only to return rows 10,001..10,010.)

Instead, use DECLARE ... CURSOR to create a server-side cursor, followed by FETCH FORWARD x to fetch the next x rows. Repeat as many times as needed, or until fewer than x rows are returned. Do not forget to CLOSE the cursor when you are done, even when/if an exception is raised.
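A sketch of that pattern in PHP (assuming an open connection $conn; the cursor name and batch size are arbitrary):

pg_query($conn, 'BEGIN'); // non-holdable cursors only exist inside a transaction
pg_query($conn, 'DECLARE my_cur CURSOR FOR SELECT id, name FROM myTable');
do {
    $result = pg_query($conn, 'FETCH FORWARD 10000 FROM my_cur');
    $n = pg_num_rows($result);
    while ($row = pg_fetch_assoc($result)) {
        // process each row, then let it go out of scope
    }
} while ($n == 10000); // fewer rows than requested: end of the result set
pg_query($conn, 'CLOSE my_cur');
pg_query($conn, 'COMMIT');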

Also, do not SELECT *; if you only need id and name, create your cursor FOR SELECT id, name (otherwise libpq will needlessly retrieve and cache columns you never use, increasing memory footprint and overall query time.)

Using cursors as illustrated above, libpq will hold at most x rows in memory at any one time. However, make sure you also clean up your $myArray in between FETCHes if possible or else you could still run out of memory on account of $myArray.

Cheers, V.

vladr
Does LIMIT...OFFSET... really involve sorting?
armandino
If you want `LIMIT ... OFFSET ...` to be deterministic then they will be accompanied by an `ORDER BY` which, unless you are fortunate enough to need a simple index scan, *will* involve sorting. And even if you are lucky and you get away with index scans, they are repeated scans you can do without by using cursors.
vladr