views: 2042

answers: 4

In a system I am currently working on, there is one process that loads a large amount of data into an array for sorting/aggregating/whatever. I know this process needs optimising for memory usage, but in the short term it just needs to work.

Given the amount of data loaded into the array, we keep hitting the memory limit. It has been increased several times, and I am wondering: is there a point where increasing it becomes a generally bad idea, or is it only a matter of how much RAM the machine has?

The machine has 2GB of RAM and the memory_limit is currently set at 1.5GB. We can easily add more RAM to the machine (and will anyway).

Have others encountered this kind of issue, and what were the solutions?

+3  A: 

The memory_limit configuration for PHP running as an Apache module to serve webpages has to take into consideration how many Apache processes you can have running at the same time on the machine -- see the MaxClients configuration option for Apache.

If MaxClients is 100 and you have 2,000 MB of RAM, a very quick calculation will show that you should not use more than 20 MB *(because 20 MB * 100 clients = 2 GB of RAM, i.e. the total amount of memory your server has)* for the memory_limit value.

And this is without considering that there are probably other things running on the same server, like MySQL, the system itself, ... And that Apache is probably already using some memory for itself.

Of course, this is also a "worst case scenario" that assumes each PHP page uses the maximum amount of memory it can.
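As a back-of-the-envelope sketch of that calculation (the figures below are just the ones from the example above, not measurements from a real server):

<?php
// Worst case: every Apache child serves a PHP page that uses the full memory_limit.
$totalRamMb = 2000;   // RAM you are willing to give to Apache+PHP (example figure)
$maxClients = 100;    // Apache's MaxClients (example figure)

$memoryLimitMb = $totalRamMb / $maxClients;
echo "memory_limit should not exceed about {$memoryLimitMb} MB\n";   // 20 MB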


In your case, if you need such a big amount of memory for only one job, I would not increase the memory_limit for PHP running as an Apache module.

Instead, I would launch that job from the command line (or via a cron job), and specify a higher memory_limit specifically in this one and only case.

This can be done with the -d option of php, like this:

$ php -d memory_limit=1G temp.php
string(2) "1G"

Considering, in this case, that temp.php only contains:

<?php
var_dump(ini_get('memory_limit'));

In my opinion, this is way safer than increasing the memory_limit for the PHP module for Apache -- and it's what I usually do when I have a large dataset, or some really heavy stuff I cannot optimize or paginate.
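If the job is meant to run unattended via cron, as mentioned above, the crontab entry could look something like this (the schedule, the limit and the script path are only placeholders for your own values):

0 3 * * * php -d memory_limit=1G /path/to/your/batch-script.php

Here the job would run at 3 AM, when the server is likely to be mostly idle.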


If you need to define several values for PHP CLI executions, you can also tell it to use another configuration file instead of the default php.ini, with the -c option:

php -c /etc/phpcli.ini temp.php

That way, you have:

  • /etc/php.ini for Apache, with low memory_limit, low max_execution_time, ...
  • and /etc/phpcli.ini for batches run from command-line, with virtually no limit

This ensures your batches will be able to run -- and you'll still have security for your website (memory_limit and max_execution_time being security measures).
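As a sketch, /etc/phpcli.ini could contain something along these lines (values to be adjusted to your own needs, of course):

; /etc/phpcli.ini -- used only for command-line batches
memory_limit = -1         ; -1 means no memory limit
max_execution_time = 0    ; 0 means no time limit (already the default for the CLI)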


Still, if you have the time to optimize your script, you should; for instance, in that kind of situation where you have to deal with lots of data, pagination is a must-have ;-)
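For instance, if the data comes from a database, a paginated run could look roughly like this (the DSN, table and column names are made up for the illustration):

<?php
// Hypothetical sketch: aggregate a big table one page of 10,000 rows at a time,
// instead of loading every row into a single array.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');

$pageSize = 10000;
$offset   = 0;
$total    = 0;

do {
    $sql  = sprintf('SELECT amount FROM orders LIMIT %d OFFSET %d', $pageSize, $offset);
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_COLUMN);

    foreach ($rows as $amount) {
        $total += $amount;   // aggregate, then let the page be thrown away
    }

    $offset += $pageSize;
} while (count($rows) === $pageSize);

echo "Total: {$total}\n";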

Pascal MARTIN
Yes, the limit is only being upped for the process that requires it. Unfortunately though, pagination won't help in this case; the end result is only a handful (20 or so) statistical numbers, it's just the processing that requires the space.
Brenton Alker
Oh, OK... Then, I suppose it kind of depends on how much "free" memory you have when launching that process: if you launch it in the middle of the night, when there's almost no-one using your server, maybe 1.5 GB might be OK (I wouldn't use more, to leave some memory for the rest of the system) -- but only you can say how loaded your server is at that time.
Pascal MARTIN
A: 

Have you tried splitting the dataset into smaller parts and processing only one part at a time?

If you fetch the data from a disk file, you can use the fread() function to load smaller chunks, or some sort of unbuffered DB query in the case of a database.
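A rough sketch of the fread() approach (the file name and chunk size are just for illustration):

<?php
// Hypothetical sketch: read a large file 8 KB at a time with fread(),
// instead of pulling the whole thing into memory at once.
$handle = fopen('/path/to/huge-data-file.dat', 'rb');
if ($handle === false) {
    die("Could not open the data file\n");
}

$bytesProcessed = 0;
while (!feof($handle)) {
    $chunk = fread($handle, 8192);   // only 8 KB in memory at any given moment
    // ... do the sorting/aggregating/whatever on $chunk here ...
    $bytesProcessed += strlen($chunk);
}
fclose($handle);

echo "Processed {$bytesProcessed} bytes\n";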

I haven't checked up on PHP since v3.something, but you could also use a form of cloud computing; a 1GB dataset seems big enough to be processed on multiple machines.

arul
The plan is to only read the data as required (as you're suggesting with fread()), but that's the refactoring we haven't got to yet.
Brenton Alker
+1  A: 

Given that you know there are memory issues with your script that need fixing and you are only looking for short-term solutions, I won't address the ways to go about profiling and solving them. It sounds like you're going to get to that.

So, I would say the main things you have to keep in mind are:

  • Total memory load on the system
  • OS capabilities

PHP is only one small component of the system. If you allow it to eat up a vast quantity of your RAM, then the other processes will suffer, which could in turn affect the script itself. Notably, if you are pulling a lot of data out of a database, then your DBMS might require a lot of memory in order to create result sets for your queries. As a quick fix, you might want to identify any queries you are running and free the results as soon as possible, to give yourself more memory for a long job run.
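For example, with mysqli, freeing each result set as soon as it has been consumed could look something like this (the connection details and query are purely illustrative):

<?php
// Hypothetical sketch: keep only the aggregate, and give the result set's
// memory back as soon as the rows have been consumed.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');

$result = $mysqli->query('SELECT amount FROM orders');

$sum = 0;
while ($row = $result->fetch_assoc()) {
    $sum += $row['amount'];
}

$result->free();   // release the result set right away, before the long job goes on

// ... the rest of the long-running job only needs $sum from here on ...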

In terms of OS capabilities, you should keep in mind that 32-bit systems, which you are likely running on, can only address up to 4GB of RAM without special handling. Often the limit can be much less depending on how it's used. Some Windows chipsets and configurations can actually have less than 3GB available to the system, even with 4GB or more physically installed. You should check to see how much your system can address.
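One quick way to check whether the PHP binary itself is a 64-bit build (and can therefore address more than 4GB) is to look at PHP_INT_SIZE:

<?php
// PHP_INT_SIZE is 8 on a 64-bit build of PHP, 4 on a 32-bit one;
// php_uname('m') reports the machine type the OS sees (e.g. "x86_64" or "i686").
var_dump(PHP_INT_SIZE);
var_dump(php_uname('m'));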

You say that you've increased the memory limit several times, so obviously this job is growing larger and larger in scope. If you're up to 1.5GB, then even installing 2GB more RAM sounds like it will just be a short reprieve.

Have others encountered this kind of issue, and what were the solutions?

I think you probably already know that the only real solution is to break down and spend the time to optimize the script soon, or you'll end up with a job that will be too big to run.

zombat
The machine is pretty much dedicated to PHP; the database servers are separate. The previous limit increases weren't because the job is growing (though it will, just not that quickly); the earlier limits were simply insufficient to "fix" the issue (so we tried a larger one). We now have a limit that lets the process run, but it is definitely only a short-term solution. Good point about the system limitations; it is a 64-bit machine and we will be increasing the RAM anyway. Anyway, I think you've allayed my fears that throwing that much memory at a single process was absurd. RDBMSs do it, why can't I :)
Brenton Alker
A: 

Can you give us some information about the task that the script executes? Maybe we can help you optimize the algorithm to eliminate the excessive memory consumption.

FractalizeR