I am currently working on a project that relies extensively on the EAV model. Both entities and their attributes are individually represented by a model, sometimes extending other models (or at least, base models).
This has worked quite well so far since most areas of the application only rely on filtered sets of entities, and not the entire dataset.
Now, however, I need to parse the entire dataset (i.e. all entities and all of their attributes) in order to provide a sorting/filtering algorithm based on the attributes.
The application currently consists of approximately 2,200 entities, each with approximately 100 attributes. Every entity is represented by a single model (for example Client_Model_Entity) and has a protected property called $_attributes, which is an array of Attribute objects.
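For context, the structure looks roughly like this (a simplified sketch; apart from Client_Model_Entity, $_attributes and Attribute, the names are only illustrative):

// Simplified, illustrative sketch of the structure described above; not the real classes.
class Attribute
{
    protected $_code;
    protected $_value;

    public function __construct($code, $value)
    {
        $this->_code  = $code;
        $this->_value = $value;
    }

    public function getValue()
    {
        return $this->_value;
    }
}

class Client_Model_Entity // extends a base model in the real application
{
    /** @var Attribute[] one object per attribute, ~100 per entity */
    protected $_attributes = array();

    public function addAttribute(Attribute $attribute)
    {
        $this->_attributes[] = $attribute;
        return $this;
    }
}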
Each entity object is about 500KB in memory, which puts an enormous load on the server. With roughly 2,000 entities, a single task would need about 1GB of RAM (2,000 × 500KB) and a lot of CPU time, which is unacceptable.
Are there any patterns or common approaches to iterating over such large datasets? Paging is not really an option, since the sorting algorithm has to take every entity and attribute into account.
EDIT: a code example to hopefully make things clearer:
// code from the resource model
for ($i = 0, $n = count($rowset); $i < $n; ++$i)
{
    $clientEntity = new Client_Model_Entity($rowset[$i]);

    // getAttributes() fetches all possible attributes from the db and creates models for them;
    // this is actually the big resource hog, as one client can have ~100 attributes
    $clientEntity->getAttributes();
    $this->_rows[$i] = $clientEntity;

    // memory usage has now increased by roughly 500KB
    echo $i . ' : ' . memory_get_usage() . '<br />';
}
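To show where the memory goes, getAttributes() essentially does the following (a simplified sketch based on the description above; the table/column names and the getPdo()/getId() helpers are made up, not the real code):

// Illustrative sketch only; the real method lives in the entity's resource model.
public function getAttributes()
{
    /** @var PDO $pdo hypothetical helper returning the db connection */
    $pdo = $this->getPdo();

    // one query per entity, returning ~100 rows of attribute data
    $stmt = $pdo->prepare('SELECT code, value FROM attribute_values WHERE entity_id = ?');
    $stmt->execute(array($this->getId()));

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        // every row becomes a full Attribute object, which is where
        // most of the ~500KB per entity comes from
        $this->_attributes[] = new Attribute($row['code'], $row['value']);
    }

    return $this->_attributes;
}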