Hello,

this question may seem too basic to some, but please bear with me; it's been a while since I last did any serious database programming.

I have an algorithm that I need to implement in PHP/MySQL for a website. It performs some computations iteratively on an array of objects (it ranks the objects based on their properties). In each iteration the algorithm runs through the whole collection a couple of times, accessing various data from different parts of the collection. The algorithm needs several hundred iterations to complete. The array comes from a database.

The straightforward solution that I see is to take the results of a database query, create an object for each row, put the objects into an array, and pass the array to my algorithm.
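
Something like this is what I have in mind (table, column, class, and function names are made up, of course):

    <?php
    // Straightforward approach: one query, one object per row,
    // then the whole array is handed to the ranking algorithm.
    // Assumes an open mysql connection.
    $result = mysql_query('SELECT id, score, weight FROM items');

    $items = array();
    while ($row = mysql_fetch_assoc($result)) {
        $items[] = new Item($row); // Item is a thin wrapper around one row
    }

    rank_items($items); // several hundred iterations over the full array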

However, I'm concerned about the efficiency of such a solution when I have to work with an array of several thousand items, because what I would essentially be doing is mirroring the results of the query in memory.

On the other hand, running a couple of database queries on each iteration of the algorithm also seems wrong.

So, my question is: what is the correct architectural solution for a problem like this? Is it OK to mirror the query results in memory? If not, what is the best way to work with query results in such an algorithm?

Thanks!

UPDATE: The closest problem I can think of is the ranking of search results by a search engine - I need to do something similar to that. Each result is represented by a row in the database, and the whole result set is taken into account when the rank of each result is computed.

+1  A: 

It really depends on the situation at hand. It's probably rarely necessary to do such a thing, but it's very difficult to tell based on the information you've given.

Try to isolate the data as much as possible. For instance, if you need to perform some action on the data that has no data dependencies between iterations of the loop, you can write a query that updates the affected rows in place rather than loading them all into memory only to write them back.
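
For example (purely illustrative; the table and columns are made up):

    -- Decay every item's score in place, entirely inside MySQL,
    -- instead of SELECTing all rows, changing them in PHP and
    -- writing them back one by one.
    UPDATE items SET score = score * 0.9 WHERE score > 0;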

In short, it is probably avoidable but it's hard to tell until you give us more information :)

San Jacinto
Sorry, it's hard for me to give more information without writing a lengthy description which no one will feel like reading :) The closest problem I can think of is the ranking of search results by a search engine - I need to do something similar to that. Each result is represented by a row in the database, and the whole result set is taken into account when the rank is computed.
Corvin
Even for the example you gave, you wouldn't be loading all of the results into memory at once. You should be operating on smaller sets of the data and performing the process iteratively if you're worried about overrunning the system.
San Jacinto
A: 

Is the computation for one object dependent on the others, or are they all independent? If they are independent, you could load just a small number of rows from the database, converting them to objects as you describe. Then run your hundreds of iterations on those, output the results for that block, and proceed to the next block of items.

This keeps memory usage down, since you are only dealing with a small number of items at a time rather than the whole data set, and it avoids running multiple queries against the database.

The SQL keywords LIMIT and OFFSET can help you step through the data block by block.
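
Roughly like this (the block size, table, and helper functions are just placeholders):

    <?php
    // Process the table in blocks of 500 rows using LIMIT/OFFSET.
    // ORDER BY keeps the paging stable between queries.
    $blockSize = 500;
    $offset    = 0;
    do {
        $result = mysql_query(sprintf(
            'SELECT id, score FROM items ORDER BY id LIMIT %d OFFSET %d',
            $blockSize, $offset
        ));

        $block = array();
        while ($row = mysql_fetch_assoc($result)) {
            $block[] = new Item($row);
        }

        if ($block) {
            rank_block($block);  // run the iterations on this block only
            save_ranks($block);  // write the results back for this block
        }

        $offset += $blockSize;
    } while (count($block) == $blockSize);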

mdma
They are all dependent, so I can't see a way to work with a small subset
Corvin
+1  A: 

Memory seems like the best way to go - iff you can scale up to meet it. Otherwise you'll have to revise your algorithm, perhaps towards a divide-and-conquer approach - something like a merge sort.

Jakub Hampl
+1  A: 

If you are doing a query against the database, then when the results come back they are already "mirrored to memory". When you fetch your results using mysql_fetch_assoc (or the equivalent), you have your copy. Just use that as the cache.
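
For example (illustrative only):

    <?php
    // The rows you fetch are already a copy in PHP's memory -
    // keep them in an array keyed by id and treat that as the cache.
    $result = mysql_query('SELECT id, score FROM items');

    $cache = array();
    while ($row = mysql_fetch_assoc($result)) {
        $cache[$row['id']] = $row;
    }
    // Every iteration of the algorithm now reads from $cache,
    // without touching the database again.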

Zak
Is it really so? Are you saying that "SELECT * FROM xxx" on a table with 10000 rows, with a blob of ~50K in each record, will chomp 500 MB of memory for a single query? I thought PHP was better than that...
Corvin
+2  A: 

Don't forget, premature optimization is the root of all evil. Give it a shot and copy everything to memory. If that uses too much memory, then optimize for memory.

Zak
I think that quote is overused. It's plenty fair to plan for expansion by making sound design decisions that aren't so easy to change once they're implemented. Worrying about using a 4-byte number vs. a scheme of bit-shifting and addition to achieve the same data representation in a 1-byte number is more what the quote is about (please note, this comment did not come with an accompanying downvote).
San Jacinto
Very well, then he should be an engineer and do a cost estimate: the number of rows of expected data, times the per-row data size, gives the approximate memory requirement of the procedure. If that number is greater than the available memory footprint minus some reasonable overhead, then he should worry about designing for a memory/query-access tradeoff. I've found that 99% of the time it's faster to just do what I said in the first place and let the out-of-memory error tell you when you need to optimize for memory.
Zak
Yeah, fair enough. I don't think a few thousand records is too much to sort through if done properly... just that I'm sick of the cliche :)
San Jacinto
A: 

Writing ranking queries in pure MySQL is possible as well; you just need to play with user-defined variables a bit. If you provide some input data and the result you are trying to achieve, the replies will be more detailed.
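
As a minimal sketch of the user-defined-variable trick (assuming a hypothetical items table ranked by score):

    SET @rank := 0;
    SELECT id, score, (@rank := @rank + 1) AS ranking
    FROM items
    ORDER BY score DESC;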

newtover
That's an option too, thanks. I wonder if stored procedures would do the trick.
Corvin
A: 

Can you use a cron job to do your ranking, say once per day, hour, or whatever you need, and then save each item's ranking to a field in its row?

That way, when you call your rows up, you can just order them by the ranking field.
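
Something like this (paths and names are just examples):

    # crontab: recompute all ranks once a night at 3:00
    0 3 * * * php /path/to/recompute_ranks.php

and then at request time:

    SELECT id, title FROM items ORDER BY rank_score DESC;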

David Morrow
Thanks, but that's not an option - too many combinations are possible. It's much like a search engine - you wouldn't be able to cache all possible search queries.
Corvin