I am writing an algorithm to generate combinations of items from a database. They need to be unique combinations, i.e. order does not matter (145, 156 is the same as 156, 145). The problem I am running into is how to keep track of previous combinations so that I do not end up with both 145, 156 and 156, 145.

Currently I am adding them to an array with an index of id1_id2... (sorted so the IDs always run lowest to highest) and setting the value to 1 when a combo is generated, so that I can check whether $combos[$index] exists. If it does not exist, I create it. (There are other criteria to weed out every permutation, but they are irrelevant here.) Once these combinations are generated, they are stored in a MySQL table.

The problem is that with the test items I'm using (about 85), I cannot generate combinations with more than 3 items (id1_id2_id3) without running out of memory: the number of combinations is massive, and the $combos array takes up more than the 64M I am allotted in PHP memory.

Is there a way I can do this a) without keeping track of previous combos, or b) skipping the $combos array route, only inserting a unique row into MySQL, and letting MySQL handle the duplicate checking?

Here is some pseudo code for reference:

$combos = array();
$items = array(/* 85 items */);

foreach ($items as $item1) {
    generate(array($item1));
    foreach ($items as $item2) {
        generate(array($item1, $item2));
    }
}

function generate($items_array) {
    global $combos;

    // Collect the ids, then sort them so the key is order-independent.
    $temp_array = array();
    foreach ($items_array as $item) {
        $temp_array[] = $item['id'];
    }

    sort($temp_array);
    $index = implode("_", $temp_array);

    // Only act on combos we have not seen before.
    if (!isset($combos[$index])) {
        $combos[$index] = 1;
        /* some code to generate query to store to db */
    }
}

The query ends up looking like this (the database is truncated at the beginning of the script):

INSERT INTO `combos` (combo_id, more_info) VALUES ('id1_id2', 'Item Name');
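
One possible way to realize option (b) above, assuming combo_id can serve as the table's primary key: with a unique key in place, MySQL's INSERT IGNORE silently skips rows whose combo_id already exists, so PHP never has to track duplicates.

ALTER TABLE `combos` ADD PRIMARY KEY (combo_id);

INSERT IGNORE INTO `combos` (combo_id, more_info) VALUES ('id1_id2', 'Item Name');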

In the process of writing this question, I thought of a possible solution: making sure id3 > id2 > id1. Would this be a viable way to remove the need for $combos?
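
For illustration, a minimal sketch of that idea, assuming $items is indexed 0..84 and sorted by id: starting each inner loop one position past the outer loop guarantees id1 < id2 < id3, so every combination is generated exactly once and no $combos lookup array is needed.

$n = count($items);
for ($i = 0; $i < $n; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
        for ($k = $j + 1; $k < $n; $k++) {
            // The ids are distinct and ascending by construction, so the key is unique.
            $index = $items[$i]['id'] . '_' . $items[$j]['id'] . '_' . $items[$k]['id'];
            /* store $index to the db */
        }
    }
}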

A: 

To increase the memory limit, change one of the following:

memory_limit = 512M in your php.ini
or
ini_set('memory_limit', '512M'); in your PHP script
or
php_value memory_limit 512M in your .htaccess

Alex L
I know I can increase the memory size. It will be running on a hosted server where I'm fairly positive I cannot change this value. Unless 64M is ridiculously small.
contagious
64M is definitely not small; 32M is already considered "quite big" by some hosting companies, and in 3 years I've never worked on a "professional" application that required more than 32M (except for some batches which run for hours during the night, typically).
Pascal MARTIN
A: 

Yes. You can store and use the lexicographical index of the combination to reconstruct/iterate over them, or Gray codes if you need to iterate over all of them.

Take a look at: "Algorithm 515: Generation of a Vector from the Lexicographical Index"; Buckles, B. P., and Lybanon, M. ACM Transactions on Mathematical Software, Vol. 3, No. 2, June 1977.

I've translated it into C here, and describe it in more detail here.
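
For a rough illustration of the unranking idea (a hypothetical sketch, not the paper's algorithm verbatim; choose() and combinationFromRank() are names invented here), this PHP reconstructs the $rank-th k-combination of {0, ..., n-1} in lexicographic order:

// Binomial coefficient "n choose k".
function choose($n, $k) {
    if ($k < 0 || $k > $n) return 0;
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // Multiply before dividing; the product is always divisible by $i.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

// Return the $rank-th (0-based) combination of $k elements from {0, ..., $n - 1}.
function combinationFromRank($n, $k, $rank) {
    $combo = array();
    $x = 0;
    for ($i = 0; $i < $k; $i++) {
        // Skip past all combinations that start with a smaller element.
        while (choose($n - $x - 1, $k - $i - 1) <= $rank) {
            $rank -= choose($n - $x - 1, $k - $i - 1);
            $x++;
        }
        $combo[] = $x++;
    }
    return $combo;
}

// Example: combinationFromRank(4, 2, 3) returns array(1, 2), since the
// 2-combinations of {0, 1, 2, 3} in order are 01, 02, 03, 12, 13, 23.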

nlucaroni
A: 

If your host supports it, you could use another kind of data store than memory only.

Storing this in MySQL would be OK, but if you don't want to use your main database for that, you could think about using SQLite or SQLite3, the second being the more recent version.

The main advantage of SQLite is that it stores its whole DB in a single file; for your needs, which are quite simple, it should work pretty much like MySQL :-)

One advantage of using a file-based DB instead of RAM is that even if your script is terminated (think about max_execution_time, for instance), the data already calculated won't be lost, as it is stored in a file. That means you will be able to resume your calculations later with another execution of a PHP script, provided your script is able to detect where the previous execution stopped.
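
For illustration, a minimal sketch using PHP's SQLite3 extension (assuming the host provides it); a primary key on combo_id plus INSERT OR IGNORE moves the duplicate checking into the on-disk database, so no $combos array is held in memory. The $index and $itemName variables stand in for values from the question's pseudo code:

$db = new SQLite3('combos.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS combos (
    combo_id  TEXT PRIMARY KEY,
    more_info TEXT
)');

// INSERT OR IGNORE silently skips rows whose combo_id already exists.
$stmt = $db->prepare('INSERT OR IGNORE INTO combos (combo_id, more_info)
                      VALUES (:id, :info)');
$stmt->bindValue(':id', $index, SQLITE3_TEXT);      // e.g. "12_45_78", ids sorted
$stmt->bindValue(':info', $itemName, SQLITE3_TEXT);
$stmt->execute();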

Pascal MARTIN
+2  A: 

The reason I asked about the data structure before is because you could do something like this:

$sql = "SELECT id FROM test_a";
$result = mysql_query($sql);
while ($row = mysql_fetch_array($result)) {
  $item1 = $row['id'];

  $sql2 = "SELECT id FROM test_a";
  $result2 = mysql_query($sql2);
  while ($row2 = mysql_fetch_array($result2)) {
    $item2 = $row2['id'];

    $combo1 = $item1 . "_" . $item2;
    $combo2 = $item2 . "_" . $item1;

    $sql3 = "SELECT * FROM combos WHERE combo_id = '$combo1' OR combo_id = '$combo2'";
    $result3 = mysql_query($sql3);
    if (mysql_num_rows($result3) == 0) {
      $sql4 = "INSERT INTO combos (combo_id, more_info) VALUES ('$combo1','Item Name')";
      $result4 = mysql_query($sql4);
    }
  }
}

When table test_a has the values 1, 2, 3, and 4, this script inserts: 1_1 1_2 1_3 1_4 2_2 2_3 2_4 3_3 3_4 4_4

This shouldn't have any memory problems, although if you have a huge database you may run into an issue with PHP's time limit.
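
If the host permits it, one hypothetical mitigation, mirroring the memory_limit answer above, is to relax the script's time limit before starting the loop:

set_time_limit(0);                        // 0 removes the limit entirely
// ini_set('max_execution_time', '300'); // or raise it to a fixed number of seconds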

Justin Giboney
Yes, that is a viable option, and exactly what I was thinking about with a MySQL solution. I AM running into a time limit option now.
contagious
By "option" do you mean it is having a problem with the time limit?
Justin Giboney
As below: you can avoid the need for duplicate checks if you add a WHERE clause to your second SELECT.
Nick Johnson
+1  A: 

Here is the same concept as my other answer, but in an all-SQL format.

INSERT INTO combos (combo_id, more_info) 
  SELECT CONCAT_WS("_",t1.id,t2.id), "item_name" 
  FROM test_a t1, test_a t2 
  WHERE NOT EXISTS (SELECT * FROM combos WHERE combo_id = CONCAT_WS("_",t1.id,t2.id))
    AND NOT EXISTS (SELECT * FROM combos WHERE combo_id = CONCAT_WS("_",t2.id,t1.id))

Assuming you can get item_name from the DB somewhere, this will probably be your fastest and least memory-intensive solution. I am running a test on around 1000 ids at the moment; I'll update this when it finishes.

Justin Giboney
You can eliminate the sub-selects by imposing the clause "WHERE t2.id >= t1.id"; this will only produce unique pairs.
Nick Johnson
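
For illustration, a sketch of that suggestion against the same test_a and combos tables; restricting the join to t2.id >= t1.id produces each unordered pair exactly once, so the NOT EXISTS sub-selects are no longer needed:

INSERT INTO combos (combo_id, more_info)
  SELECT CONCAT_WS('_', t1.id, t2.id), 'item_name'
  FROM test_a t1
  JOIN test_a t2 ON t2.id >= t1.id
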
A: 

If you don't need to enforce referential integrity automatically (and you aren't, if you use string concatenation), use one table for the 85 items, give each of them an index (0-84), and use a second table to represent a given set of items, using a numeric datatype where each bit position in the number represents one item (e.g. 000001101 represents items 0, 2, and 3).

For more than 64 items you may have to split the value across more than one field, or use a BLOB or a string (gack!).

If you use this as a primary key field, you can enforce non-duplication.
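
As a hypothetical sketch of that encoding in PHP, using the GMP extension (assumed available) because 85 bits exceed a native 64-bit integer; the variable names are invented for this example:

// Build a bitmask where bit i is set iff item i is in the combo.
$mask = gmp_init(0);
foreach (array(0, 2, 3) as $itemIndex) {   // item indexes in 0..84
    $mask = gmp_or($mask, gmp_pow(2, $itemIndex));
}

// The decimal string uniquely identifies the unordered set:
// items 0, 2, and 3 give 1 + 4 + 8 = "13".
$key = gmp_strval($mask);
// Use $key as the primary key column and the database rejects duplicates.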

Jason S