views:

389

answers:

2

I'm implementing a memory cache for a table that looks like this (simplified):

Item1 (integer), Item2 (integer), cnt (Integer)

The original table includes millions of pairs like this. and it updates rapidly.

To make it all more efficient I want to write new pairs to an identical memory table and update the real table on disk periodically by cron.

The cron should do the following: for each pair if there is similar pair in the non-memory table increase the count by the count from the memory table. If no such pair exist create it with count from the memory table.

How can I make the flush (from memory table to real table) most efficient?

Notes: The environment is Mysql 5.0.45 PHP 5.2.6 CentOS

+3  A: 

You could use a INSERT ... ON DUPLICATE KEY UPDATE query - but that depends on the primary keys or UNIQUE indexes on your master table.

INSERT 
    INTO <<master_table>> (Item1, Item2, cnt)
    SELECT Item1, Item2, cnt FROM <<memory_table>>
    ON DUPLICATE KEY UPDATE cnt = cnt + VALUES(cnt);
Stefan Gehrig
Trying to edit my answer but it seems that because I deleted the anwser and undeleted it afterwards, SO prevents any further editing ("not found" error; perhaps a caching issue). The ON DUPLICATE KEY UPDATE should read ON DUPLICATE KEY UPDATE cnt = cnt + VALUES(cnt) if cnt is not always 1 in the memory table.
Stefan Gehrig
Thank you! it works. However something seems strange to me.The solution that worked for me is "INSERT INTO linked_items SELECT * FROM linked_items_mem as li_mem ON DUPLICATE KEY UPDATE linked_items.cnt=linked_items.cnt+li_mem.cnt;" However in my tests the memory table has 26 rows but mysql says 48 rows were affected. How come?
Nir
@Nir: Each INSERT and UPDATE operation counts separately. In your case there were 26 INSERT's, of which 22 failed and lead to 22 UPDATE's, thus making 26 + 22 = 48 operations.
Quassnoi
Also note that MySQL doesn't count UPDATE's which update nothing. UPDATE table SET cnt = cnt will always return 'no rows affected'. If your cnt is a 0 in one or your rows, then the row value will remain the same and this row won't contribute into the affected rows count.
Quassnoi
+2  A: 

Create a PRIMARY KEY on the disk table:

ALTER TABLE maintable ADD CONSTRAINT pk_maintable_item1_item2 (item1, item2)

and issue the following query:

INSERT
INTO    maintable
SELECT  item1, item2, COUNT(*) AS cnt
FROM    memtable mem
GROUP BY
        item1, item2
ON DUPLICATE KEY
UPDATE  cnt = maintable.cnt + mem.cnt

Note, however, that if you have lots of DISTINCT item1, item2 pairs, then the solution proposed by @S. Gehrig will most probably work better (due to overhead on GROUP BY)

Quassnoi
I like that idea of grouping the rows prior to insertion. +1
Stefan Gehrig
@S. Gehrig: It won't work without this. MySQL can update each row in a table at most once within a single update query, see this: bugs.mysql.com/bug.php?id=44494
Quassnoi
@Quassnoi: Thanks for that info. Deleted my answer as it won't work accoding to the bug report above.
Stefan Gehrig
@S. Gehrig: ummm... sorry, just checked, it's only applicable to an UPDATE query. INSERT ON DUPLICATE KEY UPDATE works fine. Restore your answer, I'll upvote it, since it many cases it will be more efficient than grouping :)
Quassnoi