PHPWiki runs a slow query (about 5 seconds) each time you save a page edit. The query that most often shows up in "mysql-slow.log" is:

INSERT INTO wikiscore 
SELECT w1.topage, COUNT(*) 
FROM wikilinks AS w1, wikilinks AS w2 
WHERE w2.topage=w1.frompage 
GROUP BY w1.topage;

The current indexes are as follows:

table "wikilinks" has a primary index on "frompage" and "topage" 
table "wikiscore" has a primary index on "pagename" and "score"

How could I reformulate the SELECT query to return the same results faster? How could I change the indexes so this query would be faster? My thought is it might be OVER-indexed?

I've timed the result of the SELECT part of the query only and it takes 1-2 seconds alone. The INSERT must take up the rest of that time.

There is a lag when saving pages that I would like to eliminate. I do not have the option of upgrading to another wiki engine (or version of PHPwiki) due to the amount of modifications that have been done.

Any ideas?

edit---

The results of "EXPLAIN" on the SELECT part of the query were:

select_type  table  type   possible_keys  key      key_len  ref                rows   Extra
SIMPLE       w2     index                 PRIMARY  204                         31871  Using index; Using temporary; Using filesort
SIMPLE       w1     ref    PRIMARY        PRIMARY  102      phpwiki.w2.topage  14     Using index
+2  A: 

It should be helpful to use the EXPLAIN statement to figure out what part of your query takes the most time. Then you can decide what measures are to be taken to optimize your query.

BloodySmartie
+3  A: 

table "wikilinks" has a primary index on "frompage" and "topage"

WHERE w2.topage=w1.frompage

This condition cannot be searched over the composite index described above.

Either change the column order (create the index on topage, frompage) or create an additional index on topage.
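As a rough illustration of the leading-column point, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption: the two planners differ, and the plan text printed here is SQLite-specific. The sketch asks the planner how it would find rows by topage alone, before and after adding an index that leads with topage. The helper name plan_for_topage_lookup and the lookup value are hypothetical.

```python
import sqlite3


def plan_for_topage_lookup(conn):
    """Return the planner's description of a lookup by topage alone."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM wikilinks WHERE topage = ?",
        ("Moscow",),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Same shape as the table in the question: composite key, frompage first.
conn.execute(
    "CREATE TABLE wikilinks ("
    " frompage TEXT NOT NULL,"
    " topage TEXT NOT NULL,"
    " PRIMARY KEY (frompage, topage))"
)

plan_before = plan_for_topage_lookup(conn)

# Add an index whose leading column is topage.
conn.execute("CREATE INDEX ix_wikilinks_topage ON wikilinks (topage)")
plan_after = plan_for_topage_lookup(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Only the second plan should mention the new index. On MySQL the equivalent check is to re-run EXPLAIN on the real query after changing the indexes.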

P. S. The root of the problem is that the ranks of each and every page in the system are updated with every edit.

This ranking system seems a little bit weird to me: it counts links to links, not the links themselves.

If 1000 pages link to Moscow and only Moscow links to Beket pond, then the pond will get 1000 points and Moscow will get no points at all, though everyone knows of Moscow and none of the pond.
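The Moscow example can be reproduced with a small sketch. It runs the scoring SELECT from the question against an in-memory SQLite database through Python's sqlite3 module, which is an assumption made only to keep the example self-contained; the page names Page0 to Page999 are hypothetical.

```python
import sqlite3

# The scoring SELECT from the question, unchanged apart from layout.
ORIGINAL_SCORE_QUERY = """
    SELECT w1.topage, COUNT(*)
    FROM wikilinks AS w1, wikilinks AS w2
    WHERE w2.topage = w1.frompage
    GROUP BY w1.topage
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wikilinks ("
    " frompage TEXT NOT NULL,"
    " topage TEXT NOT NULL,"
    " PRIMARY KEY (frompage, topage))"
)

# 1000 hypothetical pages link to Moscow; only Moscow links to Beket pond.
links = [(f"Page{i}", "Moscow") for i in range(1000)]
links.append(("Moscow", "Beket pond"))
conn.executemany("INSERT INTO wikilinks (frompage, topage) VALUES (?, ?)", links)

# Map each scored page to its score.
scores = dict(conn.execute(ORIGINAL_SCORE_QUERY).fetchall())
print(scores)
```

With this data, scores holds a single entry: Beket pond with 1000. Moscow does not appear at all, because nothing links to the pages that link to it.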

I don't think this ranking is what you meant. Most probably the query should look like this:

INSERT INTO
       wikiscore 
SELECT
       linked.topage, COUNT(*) AS cnt
FROM   wikilinks current, wikilinks linked
WHERE  current.frompage=@current_page
       AND linked.topage = current.topage
GROUP BY
       linked.topage
ON DUPLICATE KEY UPDATE
       score = cnt;

This will sum all links to all pages referenced from the current page, which seems to be what you want.
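Here is a minimal sketch of that per-page refresh, again using Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite has no ON DUPLICATE KEY UPDATE, so the sketch swaps in INSERT OR REPLACE to get the same overwrite effect. It also assumes wikiscore is keyed on pagename alone. The function name refresh_scores_for and the sample pages are hypothetical.

```python
import sqlite3


def refresh_scores_for(conn, current_page):
    """Recount incoming links for every page that current_page links to."""
    conn.execute(
        # INSERT OR REPLACE stands in for MySQL's ON DUPLICATE KEY UPDATE.
        "INSERT OR REPLACE INTO wikiscore (pagename, score) "
        "SELECT linked.topage, COUNT(*) "
        "FROM wikilinks AS cur, wikilinks AS linked "
        "WHERE cur.frompage = ? AND linked.topage = cur.topage "
        "GROUP BY linked.topage",
        (current_page,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wikilinks ("
    " frompage TEXT NOT NULL,"
    " topage TEXT NOT NULL,"
    " PRIMARY KEY (frompage, topage))"
)
# Assumption: score is no longer part of the primary key.
conn.execute(
    "CREATE TABLE wikiscore (pagename TEXT NOT NULL PRIMARY KEY, score INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO wikilinks (frompage, topage) VALUES (?, ?)",
    [
        ("A", "Moscow"),
        ("B", "Moscow"),
        ("C", "Moscow"),
        ("C", "Beket pond"),
        ("Moscow", "Beket pond"),
    ],
)
# A stale score that the refresh is expected to overwrite.
conn.execute("INSERT INTO wikiscore (pagename, score) VALUES ('Moscow', 1)")

# Simulate saving page C: only the pages C links to are recounted.
refresh_scores_for(conn, "C")
scores = dict(conn.execute("SELECT pagename, score FROM wikiscore").fetchall())
print(scores)
```

Only the rows for pages that C links to are touched, so the cost depends on the number of outgoing links of the saved page rather than on the size of the whole wikilinks table.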

You will need to get rid of score in PRIMARY KEY on wikiscore in this case, but I see no point in putting it there anyway.

If you want to speed up ranking queries, create indices like this:

ALTER TABLE wikilinks ADD CONSTRAINT pk_wikilinks_fromto PRIMARY KEY (frompage, topage);

CREATE INDEX ix_wikilinks_topage ON wikilinks (topage);

ALTER TABLE wikiscore ADD CONSTRAINT pk_wikiscore_pagename PRIMARY KEY (pagename);

CREATE INDEX ix_wikiscore_score ON wikiscore (score);
Quassnoi
You need an index with topage as the leading column. It may permit duplicates and does not have to include frompage or any other column.
Jonathan Leffler
His EXPLAIN indicates that an index is being used. It seems to me that the query could drive off of "w2" and use the index to look up rows in "w1" by frompage.
Dave Costa
I changed the index on wikilinks to "topage" and "frompage" and saw a faster query time on the SELECT portion. However, the EXPLAIN on that one seems to indicate that more rows are being looked at. I'm not sure what that means.
jjclarkson
I made sure that a page was edited first, so I don't think any cache was in play.
jjclarkson
What about sorting from highest to lowest "score"? Won't that require an index on "score"?
jjclarkson
Sure it will, but not the way it is now. See updated post.
Quassnoi
+1  A: 
davethegr8
At present wikilinks has 31871 rows. But this query recomputes, for every page, a score of how many links point TO it. I plan to see if I can query just the links for the saved page and update one row in the wikiscore table.
jjclarkson
+1  A: 

Quassnoi's answer will get you some speed on the SELECT. If the INSERT is taking another four seconds, then adding indexes isn't going to help anything. Possibly you could cut a lot of data out of the process by adding HAVING COUNT(*) > 0 to your SELECT (an aggregate has to be filtered in HAVING, not WHERE), if it's desirable to leave out pages with zero incoming link counts.

You can get at least some improvement by removing indexes from wikiscore. Your primary key on pagename,score doesn't really make sense (you can store multiple scores from the same page, but not if they're the same score?), and should probably just be a primary key on pagename. If there are other indexes, you might be able to get rid of them.
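The point about the composite key can be checked with a small sketch, written against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). The table names score_composite and score_single and the helper try_insert are hypothetical; they only contrast the two key definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Key as described in the question: pagename and score together.
conn.execute(
    "CREATE TABLE score_composite ("
    " pagename TEXT NOT NULL,"
    " score INTEGER NOT NULL,"
    " PRIMARY KEY (pagename, score))"
)
# Key as suggested here: pagename alone.
conn.execute(
    "CREATE TABLE score_single ("
    " pagename TEXT NOT NULL PRIMARY KEY,"
    " score INTEGER NOT NULL)"
)


def try_insert(table, pagename, score):
    """Return True if the row was accepted, False on a key violation."""
    try:
        # The table name comes from the fixed strings below, never from user input.
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (pagename, score))
        return True
    except sqlite3.IntegrityError:
        return False


composite_results = [
    try_insert("score_composite", "Moscow", 1),  # first score for the page
    try_insert("score_composite", "Moscow", 2),  # a second, different score
    try_insert("score_composite", "Moscow", 1),  # the same score again
]
single_results = [
    try_insert("score_single", "Moscow", 1),  # first score for the page
    try_insert("score_single", "Moscow", 2),  # a second score for the same page
]

print("composite key:", composite_results)
print("single key:   ", single_results)
```

The composite key accepts two different scores for the same page and rejects only an exact repeat, while the key on pagename alone allows one row per page.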

If wikiscore isn't freshly created when this happens, you might get some benefit out of throwing an OPTIMIZE TABLE at it.

What would be really awesome, though, is if you changed the whole theory behind this query so that, instead of rebuilding the entire wikiscore table every time a page is saved, you only update the score of the saved page and pages that it links to.

chaos
Yes this makes complete sense. I will dig into the code and see how much it would take to just update the score for the saved page only.
jjclarkson
There was no overhead reported in phpmyadmin for either of the two tables.
jjclarkson
A: 

Here's how I modified the PHP code in PHPWiki's source:

// update pagescore
// old way...
/*
mysql_query("DELETE FROM $WikiScoreStore", $dbi["dbc"]);
mysql_query("INSERT INTO $WikiScoreStore"
           ." SELECT w1.topage, COUNT(*) FROM $WikiLinksStore AS w1, $WikiLinksStore AS w2"
           ." WHERE w2.topage=w1.frompage GROUP BY w1.topage", $dbi["dbc"]);
*/

// delete this pagescore
mysql_query("DELETE FROM $WikiScoreStore WHERE pagename='$frompage'", $dbi["dbc"]);
// insert just this pagescore
mysql_query("INSERT INTO $WikiScoreStore"
           ." SELECT w1.topage, COUNT(*) FROM $WikiLinksStore AS w1, $WikiLinksStore AS w2"
           ." WHERE w2.topage=w1.frompage AND w1.topage='$frompage' GROUP BY w1.topage", $dbi["dbc"]);

Since this code change and the index tweaks, I have no slow queries. Thank you S.O.!

jjclarkson
Maybe you should point that solution out to PHPWiki's maintainers, so they can apply it in the project.
Tiago