views:

51

answers:

1

I am operating on a (not so) large graph having about 380K edges. I wrote a program to count the number of 3-cliques in the graph. A quick example:

List of edges:
A - B
B - C
C - A
C - D

List of cliques:
A - B - C

MySQL Table structure:

+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| v1    | bigint(20) | YES  | MUL | NULL    |       | 
| v2    | bigint(20) | YES  | MUL | NULL    |       | 
+-------+------------+------+-----+---------+-------+

A 3-clique is nothing but a triangle in a graph. Currently, I am doing this using PHP+MySQL. As expected, it is not fast enough. Is there a way to do this in pure MySQL? (perhaps a way to insert all 3-cliques into a table?)

+3  A: 
SELECT T1.v1, T2.v1, T3.v1 FROM TableName T1, TableName T2, TableName T3
WHERE T1.v1 < T1.v2 AND T2.v1 < T2.v2 AND T3.v1 < T3.v2
AND T1.v1 = T3.v1 AND T1.v2 = T2.v1 AND T2.v2 = T3.v2

Should do the trick. What I've done there is ensure that v1 is less than v2 for all edges considered, simply to remove duplicates. Then it is a simple matter of joining up the edges by their start/end points. Returning the first point in each of the pairs.

If you have edges going from a node back to the same node you may need to add in extra checks as appropriate.

EDIT: Made a change, thanks to Legend.Reminded me that we needed to make sure that the edge found in T3 matches up to the edge in T1, so we have to link the first in each together! Originally I had T3.v1 > T3.v2 in the first line of the where clause but changed it to reduce confusion, forgot to change the second part though!

Farthingworth
@Farthinworth: Thanks for your time. For some reason, it gives me an empty set. And I don't think I have the situation you described at the end. Do you mind explaining that a little more? EDIT: I just tried it on a smaller table as well. Will look into the statement if there's anything that I am missing.
Legend
Adding as a new comment because this worked on a test table: SELECT T1.v1, T2.v1, T3.v2 FROM cycletest1 T1, cycletest1 T2, cycletest1 T3 WHERE T1.v1 < T1.v2 AND T2.v1 < T2.v2 AND T3.v1 < T3.v2 AND T1.v1 = T3.v1 AND T1.v2 = T2.v1 AND T2.v2 = T3.v2; I wonder how long it will take to run though :)
Legend
@Farthingworth: Thanks for fixing it and more than that, thanks a lot for the idea. I was thinking my PHP+MySQL approach would take days to finish! :)
Legend
T1.v1 = T3.v1 AND T1.v2 = T2.v1 AND T2.v2 = T3.v2should beT1.v1 = T3.v2 AND T1.v2 = T2.v1 AND T2.v2 = T3.v1
Fakrudeen