Set up (MySQL):

create table inRelation(
    party1 integer unsigned NOT NULL,  -- user id on one side of the relation
    party2 integer unsigned NOT NULL,  -- user id on the other side
    unique (party1,party2)             -- at most one row per (party1,party2) pair
);

insert into inRelation(party1,party2) values(1,2),(1,3),(2,3),(1,4),(2,5),(3,5),(1,6),(1,7),(2,7),(5,7);

mysql> select * from inRelation a
    -> join inRelation b on a.party2=b.party1
    -> join inRelation c on b.party2=c.party1
    -> where a.party1=1 and c.party2=7;
+--------+--------+--------+--------+--------+--------+
| party1 | party2 | party1 | party2 | party1 | party2 |
+--------+--------+--------+--------+--------+--------+
|      1 |      2 |      2 |      5 |      5 |      7 |
|      1 |      3 |      3 |      5 |      5 |      7 |
+--------+--------+--------+--------+--------+--------+
2 rows in set (0.00 sec)

mysql> explain select * from inRelation a
    -> join inRelation b on a.party2=b.party1
    -> join inRelation c on b.party2=c.party1
    -> where a.party1=1 and c.party2=7;
+----+-------------+-------+--------+---------------+--------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key    | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+---------------+--------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | b     | index  | party1        | party1 | 8       | NULL                |   10 | Using index |
|  1 | SIMPLE      | a     | eq_ref | party1        | party1 | 8       | const,news.b.party1 |    1 | Using index |
|  1 | SIMPLE      | c     | eq_ref | party1        | party1 | 8       | news.b.party2,const |    1 | Using index |
+----+-------------+-------+--------+---------------+--------+---------+---------------------+------+-------------+

This is a BFS solution for my previous post:

http://stackoverflow.com/questions/2076715/challenge-how-to-implement-an-algorithm-for-six-degree-of-separation

But what's the complexity of it? Suppose there are n records in total.

A: 

Assuming there are |V| vertices and |E| edges: for every table the join can pair any two vertices and has to check all the vertices for equality, so the worst-case performance will be O(|V| + |E|).

Updated: If you are using MySQL, there are a lot of things that affect the complexity. If you have a primary key index on the field, a B-tree index will be used; if it is a normal unclustered index, a hash index may be used. Each of these data structures has a different cost.
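
One quick way to check which structure MySQL actually picked here (a sketch against the table above; output omitted): the Index_type column of SHOW INDEX reports BTREE, HASH, etc. for every index, including the one backing the unique key.

mysql> show index from inRelation;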

From your other question, I see these are your requirements:

1. Calculate the path from UserX to UserY.
2. For UserX, calculate all users that are no more than 3 steps away.

For the first one, the best thing is to apply Dijkstra's algorithm, construct the paths in Java, and then store them in a table. Note that adding every new node requires complete reprocessing.

Another solution is to use the recursive SQL introduced in the SQL:1999 standard to create a view containing the path from UserX to UserY. Let me know if you need references for recursive queries.
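
As a rough illustration of that recursive-SQL approach (my own sketch, not part of the original answer): MySQL only gained recursive CTEs in version 8.0, but on the sample table a path query from user 1 to user 7 (the ids used in the question) could look roughly like this:

-- Sketch only: requires MySQL 8.0+ for WITH RECURSIVE.
-- Walks the directed edges party1 -> party2, collecting visited ids in `path`
-- and capping the depth at 6 so cycles cannot recurse forever.
with recursive paths (last_node, path, steps) as (
    select party2,
           cast(concat('1,', party2) as char(100)),
           1
    from inRelation
    where party1 = 1
  union all
    select r.party2,
           concat(p.path, ',', r.party2),
           p.steps + 1
    from paths p
    join inRelation r on r.party1 = p.last_node
    where p.steps < 6
)
select path, steps
from paths
where last_node = 7;

The cast in the anchor member matters: the anchor fixes the column width for the whole recursion, so without it longer paths would be truncated.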

For the second one, the query you have written works perfectly.
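
For reference, a sketch of that second requirement against the same table (again my own illustration): union the 1-, 2- and 3-step neighbourhoods of user 1, assuming relations are only stored in the party1 -> party2 direction as in the sample data.

-- Sketch: every user reachable from user 1 in at most 3 steps.
select a.party2 as friend
from inRelation a
where a.party1 = 1
union
select b.party2
from inRelation a
join inRelation b on a.party2 = b.party1
where a.party1 = 1
union
select c.party2
from inRelation a
join inRelation b on a.party2 = b.party1
join inRelation c on b.party2 = c.party1
where a.party1 = 1;

UNION (as opposed to UNION ALL) removes duplicates, so a user reachable at several depths appears only once.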

Algorist
Dijkstra has time complexity O(n^2); why do you say it's better than BFS in this case? IMHO, since this is about the shortest path in an **unweighted** graph, the best solution should be BFS.
Also, I think BFS fits both questions. And the SQL I posted above actually finds the paths from `User1` to `User7` with path `length=3`.
Dijkstra is just doing BFS but applying the greedy rule of storing the best known path to every node. It's O(n^2) only for one source and all destinations; if you want all sources and all destinations, you have to use Floyd's all-pairs shortest path algorithm.
Algorist
OK, what do you think of the performance of my posted SQL solution for finding all `length=3` paths between two users?
To analyze this, let's assume each table has n rows, so each join can produce n^2 rows; with n-1 such passes it will be O(n^2(n-1)). If you create indexes, the join operation is O(2n) per table, so for n-1 tables it will be O(2n(n-1)).
Algorist
Why will the join be O(2n) when indexes are created?
With an index nested-loops join, if the pages of one table fit in memory, then it's just a matter of pulling pages from the other table and joining them. I am not sure of the complexity if the first table doesn't fit in memory.
Algorist
So MySQL will handle this in O(n) for a limited number of passes (in my case, 6 at most)?
It also depends on the number of rows in each table, as I said, and on whether you have an index on the columns.
Algorist
There is a unique key: `unique (party1,party2)`.
I don't think that will help, because the unique constraint could be implemented in the backend by storing a hash of every row and checking whether a new row matches any existing hash via binary search. You have to create indexes; check the MySQL docs on creating indexes (a sketch follows below). Also, try writing and running the query on real tables. If your performance is too bad, I can help you optimize the query.
Algorist
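
A minimal sketch of the explicit index creation mentioned in the last comment, assuming the inRelation table from the question. The existing unique key already serves lookups led by party1, as the EXPLAIN above shows, so the extra index only helps lookups that start from party2:

-- Sketch: a secondary index led by party2 for joins/filters in the other direction.
create index idx_party2_party1 on inRelation (party2, party1);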