views:

134

answers:

2

My database schema looks like this:

Table t1:
    id
    valA
    valB

Table t2:
    id
    valA
    valB

What I want to do, is, for a given set of rows in one of these tables, find rows in both tables that have the same valA or valB (comparing valA with valA and valB with valB, not valA with valB). Then, I want to look for rows with the same valA or valB as the rows in the result of the previous query, and so on.

Example data:

t1 (id, valA, valB):
    1, a, B
    2, b, J
    3, d, E
    4, d, B
    5, c, G
    6, h, J

t2 (id, valA, valB):
    1, b, E
    2, d, H
    3, g, B


Example 1:

Input: Row 1 in t1
Output: 
    t1/4, t2/3
    t1/3, t2/2
    t2/1
    ...


Example 2:

Input: Row 6 in t1
Output:
    t1/2
    t2/1

I would like to have the level of the search at that the row was found in the result (e.g. in Example 1: Level 1 for t1/2 and t2/1, level 2 for t1/5, ...) A limited depth of recursion is okay. Over time, I maybe want to include more tables following the same schema into the query. It would be nice if it was easy to extend the query for that purpose.

But what matters most, is the performance. Can you tell me the fastest possible way to accomplish this?

Thanks in advance!

+2  A: 

You can do it with stored procedures (see listings 7 and 7a):

http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html

You just need to figure out a query for the step of the recursion - taking the already-found rows and finding some more rows.

If you had a database which supported SQL-99 recursive common table expressions (like PostgreSQL or Firebird, hint hint), you could take the same approach as in the above link, but using a rCTE as the framework, so avoiding the need to write a stored procedure.

EDIT: I had a go at doing this with an rCTE in PostgreSQL 8.4, and although i can find the rows, i can't find a way to label them with the depth at which they were found. First, i create a a view to unify the tables:

create view t12 (tbl, id, vala, valb) as (
  (select 't1', id, vala, valb from t1)
  union
  (select 't2', id, vala, valb from t2)
)

Then do this query:

with recursive descendants (tbl, id, vala, valb) as (
  (select *
  from t12
  where tbl = 't1' and id = 1) -- the query that identifies the seed rows, here just t1/1
  union
  (select c.*
  from descendants p, t12 c
  where (p.vala = c.vala or p.valb = c.valb)) -- the recursive term
)
select * from descendants;

You would imagine that capturing depth would be as simple as adding a depth column to the rCTE, set to zero in the seed query, then somehow incremented in the recursive step. However, i couldn't find any way to do that, given that you can't write subqueries against the rCTE in the recursive step (so nothing like select max(depth) + 1 from descendants in the column list), and you can't use an aggregate function in the column list (so no max(p.depth) + 1 in the column list coupled with a group by c.* on the select).

You would also need to add a restriction to the query to exclude already-selected rows; you don't need to do that in the basic version, because of the distincting effect of the union, but if you add a count column, then a row can be included in the results more than once with different counts, and you'll get a Cartesian explosion. But you can't easily prevent it, because you can't have subqueries against the rCTE, which means you can't say anything like and not exists (select * from descendants d where d.tbl = c.tbl and d.id = c.id)!

I know all this stuff about recursive queries is of no use to you, but i find it riveting, so please do excuse me.

Tom Anderson
+1  A: 

try this although it's not fully tested but looked like it was working :P (http://pastie.org/1140339)

drop table if exists t1;
create table t1
(
id int unsigned not null auto_increment primary key,
valA char(1) not null,
valB char(1) not null
)
engine=innodb;

drop table if exists t2;
create table t2
(
id int unsigned not null auto_increment primary key,
valA char(1) not null,
valB char(1) not null
)
engine=innodb;

drop view if exists t12;
create view t12 as
select 1 as tid, id, valA, valB from t1
union
select 2 as tid, id, valA, valB from t2;

insert into t1 (valA, valB) values 
('a','B'),
('b','J'),
('d','E'),
('d','B'),
('c','G'),
('h','J');

insert into t2 (valA, valB) values 
('b','E'),
('d','H'),
('g','B');

drop procedure if exists find_children;

delimiter #

create procedure find_children
(
in p_tid tinyint unsigned,
in p_id int unsigned 
)
proc_main:begin

declare done tinyint unsigned default 0;
declare dpth smallint unsigned default 0;


create temporary table children(
 tid tinyint unsigned not null,
 id int unsigned not null,
 valA char(1) not null,
 valB char(1) not null,
 depth smallint unsigned default 0,
 primary key (tid, id, valA, valB)
)engine = memory;

insert into children select p_tid, t.id, t.valA, t.valB, dpth from t12 t where t.tid = p_tid and t.id = p_id; 

create temporary table tmp engine=memory select * from children;

/* http://dec.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */

while done <> 1 do

    if exists(
    select 1 from t12 t 
      inner join tmp on tmp.valA = t.valA or tmp.valB = t.valB and tmp.depth = dpth) then

        insert ignore into children
        select 
        t.tid, t.id, t.valA, t.valB, dpth+1 
      from t12 t
      inner join tmp on tmp.valA = t.valA or tmp.valB = t.valB and tmp.depth = dpth;

        set dpth = dpth + 1;            

        truncate table tmp;
        insert into tmp select * from children where depth = dpth;

    else
        set done = 1;
    end if;

end while;

select * from children order by depth;

drop temporary table if exists children;
drop temporary table if exists tmp;

end proc_main #


delimiter ;


call find_children(1,1);

call find_children(1,6);
f00
a non recursive method that isn't as performant, hmmmm.....
f00
WOW! Thanks so much for your answer! It really helped me a lot. The performance is okay. I was executing a separate query for each recursion step before and doing the in-between work in PHP, so it already means a huge increase in performance :) There remains only one (probably easy) question: Is it possible to call this function on a set of tid/id pairs at once? The result should then be displayed all in one rowset.
eWolf
is everything indexed correctly ?? maybe you should post an explain plan so we can see what's going on :) You can pass in as many tid/id params as you like - just simply insert them into the temp children table with depth 0 and it will work from there (if that's what you mean ??)
f00
I'm pretty new to stored procedures, my question probably is more basic than you think ;-) I know how to insert the items into the children table.. I meant to ask if/how it is possible to pass multiple tid/id pairs to the procedure. I know I could do it with a temporary table, but I'd prefer a simpler solution.
eWolf
you'd have to pass in a delimited string such as: t1,1 ; t2,3 ; t1,2 and split it into tokens and insert each into the children table at depth 0 (sucks i know but MySQL has a long way to grow just yet)
f00
@f00 Can you tell me how exactly to do that? Seems like MySQL doesn't even have a function to split a string.. Creepy :-/
eWolf