ansaurus

Question

MySQL query over many-to-many relation: unions?

Answer 1

+2 A:

This still uses unions of a sort, but it may be easier to read and control. I am really interested in the speed of this query on a large data set, so please let me know how fast it is. With your small data set it ran in 0.0001 seconds.

SELECT DISTINCT dt1.document_id
FROM
  document_tag dt1,
  -- one derived table per tag: the documents that carry that tag
  (SELECT document_id
    FROM document_tag
    WHERE tag = 'tag1'
  ) AS t1s,
  (SELECT document_id
    FROM document_tag
    WHERE tag = 'tag2'
  ) AS t2s,
  (SELECT document_id
    FROM document_tag
    WHERE tag = 'tag3'
  ) AS t3s
WHERE
  -- keep documents that have (tag1 AND tag2) OR tag3
  (dt1.document_id = t1s.document_id
  AND dt1.document_id = t2s.document_id
  )
  OR dt1.document_id = t3s.document_id

This will make it easy to add new parameters because you have already specified the result set for each tag.

For example, adding:

OR dt1.document_id = t2s.document_id

to the end will also pick up document_id 2.
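
One caveat with the comma-separated FROM list above: it builds a cross product of all four sources, so if any one of the derived tables is empty (for example, no document carries tag3), the whole query returns no rows, and on larger tables the intermediate product can grow quickly. A possible variant, sketched here and not benchmarked, keeps one subquery per tag but tests membership with IN instead of joining. It assumes the same document_tag table as the query above.

```sql
-- Sketch: same (tag1 AND tag2) OR tag3 condition, without the cross product.
-- Each IN subquery is the per-tag result set from the query above.
SELECT DISTINCT dt1.document_id
FROM document_tag dt1
WHERE
  (dt1.document_id IN (SELECT document_id FROM document_tag WHERE tag = 'tag1')
   AND dt1.document_id IN (SELECT document_id FROM document_tag WHERE tag = 'tag2'))
  OR dt1.document_id IN (SELECT document_id FROM document_tag WHERE tag = 'tag3');
```

New conditions can still be appended as extra OR branches. How well the optimizer handles IN subqueries varies between MySQL versions, so it is worth checking the plan with EXPLAIN on real data.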

Justin Giboney 2009-07-30 15:08:47

Answer 2

A:

It's possible to do this within a single query; however, you'll need to move your WHERE conditions into the HAVING clause in order to use a disjunction.
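
One way to read this suggestion, as a sketch only: group the rows by document and express each tag test as an aggregate in HAVING. The table name document_tag and the condition (tag1 AND tag2) OR tag3 are assumptions taken from the other query in this thread, and the SUM trick relies on MySQL evaluating a comparison to 1 or 0.

```sql
-- Sketch: a single grouped query; each SUM counts the rows of a document
-- that carry the given tag (tag = '...' evaluates to 1 or 0 in MySQL).
SELECT document_id
FROM document_tag
GROUP BY document_id
HAVING (SUM(tag = 'tag1') > 0 AND SUM(tag = 'tag2') > 0)
    OR SUM(tag = 'tag3') > 0;
```

Because the conditions are evaluated per group instead of per row, a conjunction such as "has tag1 and has tag2" becomes possible even though no single row carries both tags.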

Alex Gaynor 2009-07-30 15:10:32

Answer 3

A:

You're correct: that will get slower and slower as you add new tags to look for in additional UNION clauses. Each UNION clause is an additional query that needs to be planned and executed. Plus, sorting the combined result is awkward, because an ORDER BY has to be applied to the UNION as a whole and not to the individual SELECTs.

You're looking for a basic data warehousing technique. First, let me recreate your schema with one additional table.

create table a (document_id int, tag varchar(10));

insert into a values (1, 'tag1'), (1, 'tag2'), (1, 'tag3'), (2, 'tag2'), 
                     (3, 'tag1'), (3, 'tag2'), (4, 'tag1'), (5, 'tag3');

create table b (tag_group_id int, tag varchar(10));

insert into b values (1, 'tag1'), (1, 'tag2'), (2, 'tag3');

Table b contains "tag groups". Group 1 includes tag1 and tag2, while group 2 contains tag3.

Now you can modify table b to represent the query you are interested in. When you are ready to query, you create temp tables to store aggregate data:

create temporary table c 
(tag_group_id int, count_tags_in_group int, tags_in_group varchar(255));

insert into c
select 
    tag_group_id,
    count(tag),
    group_concat(tag)
from b
group by tag_group_id;

create temporary table d (document_id int, tag_group_id int, document_tag_count int);

insert into d
select
    a.document_id,
    b.tag_group_id,
    count(a.tag) as document_tag_count
from a
inner join b on a.tag = b.tag
group by a.document_id, b.tag_group_id;

Now c contains the number of tags in each tag group, and d contains the number of tags each document has from each tag group. If a row in d has the same tag_group_id and the same tag count as a row in c, then that document has all of the tags in that tag group.

select 
    d.document_id as "Document ID",
    c.tags_in_group as "Matched Tag Group"
from d
inner join c on d.tag_group_id = c.tag_group_id
            and d.document_tag_count = c.count_tags_in_group
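
If temporary tables are inconvenient, the same comparison can probably be written as one statement by turning c and d into derived tables. This is a sketch under that assumption, using tables a and b as defined above; the ORDER BY inside GROUP_CONCAT is an addition that makes the tag list deterministic.

```sql
-- Sketch: same logic as the c/d temporary tables, written as derived tables.
select
    d.document_id as "Document ID",
    c.tags_in_group as "Matched Tag Group"
from (
    -- per document and tag group: how many of the group's tags the document has
    select a.document_id, b.tag_group_id, count(a.tag) as document_tag_count
    from a
    inner join b on a.tag = b.tag
    group by a.document_id, b.tag_group_id
) as d
inner join (
    -- per tag group: how many tags it contains, plus a readable list
    select tag_group_id,
           count(tag) as count_tags_in_group,
           group_concat(tag order by tag) as tags_in_group
    from b
    group by tag_group_id
) as c on d.tag_group_id = c.tag_group_id
      and d.document_tag_count = c.count_tags_in_group;
```

With the sample rows inserted above, this should list documents 1 and 3 for the group tag1,tag2 and documents 1 and 5 for the group tag3. Document 1 appears twice because it satisfies both groups.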

One cool thing about this approach is that you could run reports like 'How many documents have 50% or more of the tags in each of these tag groups?'

select 
    d.document_id as "Document ID",
    c.tags_in_group as "Matched Tag Group"
from d
inner join c on d.tag_group_id = c.tag_group_id
            and d.document_tag_count >= 0.5 * c.count_tags_in_group
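
The query above lists the matching documents. To answer the "how many" form of the question literally, one option is to aggregate the same join per tag group. This is a sketch that assumes the temporary tables c and d have already been filled as shown earlier; the column aliases are made up for illustration.

```sql
-- Sketch: per tag group, count the documents holding at least half of its tags.
-- d has one row per (document, tag group), so count(*) counts documents.
select
    c.tags_in_group as "Tag Group",
    count(*) as "Document Count"
from d
inner join c on d.tag_group_id = c.tag_group_id
            and d.document_tag_count >= 0.5 * c.count_tags_in_group
group by c.tag_group_id, c.tags_in_group;
```

With the sample data, the group containing tag1 and tag2 should report 4 documents (1, 2, 3 and 4) and the tag3 group should report 2 (documents 1 and 5).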

mehaase 2010-02-27 00:07:05
