Q:

Indexing SET field

I have two entities, A and B, with a many-to-many relationship between them. An A entity can be related to up to 100 B entities, and a B entity can be related to up to 10,000 A entities. I need a quick way to select, for example, 30 A entities that are related to specified B entities, filtered and sorted by various attributes.

Here is how I see the ideal solution: I put everything I know about the A entities, including their relations to B entities, into a single row (a special table with a SET field) and then add all the necessary indexes. The problem is that an index can't be used when querying by a SET field. What should I do? I can switch to a different database if that would help.

UPDATE: Sorry, it looks like I forgot to mention one important detail. I need to find the A entries that are related to the B entry with id = 1 and to the B entry with id = 2 at the same time. Using joins, I end up with something like this:

SELECT a.id, COUNT(*) AS cnt FROM a INNER JOIN ab ON a.id = ab.a_id WHERE ab.b_id IN (1, 2) GROUP BY a.id HAVING cnt = 2 ORDER BY NULL

This gives me very bad performance.
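A minimal sketch of this GROUP BY/HAVING approach on made-up data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the MySQL-specific ORDER BY NULL is dropped. Note that HAVING must come before ORDER BY, and that comparing COUNT(*) to 2 is only safe because the (b_id, a_id) primary key prevents the same link from being counted twice.

```python
import sqlite3

# Toy stand-in for the question's schema (SQLite instead of MySQL;
# table/column names follow the question, the rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a  (id INTEGER PRIMARY KEY);
    CREATE TABLE ab (a_id INTEGER NOT NULL, b_id INTEGER NOT NULL,
                     PRIMARY KEY (b_id, a_id));
    INSERT INTO a  (id) VALUES (10), (11), (12);
    -- a=10 is linked to b=1 and b=2; a=11 only to b=1; a=12 only to b=2
    INSERT INTO ab (a_id, b_id) VALUES (10, 1), (10, 2), (11, 1), (12, 2);
""")

# Keep only the A rows whose link count to the requested B ids equals
# the number of distinct ids requested (2 here).
rows = conn.execute("""
    SELECT a.id, COUNT(*) AS cnt
    FROM a
    INNER JOIN ab ON a.id = ab.a_id
    WHERE ab.b_id IN (1, 2)
    GROUP BY a.id
    HAVING COUNT(*) = 2
""").fetchall()
print(rows)  # [(10, 2)]
```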

A:

Why don't you just do this:

SELECT  *
FROM    a
WHERE   a.id IN
        (
        SELECT  ab.a
        FROM    b
        JOIN    ab
        ON      ab.b = b.id
        WHERE   b.id IN (1, 2, 3, 4)
        )

and create a PRIMARY KEY on ab (b, a)?
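Note that this IN (subquery) form returns A rows related to any of the listed B ids, not to all of them. A small sketch on made-up data, using Python's sqlite3 as a stand-in for MySQL and two B ids instead of four:

```python
import sqlite3

# Made-up rows; column names a/b in ab follow the answer, not the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a  (id INTEGER PRIMARY KEY);
    CREATE TABLE b  (id INTEGER PRIMARY KEY);
    CREATE TABLE ab (a INTEGER NOT NULL, b INTEGER NOT NULL,
                     PRIMARY KEY (b, a));
    INSERT INTO a  (id) VALUES (10), (11), (12);
    INSERT INTO b  (id) VALUES (1), (2);
    -- only a=10 is linked to both b=1 and b=2
    INSERT INTO ab (a, b) VALUES (10, 1), (10, 2), (11, 1), (12, 2);
""")

# The IN (subquery) form keeps every A row linked to ANY listed B id.
ids = [row[0] for row in conn.execute("""
    SELECT  *
    FROM    a
    WHERE   a.id IN (
            SELECT  ab.a
            FROM    b
            JOIN    ab ON ab.b = b.id
            WHERE   b.id IN (1, 2)
            )
    ORDER BY a.id
""")]
print(ids)  # [10, 11, 12]
```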

Update:

Use this:

SELECT  *
FROM    a
WHERE   (
        SELECT  COUNT(*)
        FROM    ab
        WHERE   ab.a = a.id
                AND ab.b IN (1, 2, 3, 4)
        ) = 4
ORDER BY
        ...
LIMIT 30

or this:

SELECT  a.*
FROM    (
        SELECT  a
        FROM    ab
        WHERE   ab.b IN (1, 2, 3, 4)
        GROUP BY
                a
        HAVING  COUNT(*) = 4
        ) q
JOIN    a
ON      a.id = q.a
ORDER BY
        ...
LIMIT 30

You'll need to have a PRIMARY KEY on ab (b, a) (in this order) for this to work fast.
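One way to see why the column order matters, sketched with SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN (plan wording differs between engines and versions, so treat this as an illustration only): with the key declared as (b, a), the engine can seek directly to the entries for each requested b and read a from the same index; with (a, b) there is no index that starts with b, so it typically has to scan.

```python
import sqlite3


def plan_for(pk: str) -> list:
    """Return SQLite's EXPLAIN QUERY PLAN detail lines for the GROUP BY
    query, with ab's primary key declared in the given column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ab (a INTEGER NOT NULL, b INTEGER NOT NULL, "
        f"PRIMARY KEY ({pk}))"
    )
    rows = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT a FROM ab WHERE b IN (1, 2) GROUP BY a HAVING COUNT(*) = 2
    """).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return [row[-1] for row in rows]


good = plan_for("b, a")  # index leads with b: SQLite can SEARCH on b
bad = plan_for("a, b")   # index leads with a: no direct seek on b
print(good)
print(bad)
```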

Which query is more efficient depends on your data distribution.
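A quick sketch checking that both forms return the same rows on made-up data. It uses Python's sqlite3 as a stand-in for MySQL; a toy dataset says nothing about which form is faster on real data. `score` is a hypothetical sort column standing in for the elided ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a  (id INTEGER PRIMARY KEY, score INTEGER);  -- score: hypothetical
    CREATE TABLE ab (a INTEGER NOT NULL, b INTEGER NOT NULL,
                     PRIMARY KEY (b, a));
    INSERT INTO a  (id, score) VALUES (10, 5), (11, 7), (12, 3), (13, 9);
    -- 10 and 13 are linked to both b=1 and b=2; 11 and 12 to only one
    INSERT INTO ab (a, b) VALUES (10, 1), (10, 2), (11, 1), (12, 2),
                                 (13, 1), (13, 2), (13, 3);
""")

required = (1, 2)  # distinct B ids; the count must reach len(required)
marks = ", ".join("?" for _ in required)
params = (*required, len(required))

# Variant 1: correlated COUNT(*) evaluated per A row.
correlated = conn.execute(f"""
    SELECT  a.id FROM a
    WHERE   (SELECT COUNT(*) FROM ab
             WHERE ab.a = a.id AND ab.b IN ({marks})) = ?
    ORDER BY a.score DESC
    LIMIT 30
""", params).fetchall()

# Variant 2: GROUP BY/HAVING in a derived table, then join back to a.
# The derived table's only column is named a, so the join uses q.a.
derived = conn.execute(f"""
    SELECT  a.id
    FROM    (SELECT a FROM ab WHERE ab.b IN ({marks})
             GROUP BY a HAVING COUNT(*) = ?) q
    JOIN    a ON a.id = q.a
    ORDER BY a.score DESC
    LIMIT 30
""", params).fetchall()

print(correlated, derived)  # [(13,), (10,)] [(13,), (10,)]
```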

Quassnoi
This still performs terribly. For some input values the query takes up to 1 second, which is unacceptable in my case. The problem is that an entity from b can be connected to thousands of entities from a. That's why the subquery has to process so many rows, and performance stays poor even when all the necessary data is read from the index (USING INDEX).
Dienow
@Dienow: could you please post the output of `EXPLAIN SELECT ...` along with output of `SHOW CREATE TABLE a` and `SHOW CREATE TABLE b`?
Quassnoi