ansaurus

Question

What is the best way to represent a many-to-many relationship between records in a single SQL table?

Answer 1

+1 A:

I think the structure you have suggested is fine.

To get the related records, do something like:

SELECT related.* FROM entities AS search 
LEFT JOIN entity_entity map ON map.entity_id_a = search.id
LEFT JOIN entities AS related ON map.entity_id_b = related.id
WHERE search.name = 'Search term'

Hope that helps.

Tim Wardle 2009-01-23 19:25:40

What if my search term matches an entity whose id occurs only in entity_id_b in the map?

Bill Karwin 2009-01-23 20:49:10

In other words, your query works only if every relationship is stored twice, once in each direction, e.g. (1,4) and (4,1).

Bill Karwin 2009-01-23 20:50:49
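
The point raised in these comments can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in for whatever database the question targets, and the entity names and ids are made-up sample data; only the query itself comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity_entity (entity_id_a INTEGER, entity_id_b INTEGER);
    -- Sample data (assumed): one relationship, stored in one direction only.
    INSERT INTO entities VALUES (1, 'Raleigh'), (4, 'Durham');
    INSERT INTO entity_entity VALUES (1, 4);
""")

# The query from the answer, selecting only the related entity's name.
ONE_WAY = """
    SELECT related.name FROM entities AS search
    LEFT JOIN entity_entity map ON map.entity_id_a = search.id
    LEFT JOIN entities AS related ON map.entity_id_b = related.id
    WHERE search.name = ?
"""

def related_names(name):
    return [row[0] for row in conn.execute(ONE_WAY, (name,))]

from_a = related_names("Raleigh")  # entity 1 appears in entity_id_a
from_b = related_names("Durham")   # entity 4 appears only in entity_id_b
print(from_a)  # ['Durham']
print(from_b)  # [None]
```

Searching from the entity_id_b side finds nothing: the LEFT JOINs produce a single row of NULLs rather than the related entity.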

Answer 2

A:

select * from entities
where entity_id in 
(
    select entity_id_b 
    from entity_entity 
    where entity_id_a = @lookup_value
)

Gordon Bell 2009-01-23 19:26:16

Answer 3

+6 A:

Define a constraint: entity_id_a < entity_id_b.

Create indexes:

CREATE UNIQUE INDEX ix_a_b ON entity_entity(entity_id_a, entity_id_b);
CREATE INDEX ix_b ON entity_entity(entity_id_b);

The second index doesn't need to include entity_id_a, as you will use it only to select all the a's for a given b. A RANGE SCAN on ix_b will be faster than a SKIP SCAN on ix_a_b.

Populate the table with your entities as follows:

INSERT
INTO entity_entity (entity_id_a, entity_id_b)
VALUES (LEAST(@id1, @id2), GREATEST(@id1, @id2))

Then select:

SELECT entity_id_b
FROM entity_entity
WHERE entity_id_a = @id
UNION ALL
SELECT entity_id_a
FROM entity_entity
WHERE entity_id_b = @id

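UNION ALL here lets you use the above indexes and avoids the extra sort for uniqueness. Because each pair is stored exactly once with entity_id_a < entity_id_b, the two branches can never return the same id twice.

All of the above is valid for a symmetric and anti-reflexive relationship. That means that: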

If a is related to b, then b is related to a
a is never related to a
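
A minimal end-to-end sketch of the scheme above, assuming SQLite (through Python's sqlite3 module) as a stand-in database. SQLite has no LEAST/GREATEST, so its multi-argument MIN/MAX scalar functions are used instead; the helper names relate and related_ids and the sample ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_entity (
        entity_id_a INTEGER NOT NULL,
        entity_id_b INTEGER NOT NULL,
        CHECK (entity_id_a < entity_id_b)
    );
    CREATE UNIQUE INDEX ix_a_b ON entity_entity(entity_id_a, entity_id_b);
    CREATE INDEX ix_b ON entity_entity(entity_id_b);
""")

def relate(id1, id2):
    # Store the smaller id in entity_id_a whatever the argument order.
    conn.execute(
        "INSERT INTO entity_entity (entity_id_a, entity_id_b) "
        "VALUES (MIN(:id1, :id2), MAX(:id1, :id2))",
        {"id1": id1, "id2": id2},
    )

def related_ids(entity_id):
    rows = conn.execute(
        "SELECT entity_id_b FROM entity_entity WHERE entity_id_a = :id "
        "UNION ALL "
        "SELECT entity_id_a FROM entity_entity WHERE entity_id_b = :id",
        {"id": entity_id},
    )
    return sorted(row[0] for row in rows)

relate(4, 1)   # stored as (1, 4)
relate(1, 7)   # stored as (1, 7)
relate(9, 4)   # stored as (4, 9)

print(related_ids(1))  # [4, 7]
print(related_ids(4))  # [1, 9]
```

With this setup, inserting the same pair again in either order violates the unique index, and relating an entity to itself violates the CHECK constraint; both surface as sqlite3.IntegrityError.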

Quassnoi 2009-01-23 19:30:42

This approach is working very well in practice. Thank you kindly.

GloryFish 2009-02-02 16:34:36

Answer 4

A:

I can think of a few ways.

A single pass with a CASE:

SELECT DISTINCT
    CASE
        WHEN entity_id_a <> @entity_id THEN entity_id_a
        WHEN entity_id_b <> @entity_id THEN entity_id_b
    END AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id OR entity_id_b = @entity_id

Or two filtered queries UNIONed thus:

SELECT entity_id_b AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id
UNION
SELECT entity_id_a AS equivalent_entity
FROM entity_entity
WHERE entity_id_b = @entity_id
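
As a rough check that the two forms agree, the sketch below runs both against the same sample table, using Python's sqlite3 module as an assumed stand-in database with named parameters in place of @entity_id. The sample rows are made up and deliberately include one pair stored in both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_entity (entity_id_a INTEGER, entity_id_b INTEGER);
    -- Sample rows (assumed) in no particular order; (1,4) is also stored reversed.
    INSERT INTO entity_entity VALUES (1, 4), (4, 1), (7, 1), (4, 9);
""")

CASE_QUERY = """
    SELECT DISTINCT
        CASE
            WHEN entity_id_a <> :entity_id THEN entity_id_a
            WHEN entity_id_b <> :entity_id THEN entity_id_b
        END AS equivalent_entity
    FROM entity_entity
    WHERE entity_id_a = :entity_id OR entity_id_b = :entity_id
"""

UNION_QUERY = """
    SELECT entity_id_b AS equivalent_entity
    FROM entity_entity
    WHERE entity_id_a = :entity_id
    UNION
    SELECT entity_id_a AS equivalent_entity
    FROM entity_entity
    WHERE entity_id_b = :entity_id
"""

def equivalents(query, entity_id):
    rows = conn.execute(query, {"entity_id": entity_id})
    return sorted(row[0] for row in rows)

case_result = equivalents(CASE_QUERY, 1)
union_result = equivalents(UNION_QUERY, 1)
print(case_result)   # [4, 7]
print(union_result)  # [4, 7]
```

The two forms differ only if an entity is related to itself: for a row such as (1, 1) the CASE yields NULL, while the UNION yields 1.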

Cade Roux 2009-01-23 19:31:43

Answer 5

+1 A:

The link table approach seems fine, except that you might want a 'relationship type' so that you know WHY they are related.

For example, the relation between Raleigh and North Carolina is not the same as the relation between Raleigh and Durham. Additionally, you may want to know which record is the 'parent' in the relationship, in case you are driving conditional drop-downs (e.g. you select a state and get to see the cities that are in that state).

Depending on the complexity of your requirements, the simple setup you have right now may not be sufficient. If you simply need to show that two records are related in some way, the link table should be sufficient.
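
One way this could look, as a sketch only: the relationship_type column, its 'contains' and 'near' values, and the convention that entity_id_a is the parent for 'contains' are all assumptions made for illustration, and Python's sqlite3 module stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity_entity (
        entity_id_a INTEGER REFERENCES entities(id),
        entity_id_b INTEGER REFERENCES entities(id),
        relationship_type TEXT NOT NULL  -- hypothetical column
    );
    INSERT INTO entities VALUES
        (1, 'North Carolina'), (2, 'Raleigh'), (3, 'Durham');
    -- For 'contains', entity_id_a is the parent; 'near' has no direction.
    INSERT INTO entity_entity VALUES
        (1, 2, 'contains'), (1, 3, 'contains'), (2, 3, 'near');
""")

def children_of(parent_name):
    # Follow only 'contains' links, from parent (a) to child (b).
    rows = conn.execute(
        """
        SELECT child.name
        FROM entities AS parent
        JOIN entity_entity AS map ON map.entity_id_a = parent.id
        JOIN entities AS child ON child.id = map.entity_id_b
        WHERE parent.name = ? AND map.relationship_type = 'contains'
        ORDER BY child.name
        """,
        (parent_name,),
    )
    return [row[0] for row in rows]

cities = children_of("North Carolina")
print(cities)                  # ['Durham', 'Raleigh']
print(children_of("Raleigh"))  # []
```

The Raleigh and Durham link is still stored, but because its type is 'near' it does not show up in a state-to-city drill-down.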

Jay S 2009-01-23 19:35:12

I see what you are getting at. In this case we are specifically not representing a hierarchy. There will only ever be one state in this system and the relationships won't be used for a drill-down style navigation.

GloryFish 2009-01-23 19:42:11

Answer 6

+1 A:

I already posted a way to do it within your design, but I also wanted to offer a separate design idea, in case you have some flexibility and it fits your needs more closely.

If the items fall into (non-overlapping) equivalence classes, you might want to make equivalence classes the basis for the table design, where everything in a class is considered equivalent. The classes themselves can be anonymous:

CREATE TABLE equivalence_class (
    class_id int -- surrogate, IDENTITY, autonumber, etc.
    ,entity_id int
)

entity_id should be unique for a non-overlapping partition of your space.

This avoids the problem of keeping every pair consistently left- or right-handed, or of forcing an upper-right (triangular) relationship matrix.

Then your query is a little different:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> @entity_id

or, equivalently:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> c1.entity_id
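
A small runnable sketch of this design, assuming SQLite through Python's sqlite3 module and made-up sample classes. The UNIQUE constraint on entity_id is one way to enforce the non-overlapping partition described above; the helper name equivalents is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE equivalence_class (
        class_id INTEGER NOT NULL,
        entity_id INTEGER NOT NULL UNIQUE  -- each entity sits in one class
    );
    -- Sample data (assumed): class 1 = {1, 4, 7}, class 2 = {2, 9}.
    INSERT INTO equivalence_class VALUES
        (1, 1), (1, 4), (1, 7), (2, 2), (2, 9);
""")

def equivalents(entity_id):
    rows = conn.execute(
        """
        SELECT c2.entity_id
        FROM equivalence_class c1
        INNER JOIN equivalence_class c2
            ON c1.entity_id = ?
            AND c1.class_id = c2.class_id
            AND c2.entity_id <> c1.entity_id
        ORDER BY c2.entity_id
        """,
        (entity_id,),
    )
    return [row[0] for row in rows]

print(equivalents(4))  # [1, 7]
print(equivalents(9))  # [2]
print(equivalents(5))  # []
```

An entity with no row in the table simply has no equivalents, and trying to place an entity in a second class fails the UNIQUE constraint.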

Cade Roux 2009-01-23 19:40:21

Nice! You can also test c2.entity_id <> c1.entity_id, instead of c2.entity_id <> @entity_id. That way you don't have to pass the @entity_id parameter twice.

Bill Karwin 2009-01-23 20:46:43

I assumed it would be a stored procedure, but yes, that would be equivalent for the parameterized ad hoc query devotees.

Cade Roux 2009-01-23 21:04:32

Answer 7

A:

My advice is that your initial table design is bad. Do not store different types of things in the same table. (This is the first rule of database design, right up there with not storing multiple pieces of information in the same field.) Such a table is much harder to query and will cause significant performance problems down the road. It would also be a problem entering data into the relationship table: how do you know which entities need to be related when you add a new entry? It would be much better to design properly relational tables. Entity tables are almost always a bad idea. I see no reason at all from the example to have this type of information in one table. Frankly, I'd have a university table and a related address table. It would be easy to query and would perform far better.

HLGEM 2009-01-23 19:53:28

Answer 8

A:

Based on your updated schema this query should work:

select if(entity_id_a = :entity_id, entity_id_b, entity_id_a) as related_entity_id
from entity_entity
where :entity_id in (entity_id_a, entity_id_b)

Here :entity_id is bound to the id of the entity you are querying, and entity_entity is the link table used in the other answers.

2009-01-23 21:08:52
