views:

875

answers:

8

I have a SQL table like so:

Update: I'm changing the example table as the existing hierarchical nature of the original data (State, Cities, Schools) is overshadowing the fact that a simple relationship is needed between the items.

entities
id      name               
1       Apple     
2       Orange            
3       Banana             
4       Carrot                
5       Mushroom

I want to define two-way relationships between these entities so a user viewing one entity can see a list of all related entities.

The relationships are defined by an end user.

What is the best way to represent these relationships in the database and subsequently query and update them?

One way as I see it...

My instinct says a relationship table like so:

entity_entity
entity_id_a       entity_id_b
1                 2
5                 1
4                 1
5                 4
1                 3

That being the case, given a supplied entity_id of 4, how would one get all related records, which would be 1 and 5?

Likewise a query of entity_id = 1 should return 2, 3, 4, and 5.

Thanks for your time and let me know if I can clarify the question at all.

+1  A: 

I think the structure you have suggested is fine.

To get the related records do something like

SELECT related.* FROM entities AS search 
LEFT JOIN entity_entity map ON map.entity_id_a = search.id
LEFT JOIN entities AS related ON map.entity_id_b = related.id
WHERE search.name = 'Search term'

Hope that helps.

Tim Wardle
What if my search term matches an entity whose id occurs only in entity_id_b in the map?
Bill Karwin
In other words, your query works only if every relationship is stored twice, reverse. E.g. (1,4) and (4,1).
Bill Karwin
A: 
select * from entities
where entity_id in 
(
    select entity_id_b 
    from entity_entity 
    where entity_id_a = @lookup_value
)
Gordon Bell
+6  A: 

Define a constraint: entity_id_a < entity_id_b.

Create indexes:

CREATE UNIQUE INDEX ix_a_b ON entity_entity(entity_id_a, entity_id_b);
CREATE INDEX ix_b ON entity_entity(entity_id_b);

Second index doesn't need to include entity_id_a as you will use it only to select all a's within one b. RANGE SCAN on ix_b will be faster than a SKIP SCAN on ix_a_b.

Populate the table with your entities as follows:

INSERT
INTO entity_entity (entity_id_a, entity_id_b)
VALUES (LEAST(@id1, @id2), GREATEST(@id1, @id2))

Then select:

SELECT entity_id_b
FROM entity_entity
WHERE entity_id_a = @id
UNION ALL
SELECT entity_id_a
FROM entity_entity
WHERE entity_id_b = @id

UNION ALL here lets you use above indexes and avoid extra sorting for uniqueness.

All above is valid for a symmetric and anti-reflexive relationship. That means that:

  • If a is related to b, then b is related to a

  • a is never related to a

Quassnoi
This approach is working very well in practice. Thank you kindly.
GloryFish
A: 

I can think of a few ways.

A single pass with a CASE:

SELECT DISTINCT
    CASE
        WHEN entity_id_a <> @entity_id THEN entity_id_a
        WHEN entity_id_b <> @entity_id THEN entity_id_b
    END AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id OR entity_id_b = @entity_id

Or two filtered queries UNIONed thus:

SELECT entity_id_b AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id
UNION
SELECT entity_id_a AS equivalent_entity
FROM entity_entity
WHERE entity_id_b = @entity_id
Cade Roux
+1  A: 

The link table approach seems fine, except that you might want a 'relationship type' so that you know WHY they are related.

For example, the relation between Raleigh and North Carolina is not the same as a relation between Raleigh and Durham. Additionally, you may want to know who is the 'parent' in the relationship, in case you were driving conditional drop-downs. (i.e. You select a State, you get to see the cities that are in the state).

Depending on the complexity of your requirements, the simple setup you have right now may not be sufficient. If you simply need to show that two records are related in some way, the link table should be sufficient.

Jay S
I see what you are getting at. In this case we are specifically not representing a hierarchy. There will only ever be one state in this system and the relationships won't be used for a drill-down style navigation.
GloryFish
+1  A: 

I already posted a way to do it in your design, but I also wanted to offer this separate design insight if you have some flexibility in your design and this more closely fits your needs.

If the items are in (non-overlapping) equivalence classes, you might want to make equivalence classes the basis for the table design, where everything in class is considered equivalent. The classes themselves can be anonymous:

CREATE TABLE equivalence_class (
    class_id int -- surrogate, IDENTITY, autonumber, etc.
    ,entity_id int
)

entity_id should be unique for a non-overlapping partition of your space.

This avoids the problem of ensuring proper left- or right-handed-ness or forcing an upper-right relationship matrix.

Then your query is a little different:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> @entity_id

or, equivalently:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> c1.entity_id
Cade Roux
Nice! You can also test c2.entity_id <> c1.entity_id, instead of c2.entity_id <> @entity_id. That way you don't have to pass the @entity_id parameter twice.
Bill Karwin
I assumed it would be a stored procedure, but yes, that would be equivalent for the parameterized ad hoc query devotees.
Cade Roux
A: 

My advice is that your intial table design is bad. Do not store different types of things in the same table. (First rule of database design, right up there with do not store multiple pieces of information in the same field). This is much harder to query and will cause significant performance problems down the road. Plus it would be a problem entering the data into the realtionship table - how do you know what entities would need to be realted when you do a new entry? It would be much better to design properly relational tables. Entity tables are almost always a bad idea. I see no reason at all from the example to have this type of information in one table. Frankly I'd have a university table and a related address table. It would easy to query and perform far better.

HLGEM
A: 

Based on your updated schema this query should work:

select if(entity_id_a=:entity_id,entity_id_b,entity_id_a) as related_entity_id where :entity_id in (entity_id_a, entity_id_b)

where :entity_id is bound to the entity you are querying