views:

388

answers:

6

I have four tables

create table entities{
integer id;
string name;
}

create table users{
integer id;//fk to entities
string email;
}

create table groups{
integer id;//fk to entities
}

create table group_members{
integer group_id; //fk to group
integer entity_id;//fk to entity
}

I want to make a query that returns all groups where a user belongs, directly or indirectly. The obvious solution is to make a recursion at the application level. I’m wondering what changes can I make to my data model to decrease the database access and as a result have a better performance.

+1  A: 

Can you clarify the difference between an entity and a user? Otherwise, your tables look OK. You are making an assumption that there is a many-to-many relationship between groups and entities.

In any case, with standard SQL use this query:

SELECT name, group_id
FROM entities JOIN group_members ON entities.id = group_members.entity_id;

This will give you a list of names and group_ids, one pair per line. If an entity is a member of multiple groups, the entity will be listed several times.

If you're wondering why there's no JOIN to the groups table, it's because there's no data from the groups table that isn't already in the group_members table. If you included, say, a group name in the groups table, and you wanted that group name to be shown, then you'd have to join with groups, too.

Some SQL variants have commands related to reporting. They would allow you to list multiple groups on the same line as a single entity. But it's not standard and wouldn't work across all platforms.

Barry Brown
A: 

If you want a truly theoretically infinite level of nesting, then recursion is the only option, which precludes any sane version of SQL. If you're willing to limit it, then there are a number of other options.

Check out this question.

Adam Robinson
There categorically ARE ways of representing trees without needing Recursion to interrogate them. They just need a little 'out of the box thinking', and in some cases a good mathmatical mind. Search for "Nested Sets" and if you keep reading what you find you'll find other possibilities too...
Dems
@Dems: That's why I prefaced this saying if you truly need a theoretically infinite level of nesting. All of these approaches are compromises on the theoretical set in favor of ease of querying. Stating that there "categorically ARE ways" is meaningless. There are ways, but none of them completely satisfies the condition, and the OP provided no information that allowed a compromise to be selected.
Adam Robinson
A: 

Hi,

You can do the following:

  • Use the START WITH / CONNECT BY PRIOR constructs.
  • Create a PL/SQL function.
ATorras
+10  A: 

In Oracle:

SELECT  group_id
FROM    group_members
START WITH
        entity_id = :user_id
CONNECT BY
        entity_id = PRIOR group_id

In SQL Server:

WITH    q AS
        (
        SELECT  group_id, entity_id
        FROM    group_members
        WHERE   entity_id = @user_id
        UNION ALL
        SELECT  gm.group_id, gm.entity_id
        FROM    group_members gm
        JOIN    q
        ON      gm.entity_id = q.group_id
        )
SELECT  group_id
FROM    q

In PostgreSQL 8.4:

WITH RECURSIVE
        q AS
        (
        SELECT  group_id, entity_id
        FROM    group_members
        WHERE   entity_id = @user_id
        UNION ALL
        SELECT  gm.group_id, gm.entity_id
        FROM    group_members gm
        JOIN    q
        ON      gm.entity_id = q.group_id
        )
SELECT  group_id
FROM    q

In PostgreSQL 8.3 and below:

CREATE OR REPLACE FUNCTION fn_group_members(INT)
RETURNS SETOF group_members
AS
$$
        SELECT  group_members
        FROM    group_members
        WHERE   entity_id = $1
        UNION ALL
        SELECT  fn_group_members(group_members.group_id)
        FROM    group_members
        WHERE   entity_id = $1;
$$
LANGUAGE 'sql';

SELECT  group_id
FROM    group_members(:myuser) gm
Quassnoi
Very elegant solutions indeed, but just as a point of fact the OP had asked if it was possible *without* recursion. Your solution clearly appears functional and is fairly straightforward, but it does still use recursion.
Adam Robinson
From the question: "The obvious solution is to make a recursion at the *application level*". I think that's what the `@op` really wanted to avoid, not the recursion as such.
Quassnoi
Tks for your answer!. Although the solution still use a recursion, this approach is more efficient than writing the recursion at the application level. I just need to upgrade my postgres version :D
Dani Cricco
`@Dani`: with a little effort you can implement this on `8.3` and below just as well.
Quassnoi
`@Dani`: see the post update
Quassnoi
@Quassnoi: thank you very much for your quick and helpfull answer!
Dani Cricco
+3  A: 

There are ways of avoiding recursion in tree hierarchy queries (in opposition to what people have said here).

The one I've used most is Nested Sets.

As with all life and technical decisions, however, there are trade offs to be made. Nested Sets are often slower to update but much faster to query. There are clever and complicated ways of improving the speed of updating the hierarchy, but there's another trade-off; performance vs code complexity.

A simple example of a nested set...

Tree View:

 -Electronics
 |
 |-Televisions
 | |
 | |-Tube
 | |-LCD
 | |-Plasma
 |
 |-Portable Electronics
   |
   |-MP3 Players
   | |
   | |-Flash
   |
   |-CD Players
   |-2 Way Radios

Nested Set Representation

+-------------+----------------------+-----+-----+
| category_id | name                 | lft | rgt |
+-------------+----------------------+-----+-----+
|           1 | ELECTRONICS          |   1 |  20 |
|           2 | TELEVISIONS          |   2 |   9 |
|           3 | TUBE                 |   3 |   4 |
|           4 | LCD                  |   5 |   6 |
|           5 | PLASMA               |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |  10 |  19 |
|           7 | MP3 PLAYERS          |  11 |  14 |
|           8 | FLASH                |  12 |  13 |
|           9 | CD PLAYERS           |  15 |  16 |
|          10 | 2 WAY RADIOS         |  17 |  18 |
+-------------+----------------------+-----+-----+

You'll want to read the article I linked to understand this fully, but I'll try to give a short explanation.

An item is a member of another item if (the child's "lft" (Left) value is greater than the parent's "ltf" value) AND (the child's "rgt" value is less than the parent's "rgt" value)

"Flash" is therfore a member of "MP3 PLAYERS", "Portable Electronics" and "Electronics"

Or, conversley, the members of "Portable Electronics" are:
- MP3 Players
- Flash
- CD Players
- 2 Way Radios

Joe Celko has an entire book on "Trees and Hierarchies in SQL". There are more options than you think, but lots of trade off's to make.

Note: Never say something can't be done, some mofo will turn up to show you that in can.

Dems
`Nested sets` is indeed faster to query when you want to find all items within a category, but it's slower when you want all categories an item belongs to (which is the functionality the `@op` asks for).
Quassnoi
Okay, well your name I recognise and respect, but are you Certain on that one? Nested sets perform fasted when looking down the tree (what are my child) and are slower when looking up the tree (what are my parents). But still, in my experience looking up the tree is faster in Nested Sets than using recusion, even with Common Table Expressions in SQL Server 2005+. I'd be genuinely interested in any articles, etc you have to demonstrate that the difference is the other way around.
Dems
`@Dems`: it's a good idea on writing an article on this (I'll probably do it on this week). Just some outlines: when you search for all categories a child belongs to, you need to issue this query: `SELECT * FROM sets WHERE lft <= @myid and rgt >= @myid`. No single index can serve this query. You will need an `INDEX MERGE` on two indexes which will need to filter possibly thousands of records and then join them. And trees with `100,000` categories are common. `Adjacency list`, on the other side, requires at most as many index lookups as is the depth of the item, which is rarely more than `10`.
Quassnoi
@Dems: Clearly you have confidence in your abilities. What I said is true, as a nested set isn't an actual tree data structure. It is functionally similar (and in many cases identical), but it is not a tree and has tradeoffs, as you admit in the answer. No need to put down another answer just because you don't like it.
Adam Robinson
A: 

I don't think there is a need for recursion here as the solution posted by barry-brown seems adequate. If you need a group to be able to be a member of a group, then the tree traversal method offered by Dems works well. Inserts, deletes and updates are pretty straightforward with this scheme, and retrieving the entire hierarchy is accomplished with a single select.

I would suggest including a parent_id field in your group_members table (assuming that is the point at which your recursive relationship occurs). In a navigation editor I've created a nodes table like so:

tbl_nodes     
----------
node_id   
parent_id 
left      
right
level

...

My editor creates hierarchically-related objects from a C# node class

    class node {
      public int NodeID { get; set; } 
      public Node Parent { get; set; }
      public int Left { get; set; }
      public int Right { get; set; }
      public Dictionary<int,Node> Nodes { get; set; } 
      public int Level {
         get { 
            return (Parent!=null) ? Parent.Level+1 : 1;
         }
      }
}

The Nodes property contains a list of child nodes. When the business layer loads the hierarchy, it rectifies the parent/child relationships. When the nav editor saves, I recursively set the left and right property values, then save to the database. That lets me get the data out in the correct order meaning I can set parent/child references during retrieval instead of having to make a second pass. Also means that anything else that needs to display the hierarchy ( say, a report) can easily get the node list out in the correct order.

Without a parent_id field, you can retrieve a breadcrumb trail to the current node with

select n1.* 
from nodes n1, nodes n2
where d1.lft <= d2.lft and d1.rgt >= d2.rgt
and d2.id = @id
order by lft;

where @id is the id of the node you're interested in.

Pretty obvious stuff, really, but it applies to items such as nested group membership that might not be obvious, and as others have said eliminates the need to slow recursive SQL.

David Lively