views:

39

answers:

2

I have a table with fields:

  • id
  • name
  • parent_id
  • grandparent_id

I want to select the id, name, parent name, grandparent name.

Can I do this with a self join? I basically want to retrieve the "name" value where parent_id = id and return one row.

Example:

id     name       parent_id     grandparent_id
-----------------------------------------------
 1     Milton     NULL          NULL
 2     Year 3     1             NULL
 3     Class A    2             1

So i want to select the 3rd row (id = 3) but instead of returning just the parent_id and grandparent_id, i want the query to return the names of these records based on their ids. Can i create a composite field, say called parent_id_name and grandparent_id_name?

I'm pretty sure what i am doing can be achieved by a self join or sub query, but all of the code i have tried so far has failed to work. Any help would be really appreciated.

+1  A: 

This is the query that you asked for:

# By using LEFT JOINs you will be able to read any record,
# even one with missing parent/grand-parent...
SELECT
    child.id,
    child.name,
    parent.id,
    parent.name,
    gparent.id,
    gparent.name
FROM
    some_table child
    LEFT JOIN some_table parent ON
        parent.id = child.parent_id
    LEFT JOIN some_table gparent ON
        gparent.id = child.grandparent_id
WHERE
    child.id = 3

BUT I would also add that the redundancy of having a field grandparent_id does NOT sound right to me...

Your table should be just:

id     name       parent_id
 1     Milton     NULL
 2     Year 3     1
 3     Class A    2

Notice that, if I know that 1 is the parent of 2, I don't need repeat that same information again on record 3...

In this last case, your select could be like this:

SELECT
    child.id,
    child.name,
    parent.id,
    parent.name,
    gparent.id,
    gparent.name
FROM
    some_table child
    LEFT JOIN some_table parent ON
        parent.id = child.parent_id
    LEFT JOIN some_table gparent ON
        gparent.id = parent.parent_id  -- See the difference?
WHERE
    child.id = 3

The query would work the same, and you would also have more "normalized" database.

Edit: This is pretty basic stuff, but I guess it is relevant to this answer...

This kind of denormalization (i.e. to have both parent_id and grandparent_id on the same record) should not be used because it allows the database to be inconsistent.

For instance, let's suppose that a new record is inserted:

id     name            parent_id   grandparent_id
 1     Milton          NULL        NULL
 2     Year 3          1           NULL
 3     Class A         2           1
 4     Invalid Rec     2           3

It doesn't make any sense, right? Record 4 is stating that 3 is its grandparent. So, 3 should be the parent of record 2. But that's not what is stated on record 3 itself. Which record is right?

You may think this is an odd error, and that your database will never become like this. But my experience says otherwise - if an error may happen, it will eventually. Denormalization should be avoided, not just because some database guru says so, but because it really increases inconsistencies, and makes maintenance harder.

Of course, denormalized databases may be faster. But, as a rule of thumb, you should think about performance after your system is ready for production, and after you have perceived, by the means of some automated or empirical test, that a bottleneck exists. Believe me, I have seen much worse design choices being justified by wrong performance expectations before...

rsenna
I tend to agree. The denormalization can save a join though as in my answer.
Martin Smith
The reason for the parent and grandparent field is because reading on the net, people have said that recursion (just parent_id) is not very efficient, and the most efficient way (nested sets) is complex. So this "lineage" type structure is a happy medium. In my case I know that there will only be a max of 2 levels of hierarchy.
Dan Spencer
Also, with just parent_id, is it easy to select downwards on the hierarchy? e.g. if I want to get all of the child/grandchild records of record 1, how could this be achieved in the "normalized" model?
Dan Spencer
@Dan Spencer: uh... in exactly the same way? OK, there is a gotcha in your question - record 1 does NOT have a parent. In that case, you should change the INNER JOIN for a LEFT JOIN (or a LEFT OUTER JOIN, like Martin did).Again, please understand that denormalized databases are common, but IMO only are justified when performance is an issue. And many times this is a high price to pay for - because denormalization allows errors, and when a error may happen, it will happen. I will edit my answer and explain it better.
rsenna
@rsenna Thanks for your comments on this... I have now changed db structure as you suggested, only problem now is that mysql returns an empty result if there is no grandchild group - not all of the time I will have parent, child and grandchild. sometimes it will just be parent, child groups or even just parent. Is there a way i can get mysql to fail gracefully and just return to whatever level it can find? Thanks once again for all of your help.
Dan Spencer
@dan If I understood it right, you're just missing that `LEFT JOIN` I spoke before. I'm gonna update the example.
rsenna
+1  A: 
SELECT t1.name, 
    MAX(CASE WHEN t2.id = t1.parent_id then t2.name end) as Parent,
    MAX(CASE WHEN t2.id = t1.grandparent_id then t2.name end) as GrandParent
FROM your_table t1
LEFT OUTER JOIN your_table t2 ON t2.id IN (t1.parent_id, t1.grandparent_id)
WHERE t1.id = 3
group by t1.id,  t1.name
Martin Smith