Hi everyone,

I have just implemented the add/update/delete operations for the "Closure Table" way of organizing hierarchical data, as shown on page 70 of this SlideShare deck: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

My database looks like this:

Table Categories:

ID         Name
1          Top value
2          Sub value1

Table CategoryTree:

child     parent     level
1          1         0
2          2         0  
2          1         1  

However, I am having trouble getting the full tree back as a multidimensional array from a single query.

Here's what I would like to get back:

 array (

 'topvalue' => array (
                     'Subvalue',
                     'Subvalue2',
                     'Subvalue3'
                     )

 );

Update: I found this link, but I still have a hard time converting its result into an array: http://karwin.blogspot.com/2010/03/rendering-trees-with-closure-tables.html

Update 2: I have now added a depth to each of the categories, in case that is of any help.

A: 

Sorry, but I don't think you can get a multi-dimensional array out of your (or any) database query.

Alix Axel
Hi! Check out my original post; I've updated it with a link!
Industrial
+5  A: 

Proposed Solution

The following example gives a little more than you asked for, but it's a really nice way of doing it and still demonstrates where the information comes from at each stage.

It uses the following table structure:

+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| parent | int(10) unsigned | NO   |     | NULL    |                |
| name   | varchar(45)      | NO   |     | NULL    |                |
+--------+------------------+------+-----+---------+----------------+

Here it is:

<?php

    // Connect to the database
    mysql_connect('localhost', 'root', '');
    mysql_select_db('test');

    echo '<pre>';

    $categories = Category::getTopCategories();
    print_r($categories);

    echo '</pre>';

class Category
{
    /**
     * The information stored in the database for each category
     */
    public $id;
    public $parent;
    public $name;

    // The child categories
    public $children;

    public function __construct()
    {
        // Get the child categories when we get this category
        $this->getChildCategories();
    }

    /**
     * Get the child categories
     * @return array
     */
    public function getChildCategories()
    {
        // $children is null until loaded; an empty array is a valid cached result
        if ($this->children !== null) {
            return $this->children;
        }
        return $this->children = self::getCategories("parent = {$this->id}");
    }

    ////////////////////////////////////////////////////////////////////////////

    /**
     * The top-level categories (i.e. no parent)
     * @return array
     */
    public static function getTopCategories()
    {
        return self::getCategories('parent = 0');
    }

    /**
     * Get categories from the database.
     * @param string $where Conditions for the returned rows to meet
     * @return array
     */
    public static function getCategories($where = '')
    {
        if ($where) $where = " WHERE $where";
        $result = mysql_query("SELECT * FROM categories$where");

        $categories = array();
        while ($category = mysql_fetch_object($result, 'Category'))
            $categories[] = $category;

        mysql_free_result($result);
        return $categories;
    }
}

Test Case

In my database I have the following rows:

+----+--------+-----------------+
| id | parent | name            |
+----+--------+-----------------+
|  1 |      0 | First Top       |
|  2 |      0 | Second Top      |
|  3 |      0 | Third Top       |
|  4 |      1 | First Child     |
|  5 |      1 | Second Child    |
|  6 |      2 | Third Child     |
|  7 |      2 | Fourth Child    |
|  8 |      4 | First Subchild  |
|  9 |      4 | Second Subchild |
+----+--------+-----------------+

And thus the script outputs the following (lengthy) information:

Array
(
    [0] => Category Object
        (
            [id] => 1
            [parent] => 0
            [name] => First Top
            [children] => Array
                (
                    [0] => Category Object
                        (
                            [id] => 4
                            [parent] => 1
                            [name] => First Child
                            [children] => Array
                                (
                                    [0] => Category Object
                                        (
                                            [id] => 8
                                            [parent] => 4
                                            [name] => First Subchild
                                            [children] => Array
                                                (
                                                )

                                        )

                                    [1] => Category Object
                                        (
                                            [id] => 9
                                            [parent] => 4
                                            [name] => Second Subchild
                                            [children] => Array
                                                (
                                                )

                                        )

                                )

                        )

                    [1] => Category Object
                        (
                            [id] => 5
                            [parent] => 1
                            [name] => Second Child
                            [children] => Array
                                (
                                )

                        )

                )

        )

    [1] => Category Object
        (
            [id] => 2
            [parent] => 0
            [name] => Second Top
            [children] => Array
                (
                    [0] => Category Object
                        (
                            [id] => 6
                            [parent] => 2
                            [name] => Third Child
                            [children] => Array
                                (
                                )

                        )

                    [1] => Category Object
                        (
                            [id] => 7
                            [parent] => 2
                            [name] => Fourth Child
                            [children] => Array
                                (
                                )

                        )

                )

        )

    [2] => Category Object
        (
            [id] => 3
            [parent] => 0
            [name] => Third Top
            [children] => Array
                (
                )

        )

)

Example Usage

I'd suggest creating some kind of recursive function if you're going to create menus from the data:

function outputCategories($categories, $startingLevel = 0)
{
    $indent = str_repeat("    ", $startingLevel);

    foreach ($categories as $category)
    {
        echo "$indent{$category->name}\n";
        if (count($category->children) > 0)
            outputCategories($category->children, $startingLevel+1);
    }
}

$categories = Category::getTopCategories();
outputCategories($categories);

which would output the following:

First Top
    First Child
        First Subchild
        Second Subchild
    Second Child
Second Top
    Third Child
    Fourth Child
Third Top

Enjoy

icio
+2  A: 

Okay, I've written PHP classes that extend the Zend Framework DB table, row, and rowset classes. I've been developing this anyway because I'm speaking at PHP Tek-X in a couple of weeks about hierarchical data models.

I don't want to post all my code to Stack Overflow because it would implicitly be licensed under Creative Commons if I did. Update: I committed my code to the Zend Framework extras incubator, and my presentation, Models for Hierarchical Data with SQL and PHP, is on SlideShare.

I'll describe the solution in pseudocode. I'm using zoological taxonomy as test data, downloaded from ITIS.gov. The table is longnames:

CREATE TABLE `longnames` (
  `tsn` int(11) NOT NULL,
  `completename` varchar(164) NOT NULL,
  PRIMARY KEY (`tsn`),
  KEY `tsn` (`tsn`,`completename`)
)

I've created a closure table for the paths in the hierarchy of taxonomy:

CREATE TABLE `closure` (
  `a` int(11) NOT NULL DEFAULT '0',  -- ancestor
  `d` int(11) NOT NULL DEFAULT '0',  -- descendant
  `l` tinyint(3) unsigned NOT NULL,  -- levels between a and d
  PRIMARY KEY (`a`,`d`),
  CONSTRAINT `closure_ibfk_1` FOREIGN KEY (`a`) REFERENCES `longnames` (`tsn`),
  CONSTRAINT `closure_ibfk_2` FOREIGN KEY (`d`) REFERENCES `longnames` (`tsn`)
)

Given the primary key of one node, you can get all its descendants, down to a chosen depth, this way:

SELECT d.*, p.a AS `_parent`
FROM longnames AS a
JOIN closure AS c ON (c.a = a.tsn)
JOIN longnames AS d ON (c.d = d.tsn)
LEFT OUTER JOIN closure AS p ON (p.d = d.tsn AND p.l = 1)
WHERE a.tsn = ? AND c.l <= ?
ORDER BY c.l;

The join to closure AS p is to include each node's parent id.

The query makes pretty good use of indexes:

+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref      | rows | Extra                       |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
|  1 | SIMPLE      | a     | const  | PRIMARY,tsn   | PRIMARY | 4       | const    |    1 | Using index; Using filesort |
|  1 | SIMPLE      | c     | ref    | PRIMARY,d     | PRIMARY | 4       | const    | 5346 | Using where                 |
|  1 | SIMPLE      | d     | eq_ref | PRIMARY,tsn   | PRIMARY | 4       | itis.c.d |    1 |                             |
|  1 | SIMPLE      | p     | ref    | d             | d       | 4       | itis.c.d |    3 |                             |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+

And given that I have 490,032 rows in longnames and 4,299,883 rows in closure, it runs in pretty good time:

+--------------------+----------+
| Status             | Duration |
+--------------------+----------+
| starting           | 0.000257 |
| Opening tables     | 0.000028 |
| System lock        | 0.000009 |
| Table lock         | 0.000013 |
| init               | 0.000048 |
| optimizing         | 0.000032 |
| statistics         | 0.000142 |
| preparing          | 0.000048 |
| executing          | 0.000008 |
| Sorting result     | 0.034102 |
| Sending data       | 0.001300 |
| end                | 0.000018 |
| query end          | 0.000005 |
| freeing items      | 0.012191 |
| logging slow query | 0.000008 |
| cleaning up        | 0.000007 |
+--------------------+----------+

Now I post-process the result of the SQL query above, sorting the rows into subsets according to the hierarchy (pseudocode):

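$nodes = array();  // rows seen so far, indexed by tsn
$top = null;       // root of the fetched subtree

// Rows arrive ordered by depth, so a parent is always seen before its children
while ($rowData = fetch()) {
  $row = new RowObject($rowData);
  $nodes[$row["tsn"]] = $row;
  if (array_key_exists($row["_parent"], $nodes)) {
    $nodes[$row["_parent"]]->addChildRow($row);
  } else {
    // the starting node's parent is not part of the result set
    $top = $row;
  }
}
return $top;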
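
For anyone who wants plain arrays rather than row objects, the same post-processing idea can be sketched roughly as follows. This is a minimal sketch, not the Zend Framework code: buildTree() and attachChildren() are hypothetical names, the rows are hard-coded instead of fetched, and it assumes (as the pseudocode does) that rows arrive ordered by depth and carry the tsn and _parent columns from the query above. Unlike the single pass in the pseudocode, it first groups rows by parent and then nests them recursively.

```php
<?php
// Hypothetical helper: nest flat rows (shaped like the query result) into arrays.
// Assumes rows are ordered by depth, so a parent is always seen before its children.
function buildTree(array $rows)
{
    $childrenOf = array(); // parent tsn => list of child rows
    $seen = array();       // tsn values encountered so far
    $top = null;

    foreach ($rows as $row) {
        $seen[$row['tsn']] = true;
        if ($row['_parent'] !== null && isset($seen[$row['_parent']])) {
            $childrenOf[$row['_parent']][] = $row;
        } else {
            // The starting node's parent is not part of the result set
            $top = $row;
        }
    }
    return $top === null ? null : attachChildren($top, $childrenOf);
}

// Recursively copy each node's children under a '_children' key (empty for leaves)
function attachChildren(array $row, array $childrenOf)
{
    $row['_children'] = array();
    if (isset($childrenOf[$row['tsn']])) {
        foreach ($childrenOf[$row['tsn']] as $child) {
            $row['_children'][] = attachChildren($child, $childrenOf);
        }
    }
    return $row;
}

// Sample rows taken from the taxonomy output in this answer
$rows = array(
    array('tsn' => '180130', 'completename' => 'Rodentia',       '_parent' => '179925'),
    array('tsn' => '584569', 'completename' => 'Hystricognatha', '_parent' => '180130'),
    array('tsn' => '180134', 'completename' => 'Sciuromorpha',   '_parent' => '180130'),
    array('tsn' => '552299', 'completename' => 'Hystricognathi', '_parent' => '584569'),
);
$tree = buildTree($rows);
var_export($tree);
```

The grouping step keeps the code free of PHP references, at the cost of a second recursive pass over the rows.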

I also define classes for Rows and Rowsets. A Rowset is basically an array of rows. A Row contains an associative array of row data, and also contains a Rowset for its children. The children Rowset for a leaf node is empty.

Rows and Rowsets also define methods called toArrayDeep() which dump their data content recursively as a plain array.
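
To give an idea of the shape, a stripped-down stand-in for such a Row class might look like the sketch below. TreeRow is a hypothetical name, it is not the Zend Framework code, and for brevity it keeps its children in a plain PHP array instead of a separate Rowset class.

```php
<?php
// Hypothetical stand-in for the Row class described above
class TreeRow
{
    private $data;                // associative array of column values
    private $children = array();  // child TreeRow objects

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function addChildRow(TreeRow $child)
    {
        $this->children[] = $child;
    }

    // Dump this row and all of its descendants as plain nested arrays.
    // A leaf gets no '_children' key at all.
    public function toArrayDeep()
    {
        $result = $this->data;
        foreach ($this->children as $child) {
            $result['_children'][] = $child->toArrayDeep();
        }
        return $result;
    }
}

$root = new TreeRow(array('tsn' => '180130', 'completename' => 'Rodentia', '_parent' => '179925'));
$child = new TreeRow(array('tsn' => '584569', 'completename' => 'Hystricognatha', '_parent' => '180130'));
$root->addChildRow($child);
var_export($root->toArrayDeep());
```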

Then I can use the whole system together like this:

// Get an instance of the taxonomy table data gateway 
$tax = new Taxonomy();

// query tree starting at Rodentia (id 180130), to a depth of 2
$tree = $tax->fetchTree(180130, 2);

// dump out the array
var_export($tree->toArrayDeep());

The output is as follows:

array (
  'tsn' => '180130',
  'completename' => 'Rodentia',
  '_parent' => '179925',
  '_children' => 
  array (
    0 => 
    array (
      'tsn' => '584569',
      'completename' => 'Hystricognatha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '552299',
          'completename' => 'Hystricognathi',
          '_parent' => '584569',
        ),
      ),
    ),
    1 => 
    array (
      'tsn' => '180134',
      'completename' => 'Sciuromorpha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '180210',
          'completename' => 'Castoridae',
          '_parent' => '180134',
        ),
        1 => 
        array (
          'tsn' => '180135',
          'completename' => 'Sciuridae',
          '_parent' => '180134',
        ),
        2 => 
        array (
          'tsn' => '180131',
          'completename' => 'Aplodontiidae',
          '_parent' => '180134',
        ),
      ),
    ),
    2 => 
    array (
      'tsn' => '573166',
      'completename' => 'Anomaluromorpha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '573168',
          'completename' => 'Anomaluridae',
          '_parent' => '573166',
        ),
        1 => 
        array (
          'tsn' => '573169',
          'completename' => 'Pedetidae',
          '_parent' => '573166',
        ),
      ),
    ),
    3 => 
    array (
      'tsn' => '180273',
      'completename' => 'Myomorpha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '180399',
          'completename' => 'Dipodidae',
          '_parent' => '180273',
        ),
        1 => 
        array (
          'tsn' => '180360',
          'completename' => 'Muridae',
          '_parent' => '180273',
        ),
        2 => 
        array (
          'tsn' => '180231',
          'completename' => 'Heteromyidae',
          '_parent' => '180273',
        ),
        3 => 
        array (
          'tsn' => '180213',
          'completename' => 'Geomyidae',
          '_parent' => '180273',
        ),
        4 => 
        array (
          'tsn' => '584940',
          'completename' => 'Myoxidae',
          '_parent' => '180273',
        ),
      ),
    ),
    4 => 
    array (
      'tsn' => '573167',
      'completename' => 'Sciuravida',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '573170',
          'completename' => 'Ctenodactylidae',
          '_parent' => '573167',
        ),
      ),
    ),
  ),
)

Re your comment about calculating depth -- or really length of each path.

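Assuming you've just inserted a new node into the table that holds the actual nodes (longnames in the example above), the id of the new node is returned by LAST_INSERT_ID() in MySQL; otherwise you can obtain it some other way. The statement below copies every path that ends at the intended parent, extends it by one level so that it ends at the new node, and adds the new node's zero-length path to itself: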

INSERT INTO Closure (a, d, l)
  SELECT a, LAST_INSERT_ID(), l+1 FROM Closure
  WHERE d = 5 -- the intended parent of your new node 
  UNION ALL SELECT LAST_INSERT_ID(), LAST_INSERT_ID(), 0;
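
To see which rows that statement produces, here is a small in-memory model of it in PHP. It is only an illustration under stated assumptions: closureRowsForNewNode() is a hypothetical helper, no database is involved, and the sample rows are the CategoryTree rows from the question mapped onto the a/d/l column names (parent = a, child = d, level = l).

```php
<?php
// Hypothetical in-memory model of the INSERT ... SELECT above.
// $closure is a list of rows: array('a' => ancestor, 'd' => descendant, 'l' => path length)
function closureRowsForNewNode(array $closure, $parentId, $newId)
{
    $newRows = array();
    foreach ($closure as $row) {
        // Every ancestor of the parent (including the parent itself, via its
        // l = 0 row) becomes an ancestor of the new node, one level further away
        if ($row['d'] === $parentId) {
            $newRows[] = array('a' => $row['a'], 'd' => $newId, 'l' => $row['l'] + 1);
        }
    }
    // The new node's path to itself, with length 0
    $newRows[] = array('a' => $newId, 'd' => $newId, 'l' => 0);
    return $newRows;
}

// Closure rows from the question
$closure = array(
    array('a' => 1, 'd' => 1, 'l' => 0),
    array('a' => 2, 'd' => 2, 'l' => 0),
    array('a' => 1, 'd' => 2, 'l' => 1),
);

// Add a new node 3 underneath node 2
$newRows = closureRowsForNewNode($closure, 2, 3);
print_r($newRows);
```

For this data the new rows are (a=2, d=3, l=1), (a=1, d=3, l=2) and (a=3, d=3, l=0), so the depth column is filled in without any extra bookkeeping.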
Bill Karwin
Hi Bill! Thanks a lot for your extensive reply. Much appreciated! I wonder if you could provide an example of how to do inserts with this design, since I get strange foreign key issues when using your previous examples from the SQL antipatterns slideshow. Thanks!
Industrial
Can you edit your question above and show what you've tried so far in terms of the specific insert and the error you got? Or maybe you could even open a separate question.
Bill Karwin
Hi Bill, no worries. I had written the query incorrectly. One question, though: would you mind posting an example of an insert query that calculates the depth (l column) automatically?
Industrial
Hi again Bill. I noticed that you edited your post. Just wanted to thank you for your help! It's been amazing, and I finally got the complete add/list/delete/move/count Closure table up and running :)
Industrial
+1  A: 

Hi, I loved the answer from icio, but I prefer to have arrays of arrays, rather than arrays of objects. Here is his script modified to work without making objects:

<?php

require_once('mysql.php');

echo '<pre>';

$categories = Taxonomy::getTopCategories();
print_r($categories);

echo '</pre>';

class Taxonomy
{
    /**
     * The top-level categories (i.e. no parent)
     */
    public static function getTopCategories()
    {
        return self::getCategories('parent_taxonomycode_id = 0');
    }

    /**
     * Get categories as plain arrays, each with its children nested under 'children'
     */
    public static function getCategories($where = '')
    {
        if ($where) $where = " WHERE $where";
        $result = mysql_query("SELECT * FROM taxonomycode $where");

        $categories = array();
        // mysql_fetch_assoc avoids the duplicate numeric keys that
        // mysql_fetch_array returns by default
        while ($category = mysql_fetch_assoc($result)) {
            // Recurse: one query per category to fetch its direct children
            $my_id = $category['id'];
            $category['children'] = self::getCategories("parent_taxonomycode_id = $my_id");
            $categories[] = $category;
        }

        mysql_free_result($result);
        return $categories;
    }
}

I think it is fair to note that neither my answer nor icio's addresses your question directly. Both rely on a parent id column in the main table and make no use of the closure table. Recursively querying the database is still the way to go, but instead of passing only the parent id, you have to pass the parent id AND the depth (which increases by one on each recursion), so that the queries at each level can use parent + depth to get the direct-parent information from the closure table rather than from the main table.
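
As a rough illustration of that idea, here is a sketch that nests categories by name using only closure rows. It is my reading of the approach rather than tested production code: childrenOf() and rootIds() are hypothetical helpers, the rows are hard-coded from the question instead of queried, the "level = 1" filter stands in for a query such as SELECT child FROM CategoryTree WHERE parent = ? AND level = 1, and it assumes sibling names are unique because names are used as array keys.

```php
<?php
// Hypothetical sketch: $names maps category ID => Name,
// $tree is a list of CategoryTree rows (child, parent, level)
function childrenOf($parentId, array $names, array $tree)
{
    $result = array();
    foreach ($tree as $row) {
        // level = 1 marks a direct parent/child link in the closure table
        if ($row['parent'] === $parentId && $row['level'] === 1) {
            $result[$names[$row['child']]] = childrenOf($row['child'], $names, $tree);
        }
    }
    return $result;
}

// Root categories are those that never appear as the child of a level = 1 row
function rootIds(array $tree)
{
    $hasParent = array();
    foreach ($tree as $row) {
        if ($row['level'] === 1) {
            $hasParent[$row['child']] = true;
        }
    }
    $roots = array();
    foreach ($tree as $row) {
        if ($row['level'] === 0 && !isset($hasParent[$row['child']])) {
            $roots[] = $row['child'];
        }
    }
    return $roots;
}

// Data from the question
$names = array(1 => 'Top value', 2 => 'Sub value1');
$tree = array(
    array('child' => 1, 'parent' => 1, 'level' => 0),
    array('child' => 2, 'parent' => 2, 'level' => 0),
    array('child' => 2, 'parent' => 1, 'level' => 1),
);

$nested = array();
foreach (rootIds($tree) as $rootId) {
    $nested[$names[$rootId]] = childrenOf($rootId, $names, $tree);
}
print_r($nested);
```

With a real database, each call to childrenOf() would run one query, so the number of queries grows with the number of categories.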

HTH, -FT

ftrotter