views:

302

answers:

2

Hi all, I have 2 tables. Table 1 is 'articles' and Table 2 is 'article_categories'. When a user creates an article, it is stored into 'articles'. The user, while creating the article can select various categories under which this article can appear. Currently, an article can be selected to belong to anywhere from 10-25 categories(may be increased in future). These Categories under which the article is filed, are stored in 'article_categories'. So this means a single article ID can have multiple related values in table 'article_categories'. When retrieving all the values from both the tables, I would need the all the values from 'article_categories' to be pulled and the values to be stored in an array.

My question is about what SQL query to use in order to do so? Should I use Inner Join, Left Join, Outer Join...? What would be the best way to do that? I did try some of those joins in phpmyadmin and they give me repeating values of the same article, when in fact, the article should be fetched only once and all the related categories to be fetched. I wanted to do this all in the same query without having to split the query into 2 different in order to accomplish this. I am attaching my table structures so that it's easy for you:

CREATE TABLE IF NOT EXISTS `articles` (
  `article_id` int(11) unsigned NOT NULL auto_increment,
  `creator_id` int(11) unsigned NOT NULL,
  `article_title` varchar(150) NOT NULL,
  `article_status` varchar(10) NOT NULL,
  PRIMARY KEY  (`article_id`),
  KEY `buyer_id` (`creator_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

--
-- Dumping data for table `articles`
--

INSERT INTO `articles` (`article_id`, `creator_id`, `article_title`, `article_status`) VALUES
(1, 1, 'My article 1', 'Pending');


CREATE TABLE IF NOT EXISTS `article_categories` (
  `article_id` int(11) unsigned NOT NULL,
  `category_id` smallint(3) unsigned NOT NULL,
  PRIMARY KEY  (`article_id`,`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- Dumping data for table `article_categories`
--

INSERT INTO `article_categories` (`article_id`, `category_id`) VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(1, 5),
(1, 36),
(1, 71);

Also please note that I have a composite key on the article_id and category_id keys in article_categories table. A sample query that I used is below:

SELECT *
FROM articles, article_categories 
WHERE articles.article_id = article_categories.article_id
AND articles.article_id = 1;

This results in:

article_id  creator_id article_title article_status article_id category_id
    1   1 My article 1 Pending 1 1
    1   1 My article 1 Pending 1 2
    1   1 My article 1 Pending 1 3
    1   1 My article 1 Pending 1 4
    1   1 My article 1 Pending 1 5
    1   1 My article 1 Pending 1 36
    1   1 My article 1 Pending 1 71

As can be seen, the value from the articles table is repeating and it's also able to get all the categories(it's the last column, in case the formatting is messed up). I wanted to get the values from the articles table only once and get the category_id in a loop, so that I can add those looped values in an array and carry on my processing. This is what I intend to do after fetching the values from above:

<?php
//i wanted to check if the article_id exists before i pull the related categories. 
//If I do it this way and output using mysql_num_rows, it gives me value 7,
//when in fact, the there's only 1 article with such Id. This means it also counts
//  the number of categories. Is there a way to uniquely identify only the number of
// articles (just to see if it exists or not, in the place)

$result = mysql_query("SELECT *
FROM articles, article_categories 
WHERE articles.article_id = article_categories.article_id
AND articles.article_id = 1");

while ( $rows = mysql_fetch_array($result) )
    { //i don't think this the following 2 assignments should be done in the loop
     $article_id = $rows['article_id'];
     $article_title = $rows['article_title'];

     //(loop all the category_id from the 2nd table and add to the array below)
     $categories_id[] .= ??????? --> How do i do this?  
    } 

?>

Obviously, I cannot do a LIMIT 1 on the above as that will limit my ability to retrieve all the category IDs.

So my question would be how do I get all the category_id from the 2nd table (in a loop) and add them to the array and at the same time, make sure that the values from table 1 are fetched only once (I do realize that the values fetched from the table 1 are the same but does not make sense to loop on them). To achieve this, I would like to get your input on what kind of Join I should use to execute the query with maximum efficiency and use the minimum resources, all in a single query to minimize hits on the DB. I hope that make sense.

Thanks in advance.

A: 

In this many-to-many scenario you describe, you can't avoid duplicate data in any single result set you get.

Here's an idea. Do a separate query to build an array of category names, with their database key as the array index.

$sql = "SELECT category_id, category_name FROM Categories";
$result = mysql_query($sql);
$arrCategories = array();
while ( $row = mysql_fetch_assoc($result) {
    $arrCategories[$row['category_id']] = $row['category_name'];
}

Now, you have the names of all the categories in an array.

When you're selecting articles, you'll have to do a separate query that pulls its category_ids from the join table. You can use a two-dimensional array to have a list of article ids and their associated categories

$arrArticleCategoryIds = array();

$result = mysql_query("SELECT *
FROM articles, article_categories 
WHERE articles.article_id = article_categories.article_id
AND articles.article_id = 1");

while ( $rows = mysql_fetch_array($result) )
    {   
        // why bother doing this when you've already hard-coded "1"?
        $article_id = $rows['article_id'];
        $article_title = $rows['article_title'];

        //(loop all the category_id from the 2nd table and add to the array below)
        // $categories_id[] .= ??????? --> How do i do this?               

        // here's how:
        $sql = "SELECT category_id 
                FROM `article_categories` 
                WHERE article_id = $article_id
        ";
        $category_id_results = mysql_query($sql);

        while ( $category_id_row = mysql_fetch_assoc($category_id_results) ) {
            $arrArticleCategoryIds[$article_id][] = $row['category_id'];
       }

    }

You'll wind up with two arrays:

$arrCategories 
Array
(
    [1] => apple
    [2] => banana
    ...
)

$arrArticleCategoryIds
Array
(
    [1] => Array
        (
            [1] => 13
            [2] => 9
         )
    [3] => Array
         (
            [1] => 5
            [2] => 7
         )
    )
)

Where '1' and '3' are article ids, and 13, 9, 5, and 7 are category ids that belong to the article id they're found under.

Hi, thank you for taking time to reply. Although I do appreciate your solution, my requirement slightly varies and it's missing the primary goal I am trying to achieve, deal with everything in a single query. I did think of such similar solution as yours and in fact, I would not have to even fetch rows in a while loop for the first query, nor use the complicated syntax, if I were to use 2 separate queries. I would just need to get the article Id and loop through the categories by that. Problem solved painlessly if it's a 2 query operation. But I wanted to get everything in 1 shot.Thanks again.
Devner
+1  A: 

EDIT:

SELECT articles.article_id, articles.article_title, GROUP_CONCAT(article_categories.category_id SEPARATOR ',') as category_id
FROM articles
LEFT JOIN article_categories ON  (articles.article_id = article_categories.article_id)
-- WHERE CLAUSE IF YOU WANT/NEED --
GROUP BY articles.article_id;

EDIT: Added column alias for group concat GROUP_CONCAT(article_categories.category_id SEPARATOR ',') as category_id

<?php
$result = mysql_query("SELECT articles.article_id, articles.article_title, GROUP_CONCAT(article_categories.category_id SEPARATOR ',') as category_id
FROM articles, article_categories
WHERE articles.article_id = 1
GROUP BY articles.article_id;");

while ( $rows = mysql_fetch_array($result) )
    {   
        $article_id = $rows['article_id'];
        $article_title = $rows['article_title'];

        //(loop all the category_id from the 2nd table and add to the array below)
        $categories_id = explode(',', $rows['category_id']);               
    }   

?>

Beware of group concat though as it does have limit:

The result is truncated to the maximum length that is given by the group_concat_max_len system variable, which has a default value of 1024. The value can be set higher, although the effective maximum length of the return value is constrained by the value of max_allowed_packet.

EDIT: Also without using the group concat i would go ahead and do it the way you had it... just make the category id your primary looping stcuture:

<?php
$result = mysql_query("SELECT articles.article_id, articles.article_title, article_categories.category_id
FROM articles, article_categories
WHERE articles.article_id = 1");

$articles = array();
while ( $rows = mysql_fetch_array($result) )
    {   
        if(!array_key_exists($rows['article_id'], $articles)
        {
           $articles[$rows['article_id']] = array(
               'article_id' => $rows['article_id'],
               'article_title' => $rows['article_title']
               'categories_id' => array()
            );
         }

         $articles[$rows['article_id']][] = $rows['categories_id'];             
    }   

?>

This way you only query once, bur you would then have to loop over the article for the operation on the articles' data.

prodigitalson
Hi, Thank you for the reply and for your solution. So far, with the analysis done as per your solution, seems like it is going to provide me with my answer. When I test the SQL in phpmyadmin, it gives me the column name as GROUP_CONCAT(article_categories.category_id SEPARATOR ','), literally. When I print the array it gives me the following values:Array ( [0] => 1 [article_id] => 1 [1] => My article 1 [article_title] => My article 1 [2] => 1,2,3,4,5,36,71 [GROUP_CONCAT(article_categories.category_id SEPARATOR ',')] => 1,2,3,4,5,36,71 ) <to be continued>
Devner
<continued>$categories_id = explode(',', $rows['category_id']); does not seem to help. Nor does print_r($categories_id); Am I doing something wrong? print_r($rows); does show me that the values exist in comma separated format but I am unable to output it. This is the first of it's kind I have ever encountered. Your help in this is appreciated. Thank you.
Devner
sorry.. added the column alias in my edit.
prodigitalson
Hi, Yes, if we do apply the Alias, then it works. Thanks to you!!! I read that the limit for the concatenation is 1024. Does that mean it is 1MB limit or 1KB limit? Also I haven't run your second solution yet, but so far seems like it is one more solution, but yes, poses a problem of looping the article_id and article_title. Is there no way to avoid that? If you happen to have a solution at hand, please do let me know. Also I made an edit, not sure if you noticed it. I just wanted to confirm if the article_id exists before I begin processing. Hope your first solution does that.
Devner
You know im honestly not sure if those values correspond to bytes or kilobytes. Youll have to look that up in the manual. As far as checking if an article id exists... when you wite a join in that syntax it will only return rows that have matching article_id values... so in order to return a category an article id has to exists. so if `mysql_num_rows` retruns anything other than 0 it exists.
prodigitalson
You are right on! Thanks much.
Devner
I looked up the bytes/KB thing for the GROUP_CONCAT() function. The value is in bytes. The following is the extract from reference manual:"The maximum allowed result length in bytes for the GROUP_CONCAT() function. The default is 1024."Hope this helps anyone looking for that info.
Devner
@prodigitalson: Here's something weird I found out. I emptied the article_categories table and ran our query. Although the article_id exists in articles table, just because the query could not find matching rows in the article_categories table, it reported the number of rows as 0, which is completely false (as we still have our article in our DB). Just because the categories aren't available, doesn't mean the Article does not exist. I think we will have to revise our query to make this work correctly. Can we use LEFT JOIN or anything else to make this work? What do you say?
Devner
LEFT JOIN would work. As i said the syntax we used before is like running a normal JOIN in which both sides of the condition must be met to return a record. Have updated with left join.
prodigitalson
Thanks, that did the trick! +1 for you.
Devner