ansaurus

Question

Counting records belonging to a category in MySQL

Answer 1

A:

Let me suggest you to restructure the schema instead. What you want here is to represent a hierarchical structure (categories), which is not really straightforward to do with relational databases. Two common solutions are the adjacency list and the nested set.

The adjacency list is more a straightforward tree-like structure. You'll have a categories table like:

id  | name      | parent
------------------------
1   | Science   | null
2   | Sports    | null
3   | Medicine  | 1

Unfortunately this model is hard to work with using SQL. Instead, we can the nested set approach. Here every node has lft and rgt values node which will be between the parent's lft and rgt values. In your example you'll have:

id  | name      | lft  | rgt  
-------------------------------
1   | Science   | 1    | 4    
2   | Sports    | 5    | 6 
3   | Medicine  | 2    | 3

So in order to retrieve a count for a certain category, you can simply query the count of nodes that have a lft and rgt value in between the category you want. For example:

   SELECT COUNT(*) 
     FROM articles a
LEFT JOIN categories c ON a.category_id = c.id
    WHERE lft BETWEEN 1 AND 4 
      AND rgt BETWEEN 1 AND 4

Assuming your article table looks like:

id  | ... | category_id

This is discussed in more detail at:
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

NullUserException 2010-09-04 13:49:32

Nested sets are clever and cool, but a pain to work with. I'd suggest the OP make very sure he needs what they can give him before he changes anything.

Tom Anderson 2010-09-04 13:54:27

Answer 2

A:

I'll propose another solution: use tags rather than categories. You can use multiple tags for a given article and simply get the count of all articles matching a certain tag. This will be a lot easier to work with and also give you a lot more flexibility.

To accomplish this, you'll need a many-to-many relationship between articles and tags, which is usually implemented with a junction table:

tags
id  | name

articles_tags # the junction table
article_id  | tag_id

To tag an article, you simply INSERT multiple entries into the articles_tags table with the correct article_id and tag_id. Then you can use JOINs as usual to get what you want.

NullUserException 2010-09-04 14:07:19

Answer 3

A:

Add a column to Categories which gives the main category that each category is in (with main categories giving themselves). So:

cat_id | main_cat_id | title
-------+-------------+---------
01     | 01          | Science
0101   | 01          | Medicine
02     | 02          | Sport

Select from this on cat_id = main_cat_id to find main categories; join back onto itself on left.cat_id = right.main_cat_id to find the child categories, then onto posts on cat_id = cat_id. Group by left.cat_id and project over cat_id and count(*).

I tried this in PostgreSQL 8.4, and i don't see why this wouldn't work in MySQL, as the query is pretty basic. My tables:

create table categories(
  cat_id varchar(40) primary key,
  main_cat_id varchar(40) not null references categories,
  title varchar(40) not null
)

create table posts (
  post_id integer primary key,
  cat_id varchar(40) not null references categories,
  title varchar(40) not null
)

My query (grouping by title rather than ID):

select m.title, count(*)
from categories m, categories c, posts p
where m.cat_id = c.main_cat_id
  and c.cat_id = p.cat_id
group by m.title

UPDATE: I also had a shot at making this work with a string operation, as the OP tried. The query (in standard-compliant SQL as accepted by PostgreSQL, rather than MySQL's dialect) is:

select m.title, count(*)
from categories m, posts p
where m.cat_id = substring(p.cat_id from 1 for 2)
group by m.title;

Which works fine. I can't offer a meaningful comparison as to speed, but the query plan for this did look a bit simpler than that for the two-way join.

Tom Anderson 2010-09-04 14:11:32

This is a variation of the adjacency list approach I explained on my answer, which gets really cumbersome to work with when you have an more than one level of (sub)categories.

NullUserException 2010-09-04 14:26:01

@NullUserException: to an extent. If your database can do recursive queries (Oracle, PostgreSQL, Firebird and SQL Server all do - not sure about MySQL), then it gets slightly tricky rather than really cumbersome, and remains much easier (and, AIUI, faster) than nested sets. But since the OP has shown no interest in multiple levels of subcategories, that seems moot.

Tom Anderson 2010-09-04 16:18:59

ansaurus

tags:

views:

answers:

Counting records belonging to a category in MySQL

related questions