views:

66

answers:

3

Maybe the solution is obvious, but I cant seem to find a good one.

In my upcoming project, there will be one main table, its data will be read frequently. Update / Insert / Delete speed is not an issue.

The items in that main table are associated to 4 or more categories. An item can have 50 - 100 or more relations within one category.

The most common operations that will be performed on the database:

  • select all items that have been assigned to category A, B, C, ... with LIMIT X, Y
  • count all items that have been assignged to category A, B, C, ...


My first thought on how to create a database for the above was something like this (classic approach I guess):

First, for each of the four categories I create a category table:

id   - PK, int(11), index   
name - varchar(100)

then I will have one item table:

id   - PK, int(11), index
... some more data fields, about 30 or so ...

and to relate the category tables, there will be 4 or more lookup / MM tables like so:

id_item     - int(11)
id_category - int(11)

The queries looked something like this:

select
item.*

from
item

inner mm_1 on mm_1.id_item = item.id
inner join cat_1 on cat_1.id = mm_1.id_category and cat_1.id in (1, 2, ... , 100)

inner mm_2 on mm_2.id_item = item.id
inner join cat_2 on cat_2.id = mm_2.id_category and cat_2.id in (50, 51, ... , 90)

Of course the above approach with MM tables would work, but as the app should provide very good SELECT performance, I tested it with real world amounts of data (100.000 records in the item table, 50 - 80 relations in each category), but it was not as fast as I expected, even with indexes in place. I also tried using WHERE EXISTS instead of INNER JOIN when selecting.


My second idea was to just use the item table from above denormalize the data.

After reading this blog post about using bitmasks I gave it a try and assigned each category a bit value:

category 1.1 - 1
category 1.2 - 2
category 1.3 - 4
category 1.4 - 8
... etc ...

So, if an item was tagged with category 1.1 and category 1.3, it had a bitmask of 5, which I then stored in a field item.bitmask and I can query it like so:

select count(*) from item where item.bitmask & 5 = 5

But performance was not so great either.

The problems with this bitmasking approach: mysql does NOT use any indexes when bit operators are involved and even when item.bitmask would be of type BIGINT I can only handle up to 64 relations, but I need to support up to 100 per category.


That was about it. I cant think of anything more except maybe polluting the item table with many, many fields like category_1_1 up to category_4_100 each of the contains either 1 or 0. But that could lead to many AND in the WHERE clause of the select and that does not seem like a good idea, too.

So, what are my options? Any better ideas out there?


EDIT: as an response to Cory Petosky comment "What does "An item can have 50 - 100 or more relations within one category." mean?":

To make it more concrete, the item table represents an image. Images are among other criterias categorized in moods (mood would be one of 4 categories). So it would look like this:

Image:
     - Category "mood":
         - bright
         - happy
         - funny
         - ... 50 or so more ...
     - Category "XYZ":
         - ... 70 or so more ...

If my image table would be a class in C#, it would look like this:

public class Image {
 public List<Mood> Moods; // can contain 0 - 100 items
 public List<Some> SomeCategory; // can contain 0 - 100 items
 // ...
}
+1  A: 

What about this (pseudocode):

Item (image)
    Id         PK, int(11)
    Name       varchar(100)

Category (mood, xyz)
    Id         PK, int(11)
    Name       varchar(100)

Relations (happy, funny)
    Id         PK, int(11)
    Name       varchar(100)

ItemCategories
    Id         PK, int(11)
    ItemId     FK, int(11)
    CategoryId FK, int(11)

ItemCategoryRelations
    ItemCategoriesId FK, int(11)
    RelationId       FK, int(11)

SELECT *
  FROM Item 
  JOIN ItemCategories ON Item.Id = ItemCategories.ItemId
 WHERE ItemCategories.CategoryId IN (1, 2, ..., 10)

Below version uses one less table but doesn't supports categories without relations, and relations can't be reused. So, its just valid if matches your data structure requirements:

Item (image)
    Id         PK, int(11)
    Name       varchar(100)

Category (mood, xyz)
    Id         PK, int(11)
    Name       varchar(100)

Relations (happy, funny)
    Id         PK, int(11)
    CategoryId FK, int(11)
    Name       varchar(100)

ItemRelations 
    ItemId     FK, int(11)
    RelationId FK, int(11)

SELECT *
  FROM Item 
  JOIN ItemRelations ON Item.Id = ItemRelations.ItemId
  JOIN Relations ON Relations.Id = ItemRelations.RelationsId
 WHERE Relations.CategoryId IN (1, 2, ..., 10)
Rubens Farias
Thanks for your reply, but I dont get it. How would a sql query look like? How would the table definitions look like?
Max
@max, edited, please take a look
Rubens Farias
A: 

So if I understand right, an image falls into one of four of your main categories...mood for example. Then within mood it can be linked to 'bright' and 'happy.' and so on.

While I absolutely love bitmasking (microprocessor programmer here by day), and while I always seem to love applying it to db design as well, there always seems to be a better way.

How about something like this.

tblItems 
------------------
  item_id
  item_name

tblCategories
------------------
  category_id
  category_name

tblRelations
------------------
  relation_id
  relation_name

tblCategoryRelationLink (link relations to specific categories)
------------------
  cat_rel_id
  category_id
  relation_id

tblItemRelationLink (set relations to items)
------------------
  item_rel_id
  item_id
  rel_id

If your relations are specific to categories....then you can simply lookup which category a specific relation is linked to. If somehow you can have a relation linked to two categories, then you would need an extra table as well (to link an item to a category).

espais
sorry, posted before i read ruben's answer...i think both of our implementations are fairly similar....either way bitmasking might not be worth it for this case!
espais
great minds think alike =)
Rubens Farias
A: 

How about this one; each category can have parent category. In your example, if bright is a child of mood then linking an item to bright would automatically make it mood\bright. alt text

Damir Sudarevic