views:

79

answers:

2

Hi, all.

I am trying to design a tagging system with a model like this:

Tag:
   content = CharField
   creator = ForeignKey
   used = IntergerField

It is a many-to-many relationship between tags and what's been tagged.

Everytime I insert a record into the assotication table, Tag.used is incremented by one, and decremented by one in case of deletion.

Tag.used is maintained because I want to speed up answering the question 'How many times this tag is used?'.

However, this seems to slow insertion down obviously.

Please tell me how to improve this design.

Thanks in advance.

+1  A: 

I don't think it's a good idea to denormalize your data like that.

I think a more elegant solution is to use django aggregation to track how many times the tag has been used http://docs.djangoproject.com/en/dev/topics/db/aggregation/

You could attach the used count to your tag object by calling something like this:

my_tag = Tag.objects.annotate(used=Count('post'))[0]

and then accessing it like this:

my_tag.used

assuming that you have a Post model class that has a ManyToMany field to your Tag class

You can order the Tags by the named annotated field if needed:

Tag.objects.annotate(used=Count('post')).order_by('-used')
digitaldreamer
I think this is slow if I need to sort tags by how many times they are used, because there is a "select count" query for every tag. Or what's wrong with my understanding?
Satoru.Logic
Do you just think it would be slow, or are you actually seeing poor performance? In my experience the Django ORM optimizes my queries much better than I could write by hand (I'm not a DBA). Aggregation returns a query set, so ordering the tags by the used count is as simple as: Tag.objects.annotate(used=Count('post')).order_by('-used') . Because of the lazy queries, this will build, sort, filter, and order your query through the SQL call, which should be fast enough for small to medium sized sites.
digitaldreamer
Thanks, digitaldreamer. Yes, I *just* think it would be slow and trying to design something kind of 'trade space for time', but I have no proof that it is better done this way, and that's why I'm here :)
Satoru.Logic
No, there isn't a select count for every tag, this is done by group by so is very efficient. You can see this if you add `.query.as_sql()` to the end of the ORM call.
Daniel Roseman
I'll try it out, thanks. Guess I'd better make my table align with the normalized way :)
Satoru.Logic
+1  A: 

If your database support materialized indexed views then you might want to create one for this. You can get a large performance boost for frequently run queries that aggregate data, which I think you have here.

your view would be on a query like:

SELECT
    TagID,COUNT(*)
    FROM YourTable
    GROUP BY TagID

The aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.

KM