views:

405

answers:

2

I'm currently trying to find the best way (in term of usability and performance) when dealing with a situation like fetching records tagged with a specific tag, or category, or something like that.

A good way (the way I wanted to do), would be to fetch records with the tag/category slug, so the URL would look like :

http://stackoverflow.com/questions/tagged/language-agnostic

fetching records by slug, which looks better than :

http://stackoverflow.com/questions/tag/789/language-agnostic

fetching by ID and adding the slug behind so it's more search-engine friendly. This one is better performance-wise, because fetching data by an integer ID would be faster than a string. (cmiiw)

Now, with a db schema like :

posts    post_to_tags    tags
-----    ------------    ----
id       id              id
title    post_id         name
content  tag_id          slug
...                      ...

am I doing it right ? Is there pitfall or best-practices that I need to know to avoid performance problems ? (eg. tags should not exceed 10,000 records, or tag slug should not exceed n characters, or something else)

Thanks in advance.

+3  A: 

With the first URL style and your current db design, you can do this:

select ...
from   posts p
join   posts_to_tags pt on pt.post_id = p.post_id
join   tags t on t.id = pt.tag_id
where  t.slug = [url slug value];

As long as tags.slug is indexed, this should be very efficient, hardly any different from

select ...
from   posts p
join   posts_to_tags pt on pt.post_id = p.post_id
where  pt.tag_id = [url tag ID];
Tony Andrews
in the case where there's 10,000 but indexed varchar(30) tags vs the integer one, how much, approximately, is the difference we are talking about ?
andyk
Well, we're just talking about one indexed read of the tags table using the given slug value to get the tag_id, so it's peanuts. On Oracle I just experimented 10,000 selects either way and got timings of 1.078 and 1.312 seconds - i.e. 0.0234 milliseconds per query!
Tony Andrews
I mean, i.e. a *difference of* 0.0234 milliseconds per query.
Tony Andrews
wow, I really appreciate that. Thanks a lot Tony ! Nice age you have there btw.
andyk
Note too Andy, that 2 articles *may* (depending on your application) have the same slug... in this case an ID followed by the slug is better.
alex
yup, that's the extra hassle. I handle the checking and renaming stuff from the CMS, whenever new entry is inserted, following Wordpress' example, eg. some-slug-url into some-slug-url-2. Not so beautiful IMO, that's why ppl should avoid using this method when .. damn 300 characters limit..
andyk
.. that's why people should avoid using this method when the site is heavily-packed with entries, especially when the entries are user-generated. Thanks Alex for the insight.
andyk
A: 

The first one is better, but can the slugs possibly be changed? In that case you'd need to have a redirect table (e.g. "some-article-about-dogs" is now "article-about-dogs-and-cats").

Bart van Heukelom