+1  Q: 

PHP Tag Cloud

I am looking for help with the database schema, not the actual "cloud" itself.

On a site where users submit images and can tag them, how should the database be set up for optimal performance?

I was thinking

ID - int(11), unique, auto_increment
tag - varchar(20)
imageID - int(11)

So suppose I upload an image and tag it "toronto, sushi, summer".

The queries would be:

INSERT INTO tags (tag, imageID) VALUES ('$tag[0]', $imageID);
INSERT INTO tags (tag, imageID) VALUES ('$tag[1]', $imageID);
INSERT INTO tags (tag, imageID) VALUES ('$tag[2]', $imageID);

Then, to retrieve the tags, I'd run SELECT * FROM tags WHERE imageID = $imageID.

Is there a flaw with this?

+3  A: 

You should have a HABTM (has and belongs to many) relationship: one table for the images, one for the tags, and a third table holding combinations of image IDs and tag IDs. This way you do not limit the number of tags an image can have or the number of images a tag can belong to, and you do not duplicate the tag text.
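
A minimal sketch of such a three-table layout in MySQL. The table, column, and index names here (images, tags, image_tags, filename, and so on) are assumptions made for illustration, not something taken from the question:

```sql
-- One row per uploaded image (filename is an illustrative column).
CREATE TABLE images (
    image_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    filename VARCHAR(255) NOT NULL,
    PRIMARY KEY (image_id)
) ENGINE=InnoDB;

-- One row per distinct tag; the UNIQUE key prevents duplicate tag text.
CREATE TABLE tags (
    tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    tag    VARCHAR(20) NOT NULL,
    PRIMARY KEY (tag_id),
    UNIQUE KEY uq_tags_tag (tag)
) ENGINE=InnoDB;

-- Join table: one row per (image, tag) pair.
CREATE TABLE image_tags (
    image_id INT UNSIGNED NOT NULL,
    tag_id   INT UNSIGNED NOT NULL,
    PRIMARY KEY (image_id, tag_id),
    KEY idx_image_tags_tag (tag_id),
    FOREIGN KEY (image_id) REFERENCES images (image_id),
    FOREIGN KEY (tag_id)   REFERENCES tags (tag_id)
) ENGINE=InnoDB;

-- All tags for one image (42 is a placeholder id).
SELECT t.tag
FROM image_tags AS it
JOIN tags AS t ON t.tag_id = it.tag_id
WHERE it.image_id = 42;
```

The composite primary key keeps the same tag from being attached to an image twice, and the secondary index on tag_id serves the reverse lookup (all images for a given tag).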

mattphp
I ended up doing this, mainly because I feel this is the way Stack Overflow does it as well. It also allows me to add extra columns to that table, such as which user created the tag first.
lyrae
+3  A: 

I don't see any real problems with this approach, other than that images sharing the same tag produce duplicate entries in the database. If you try to normalize, though, you end up with a table that contains duplicate references to another table holding the tags themselves, which in this case seems like a waste of time (more coding, plus extra joining and table traversal for MySQL).

One tiny optimization I'd consider, though, is the order of your columns. Group the ints together: they are fixed-width in MySQL, so rows can be searched marginally faster in that order than in the order int, varchar, int.

Gav
+1  A: 

I would use a separate tag table:

TABLE tags:
tag_id - int(11), unique, auto_increment
tag - varchar(20)

TABLE image_tags:
ID - int(11), unique, auto_increment
tag_id - int(11)
imageID - int(11)

Then I would look up whether the tag already exists and insert only the IDs:

INSERT INTO image_tags (tag_id, imageID) VALUES ($tag_id[0], $imageID);
INSERT INTO image_tags (tag_id, imageID) VALUES ($tag_id[1], $imageID);
INSERT INTO image_tags (tag_id, imageID) VALUES ($tag_id[2], $imageID);

This way, images with the same tags are easier to find, because they share the same tag_id and not just the same varchar content. Of course, you should transform the tags to lowercase, replace special characters, etc.
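
The look-up-then-insert step could be sketched roughly as follows. This assumes a tags table with a UNIQUE index on tag and a linking table called image_tags with columns tag_id and imageID; the table names, the literal 'Toronto', and the id 42 are placeholders standing in for values bound from the application:

```sql
-- Assumes tags.tag has a UNIQUE index, so an already-known tag is skipped.
-- LOWER/TRIM normalize the text before it is stored.
INSERT IGNORE INTO tags (tag) VALUES (LOWER(TRIM('Toronto')));

-- Link image 42 (placeholder id) to the tag by id rather than by text.
INSERT INTO image_tags (tag_id, imageID)
SELECT tag_id, 42
FROM tags
WHERE tag = LOWER(TRIM('Toronto'));
```

Normalizing in one place (here, at insert time) means 'Toronto' and 'toronto ' resolve to the same tag_id.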

Peter Parker
+1  A: 

Make sure there is an index on the imageID field.
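
For the single-table layout from the question, that might look like the sketch below. The index name idx_tags_imageID is an arbitrary choice, and 42 is a placeholder id:

```sql
-- Index the column used in the WHERE imageID = ... lookup.
CREATE INDEX idx_tags_imageID ON tags (imageID);

-- EXPLAIN can be used to check whether the optimizer picks the index
-- (look at the "key" column of the output).
EXPLAIN SELECT * FROM tags WHERE imageID = 42;
```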

acrosman
+2  A: 

Would changing the tag field to a char(20) also increase performance? The whole table would become fixed-width and queries run on fixed-width tables are generally quicker - so I am led to believe in my recent study of DB design.

Being fixed to 20 characters will cause a little overhead in terms of the amount of space the table takes up, but it is such a small table anyway I can't see a slightly larger file size being a huge issue.

Having said that, given that it is such a tiny table, I imagine you would need A LOT of rows of data before you saw a difference between varchar(20) and char(20).
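
If you wanted to try it, the change could be sketched like this. It assumes the table is named tags as in the question. Note that the fixed-width row format is a MyISAM concept, so the effect may not carry over to other storage engines such as InnoDB:

```sql
-- Switch the tag column from variable width to fixed width.
ALTER TABLE tags MODIFY tag CHAR(20);

-- The Row_format column of the output shows the resulting row format.
SHOW TABLE STATUS LIKE 'tags';
```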

Just a thought. :)

Peter Spain