views:

53

answers:

1

Hoping someone can provide some mysql advice...

I have 2 tables that look like this:

searchTagsTable
    ID
    tag

dataTable
    ID
    title
    desc
    tagID

So the column "tagID" in "dataTable" is a comma-delimmited string of ids pointing to searchTagsTable.

I'd like to use mysql's built in fulltext search capabilities to search title, description, and tags.

I'm wondering: What is considered the "best" solution in a situation like this?

Should I leave the datastructure as it is? If so, how should I structure the sql to allow fulltext search of all three columns - title, desc and tag?

Or would it be preferable just to get rid of keywordsTable and have the actual tags comma delimmited in a "tags" column in dataTable?

Thanks in advance for your help.

Travis

A: 

Should I leave the datastructure as it is? If so, how should I structure the sql to allow fulltext search of all three columns - title, desc and tag?

That wouldn't be possible. Indexes can only span columns of a single table.

Or would it be preferable just to get rid of keywordsTable and have the actual tags comma delimmited in a "tags" column in dataTable?

That would certainly be the simplest solution. You are currently not really getting any benefit from giving tags their own identity, since you can't use foreign keys and indexing on them.

However, MySQL's FULLTEXT indexing is not ideal for a tag system:

  • by default, it won't index words shorter than four letters;
  • by default, it has many (many) stopwords it won't index that you might want to use for tags;
  • it'll be less efficient than a normal index;
  • it only works in MyISAM, which is in all other respects a much worse database engine than InnoDB. Except where you really have to, you shouldn't really be using MyISAM today.

You can fix the minimum word length and stopwords by altering the MySQL configuration. This will make your indexes much bigger though. This may be an acceptable solution if you control the database everywhere your app will be deployed, and if you are only using tags as ‘extra words’ in a fulltext search-fodder, rather than a full categorisation system.

Otherwise... comma-delimited anything in a database is suspect IMO. It's usually better to use a one-to-many join table to express the idea that one entity has many tags. Then you can use a simple index to aid lookups instead of the limited FULLTEXT indexing scheme, which will be faster, more reliable, and allows you to use InnoDB and foreign keys. eg.:

dataTable
    ID       (primary key)
    title
    desc

dataTags
    ID       (foreign key -> dataTable)
    tagName  (index this column)

(You could still have the tagID->tagName mapping as well on top of this if you want the tags to have independent identity. I'm not sure if it's doing anything useful in your case though.)

If you need to get a comma-separated list from a one-to-many relation like the above, you can do it using the MySQL-specific GROUP_CONCAT function.

SELECT dataTable.*, GROUP_CONCAT(dataTags.tagName)
FROM dataTable
JOIN dataTags ON dataTags.ID=dataTable.ID
GROUP BY dataTable.ID;

That leaves the fulltext indexing of the title and desc. Which unfortunately does need you to put them in a MyISAM table.

A common alternative to this which you might also consider would be to keep the ‘canonical’ copies in the main table (potentially in an ACID-safe InnoDB table), and store a separate copy of all the title, desc and tags together in a FULLTEXT-indexed MyISAM table purely for fulltext search bait. This does mean you have to do an extra update each time you change the primary data (though if you fail or have to rollback a transaction, at least it's only relatively-unimportant search bait that's now wrong), but the advantage is you can apply extra processing to it, such as stemming and punctuation handling, which MySQL's FULLTEXT indexer doesn't do itself.

bobince
Thanks for the detailed reply, it's very helpful. Regarding this: "It's usually better to use a one-to-many join table to express the idea that one entity has many tags." Please bear with me. I want to be sure I understand the relationship you are suggesting between the tables dataTable and dataTags. Let's say I have two distinct rows in dataTable (ID 1, ID 2) that I would like to associate with a specific tag "mysql". Could you explain what the data would look like in the table dataTags? Thanks again for your help, it's much appreciated. Travis
Travis
Yes, you'd have `dataTags (ID, tagName)` values `(1, 'mysql')` and `(2, 'mysql')`. The `ID` is only a foreign key back to `dataTable` not a primary key in itself. The primary key would be over both ID and tagName.
bobince
Wonderful. Thanks for your help. -Travis
Travis