Here are a few "rules of the game" that you must keep in mind for solving this problem. You probably know these already, but stating them clearly may help other readers.
- All indexes in MySQL can reference only columns in a single base table. You can't make a fulltext index that indexes across multiple tables.
- You can't define indexes for views, only base tables.
- A MATCH() query against a fulltext index must match against all the columns in the fulltext index, in the order declared in the index.
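To make the last rule concrete, here is a minimal sketch. The table `articles_demo` and its columns are hypothetical, made up only for this illustration, and the behavior described is for MyISAM in the default natural-language mode.

```sql
-- Hypothetical demo table, not part of the schema in the question.
CREATE TABLE articles_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(100),
  body TEXT,
  FULLTEXT KEY ft_title_body (title, body)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Works: the MATCH() column list is the same as the index definition.
SELECT id FROM articles_demo
WHERE MATCH(title, body) AGAINST('database');

-- Expected to fail with error 1191 ("Can't find FULLTEXT index matching
-- the column list"), because no fulltext index covers title alone.
SELECT id FROM articles_demo
WHERE MATCH(title) AGAINST('database');
```

This is why the columns you want to search together have to live in one table and one index.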
I would create a third table to store the content you want to index. No need to store this content redundantly -- store it solely in the third table. This borrows a concept of a "common superclass" from object-oriented design (insofar as we can apply it to RDBMS design).
CREATE TABLE Searchable (
  `id` SERIAL PRIMARY KEY,
  `title` varchar(100) default NULL,
  `description` text,
  `keywords` text,
  `url` varchar(255) default '',
  FULLTEXT KEY `TitleDescFullText` (`keywords`,`title`,`description`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `shopitems` (
  `id` INT UNSIGNED NOT NULL,
  `ShopID` INT UNSIGNED NOT NULL,
  `ImageID` INT UNSIGNED NOT NULL,
  `pricing` varchar(45) NOT NULL,
  `datetime_created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  FOREIGN KEY (`id`) REFERENCES Searchable (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `shops` (
  `id` INT UNSIGNED NOT NULL,
  `owner_id` varchar(255) default NULL,
  `datetime_created` datetime default NULL,
  `created_by` varchar(255) default NULL,
  `datetime_modified` datetime default NULL,
  `modified_by` varchar(255) default NULL,
  `overall_rating_avg` decimal(4,2) default '0.00',
  PRIMARY KEY (`id`),
  FOREIGN KEY (`id`) REFERENCES Searchable (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Notice the only table with an auto-increment key is now Searchable. The tables shops and shopitems use a key with a compatible data type, but not auto-increment. So you must create a row in Searchable to generate the id value, before you can create the corresponding row in either shops or shopitems.
I've added FOREIGN KEY declarations for illustration purposes, even though MyISAM will silently ignore these constraints (and you already know that you must use MyISAM to have support for fulltext indexing; InnoDB gained fulltext indexes only in MySQL 5.6).
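Here's a rough sketch of the insert order that follows from this design. All the values ('Example Shop', 'alice', the ImageID of 1, and so on) are made-up placeholders, and `@shop_id` is just a session variable I'm using for illustration.

```sql
-- 1. Create the Searchable row first; this is what generates the id.
INSERT INTO Searchable (title, description, keywords, url)
VALUES ('Example Shop', 'A shop used for illustration',
        'example demo', 'http://example.com/shop');

-- 2. Reuse the generated id for the shops row.
--    LAST_INSERT_ID() is tracked per connection.
SET @shop_id = LAST_INSERT_ID();
INSERT INTO shops (id, owner_id, datetime_created, created_by)
VALUES (@shop_id, 'alice', NOW(), 'alice');

-- 3. Same pattern for an item that belongs to that shop.
INSERT INTO Searchable (title, description, keywords, url)
VALUES ('Example Item', 'An item used for illustration',
        'example item', 'http://example.com/shop/item');

INSERT INTO shopitems (id, ShopID, ImageID, pricing, datetime_created)
VALUES (LAST_INSERT_ID(), @shop_id, 1, '9.99', NOW());
```

Keep in mind that MyISAM has no transactions, so if the second insert of a pair fails, your application has to clean up the orphaned Searchable row itself.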
Now you can search the textual content of both shops and shopitems in a single query, using a single fulltext index:
SELECT S.*, sh.*, si.*,
  MATCH(keywords, title, description, url) AGAINST('dummy') AS score
FROM Searchable S
LEFT OUTER JOIN shops sh ON (S.id = sh.id)
LEFT OUTER JOIN shopitems si ON (S.id = si.id)
WHERE MATCH(keywords, title, description, url) AGAINST('dummy')
ORDER BY score DESC;
Of course, for a given row in Searchable only one table should match, either shops or shopitems, and these tables have different columns. So either sh.* or si.* will be NULL in the result. It's up to you to format the output in your application.
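One way to make that easier for the application, as a sketch only, is to have the query say which kind of row it found. The `kind` alias and its 'shop' / 'shopitem' labels are my own invention here, not anything from your schema.

```sql
SELECT S.id, S.title, S.url,
  -- Whichever joined table has a non-NULL id is the one that matched.
  CASE
    WHEN sh.id IS NOT NULL THEN 'shop'
    WHEN si.id IS NOT NULL THEN 'shopitem'
  END AS kind,
  MATCH(keywords, title, description, url) AGAINST('dummy') AS score
FROM Searchable S
LEFT OUTER JOIN shops sh ON (S.id = sh.id)
LEFT OUTER JOIN shopitems si ON (S.id = si.id)
WHERE MATCH(keywords, title, description, url) AGAINST('dummy')
ORDER BY score DESC;
```

Your application can then branch on `kind` and fetch or display the type-specific columns accordingly.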
A couple of other answers have suggested using Sphinx Search. This is another technology that complements MySQL and adds more sophisticated full-text search capability. It has great performance for queries, so some people have gotten pretty enchanted with it.
But creating indexes and especially adding to an index incrementally is expensive. In fact, updating a Sphinx Search index is so costly that the recommended solution is to create one index for older, archived data, and another smaller index for recent data that is more likely to be updated. Then every search has to run two queries, against the two separate indexes. And if your data doesn't naturally lend itself to the pattern of older data being unchanging, then you may not be able to take advantage of this trick anyway.
Re your comment: Here's an excerpt from the Sphinx Search documentation about live updates to an index:
There's a frequent situation when the total dataset is too big to be reindexed from scratch often, but the amount of new records is rather small. Example: a forum with a 1,000,000 archived posts, but only 1,000 new posts per day.
In this case, "live" (almost real time) index updates could be implemented using so called "main+delta" scheme.
The idea is that since it's costly to update a Sphinx Search index, their solution is to make the index you update as small as possible. That index covers only the most recent forum posts (in their example), whereas the larger history of archived forum posts never changes, so you build a second, larger index for that collection once. Of course if you want to do a search, you have to query both indexes.
Periodically, say once a week, the "recent" forum messages would come to be considered "archived", and you'd have to merge the current index for recent posts into the archived index and start the smaller index over. They do make the point that merging two Sphinx Search indexes is more efficient than reindexing after an update to the data.
But my point is that not every data set naturally falls into the pattern of having an archived set of data that never changes, versus recent data that updates frequently.
Take your database for example: You have shops and shopitems. How can you separate these into rows that never change, versus new rows? Any shops or products in the catalog should be permitted to update their description. But since that'd require rebuilding the entire Sphinx Search index every time you make a change, it becomes a very expensive operation. Perhaps you'd queue up changes and apply them in a batch, rebuilding the index once a week. But try explaining to the shop vendors why a minor change to their shop description won't take effect until Sunday night.