I have three tables, all of which have a column with a fulltext index. The user will enter search terms into a single text box, and then all three tables will be searched.

This is better explained with an example:

documents
  doc_id
  name FULLTEXT

table2
  id
  doc_id
  a_field FULLTEXT

table3
  id
  doc_id
  another_field FULLTEXT

(I realise this looks odd, but that's because I've removed all the other fields and tables to simplify it.)

So basically I want to do a fulltext search on name, a_field and another_field, and then show the results as a list of documents, preferably with what caused that document to be found, e.g. if another_field matched, I would display what another_field is.

I began working on a system whereby three fulltext search queries are performed and the results inserted into a table with a structure like:

search_results
  table_name
  row_id
  score

(This could later be made to cache results for a few days with e.g. a hash of the search terms).

This idea has two problems. The first is that the same document can appear in the search results up to three times with different scores. Instead, if the search term is matched in two tables, the document should appear once, with a higher score.

The second is that parsing the results is difficult. I want to display a list of documents, but I don't immediately know the doc_id without a join of some kind. However, the table to join to is dependent on the table_name column, and I'm not sure how to accomplish that.
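Both problems can be illustrated with a minimal sketch, assuming each of the three queries returns the doc_id directly (all three tables carry that column) rather than only a table name and row id. The function name `merge_hits`, the tuple layout, and the sample data are hypothetical, and summing is only one possible way to combine scores:

```python
from collections import defaultdict

def merge_hits(hits):
    """Collapse per-table hits into one result per document.

    Each hit is a (doc_id, source, matched_text, score) tuple, where `source`
    is a hypothetical label such as "documents.name" or "table2.a_field".
    """
    merged = defaultdict(lambda: {"score": 0.0, "matches": []})
    for doc_id, source, matched_text, score in hits:
        entry = merged[doc_id]
        entry["score"] += score  # matches in several tables add up
        entry["matches"].append((source, matched_text))
    # Highest combined score first
    return sorted(
        ((doc_id, e["score"], e["matches"]) for doc_id, e in merged.items()),
        key=lambda r: r[1],
        reverse=True,
    )

# Made-up hits, as if gathered from the three fulltext queries
hits = [
    (1, "documents.name", "Annual budget report", 1.5),
    (2, "table2.a_field", "budget notes", 0.75),
    (1, "table3.another_field", "budget appendix", 0.5),
]
results = merge_hits(hits)
```

Because every hit already carries its doc_id, no join that depends on table_name is needed, and the `matches` list records which field caused each document to be found.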

Wanting to search multiple related tables like this must be a common thing, so I guess what I'm asking is: am I approaching this in the right way? Can someone tell me the best way of doing it, please?

A: 

I would create a denormalized single index. I.e., put all three document types into a single table with fields for doc_id, doc_type and a single fulltext block. Then you can search all three document types at once.

You might also find that Lucene would make sense in this situation. It gives you faster searching, as well as much more functionality around how the searching and scoring works.

The downside is that you're keeping a separate denormalized copy of the text for each record. The upside is that searching is much faster.
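A rough sketch of the single-table idea, using Python's built-in sqlite3 module so it runs without a server. The table name `search_index`, the column `body`, and the sample rows are assumptions for illustration. SQLite's `LIKE` is only a stand-in here; in MySQL the `WHERE` clause would use `MATCH(body) AGAINST(...)` on a FULLTEXT column, and the hit count would be replaced by the relevance score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, doc_id INTEGER, a_field TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, doc_id INTEGER, another_field TEXT);
    -- Hypothetical denormalized copy, kept only for searching
    CREATE TABLE search_index (doc_id INTEGER, doc_type TEXT, body TEXT);

    INSERT INTO documents VALUES (1, 'Budget report'), (2, 'Holiday rota');
    INSERT INTO table2 VALUES (1, 1, 'travel costs'), (2, 2, 'budget approval');
    INSERT INTO table3 VALUES (1, 1, 'budget appendix');

    -- Copy every searchable field into the index, tagged with its origin
    INSERT INTO search_index SELECT doc_id, 'documents', name FROM documents;
    INSERT INTO search_index SELECT doc_id, 'table2', a_field FROM table2;
    INSERT INTO search_index SELECT doc_id, 'table3', another_field FROM table3;
""")

def search(term):
    # LIKE is only a stand-in for a real fulltext match
    rows = conn.execute(
        """
        SELECT doc_id, COUNT(*) AS hits, GROUP_CONCAT(doc_type) AS sources
        FROM search_index
        WHERE body LIKE '%' || ? || '%'
        GROUP BY doc_id
        ORDER BY hits DESC, doc_id
        """,
        (term,),
    )
    return rows.fetchall()

results = search("budget")
```

The original tables keep their structure (table2 can remain a tree with many rows per document); only the searchable text is copied, one index row per source row, so a document matched in several places comes back once with its sources listed.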

Chase Seibert
I don't think that would be appropriate. For example, table2 is a tree structure with every entry having a parent id, and each document will have one of these trees with as many as 20-30 items in it. I can't really put that data in the documents table. But I want it to be searched and if a match is found, the corresponding document displayed in the results.
Rob
An index isn't your database table. It's a separate denormalized copy just for fast searching.
Chase Seibert