AJAX autocomplete is fairly simple to implement. However, I wonder how to implement smart tag suggestions like the ones on SO.

To clarify the difference between autocomplete and suggestion:

  • autocomplete: foo [foobar, foobaz]
  • suggestion: foo [barfoo, foobar, foobaz], or even better, with 'did you mean' feature: [barfoo, foobar, foobaz, fobar, fobaz]
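To make the distinction concrete, here is a minimal in-memory sketch in Python. The function names and the sample `TAGS` list are made up for illustration, and using `difflib` for the "did you mean" part is only one possible choice, not what SO actually does.

```python
import difflib

# Sample data for illustration only.
TAGS = ["barfoo", "foobar", "foobaz", "fobar", "fobaz", "python"]

def autocomplete(query, tags):
    """Classic autocomplete: only tags that start with the query."""
    return sorted(t for t in tags if t.startswith(query))

def suggest(query, tags):
    """Suggestion: tags that contain the query anywhere."""
    return sorted(t for t in tags if query in t)

def suggest_fuzzy(query, tags, cutoff=0.5):
    """Substring matches first, then 'did you mean' candidates.

    The cutoff is an assumed similarity threshold; it needs tuning
    against real tag data.
    """
    exact = suggest(query, tags)
    rest = [t for t in tags if t not in exact]
    close = difflib.get_close_matches(query, rest, n=5, cutoff=cutoff)
    return exact + close

print(autocomplete("foo", TAGS))
print(suggest("foo", TAGS))
print(suggest_fuzzy("foo", TAGS))
```

This scans every tag on each call, so it only illustrates the expected behaviour for a small tag set.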

I suppose I need some kind of full-text search in tags (all letters indexed, not just words). It would be no problem to do this with regexes or other patterns for a limited number of tags (even client side).

But how do I implement this feature for a large number of tags?
Is there any particular reason (besides URLs) that the tags on SO are dash-separated? What about Unicode characters in tags?

I store the tags in a table with the following columns: id, tagname. My SQL query returns objects with the following fields: id, tagname, count.

(I use Doctrine ORM and pgsql as default db driver.)

+1  A: 

I would go with SELECTing them from the database by REGEXP on every keypress. I did this on my sites and there was no performance problem (I do not have a heavily loaded server, though). If you do not like this idea, I would cache all the 1-5 letter combinations that users enter in a separate table and refresh them on a daily basis. If this table is indexed, then you have a very fast implementation.

To elaborate more on the second approach:

Briefly:

  1. Make a table SEARCHTABLE representing a 1-n relationship between keywords (limit them to 3-4 letters) and the primary IDs of tags.
  2. Put an INDEX on both fields.
  3. Every time the user makes a search, look in SEARCHTABLE; if the combination is there, use it. This is very fast, as everything is indexed. If not, do the regexp search and put all results into SEARCHTABLE.
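The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: it uses an in-memory SQLite database through Python's `sqlite3` instead of pgsql, a plain substring test (`instr`) stands in for the regexp search, and the extra `searched` table (so that keywords with zero hits are also remembered) plus all function names are hypothetical additions.

```python
import sqlite3

MAX_KEYWORD_LEN = 4  # only keywords up to 3-4 letters are cached

def make_db(tagnames):
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tags (id INTEGER PRIMARY KEY, tagname TEXT NOT NULL);
        CREATE TABLE searchtable (keyword TEXT NOT NULL, tag_id INTEGER NOT NULL);
        CREATE INDEX searchtable_keyword ON searchtable (keyword);
        CREATE INDEX searchtable_tag_id ON searchtable (tag_id);
        -- Assumed helper: remembers which keywords were already computed.
        CREATE TABLE searched (keyword TEXT PRIMARY KEY);
    """)
    db.executemany("INSERT INTO tags (tagname) VALUES (?)",
                   [(t,) for t in tagnames])
    return db

def slow_search(db, keyword):
    """Stand-in for the regexp search: substring scan over all tags."""
    return db.execute(
        "SELECT id, tagname FROM tags WHERE instr(tagname, ?) > 0 "
        "ORDER BY tagname", (keyword,)).fetchall()

def search(db, keyword):
    """Return matching tag names, filling SEARCHTABLE on a cache miss."""
    if len(keyword) > MAX_KEYWORD_LEN:
        return [name for _, name in slow_search(db, keyword)]
    seen = db.execute("SELECT 1 FROM searched WHERE keyword = ?",
                      (keyword,)).fetchone()
    if seen is None:
        db.execute("INSERT INTO searched (keyword) VALUES (?)", (keyword,))
        db.executemany(
            "INSERT INTO searchtable (keyword, tag_id) VALUES (?, ?)",
            [(keyword, tag_id) for tag_id, _ in slow_search(db, keyword)])
    rows = db.execute(
        "SELECT t.tagname FROM searchtable s JOIN tags t ON t.id = s.tag_id "
        "WHERE s.keyword = ? ORDER BY t.tagname", (keyword,)).fetchall()
    return [r[0] for r in rows]

def add_tag(db, tagname):
    """Insert a tag and extend the cached keywords it matches."""
    cur = db.execute("INSERT INTO tags (tagname) VALUES (?)", (tagname,))
    db.execute(
        "INSERT INTO searchtable (keyword, tag_id) "
        "SELECT keyword, ? FROM searched WHERE instr(?, keyword) > 0",
        (cur.lastrowid, tagname))
```

For example, `search(db, "foo")` runs the slow scan once and answers later identical queries from `searchtable`; `add_tag` keeps the cache consistent without truncating it.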

Notes:

  1. You should invalidate the table when you add tags, but this should happen much less often than a search. When invalidating the table you do not necessarily have to TRUNCATE it; you can easily rebuild it, taking all keywords into account.
  2. If you want to speed it up, you can "pregenerate" all two- or even three-letter searches.
  3. If you care enough, you should use the results for the (n-1)-letter keyword to generate the results for the n-letter keyword. It speeds things up tremendously. Imagine that the user has typed "mo" and you have shown them the appropriate results from SEARCHTABLE. Then, when they type "n", giving "mon", you only need to search through the already selected items to generate the new response.
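Note 3 can be illustrated with a small in-memory sketch in Python. The class name `IncrementalSearch` and its attributes are hypothetical, and substring matching is assumed; the point is only that when the new query contains the previous one, the previous result list is a sufficient pool to filter.

```python
class IncrementalSearch:
    """Narrow the previous result set instead of rescanning all tags."""

    def __init__(self, tags):
        self._tags = list(tags)
        self._query = ""
        self._matches = list(tags)  # every tag matches the empty query
        self.last_scanned = 0       # how many tags the last update examined

    def update(self, query):
        # Any tag containing the new query also contains the old one,
        # provided the old query is a substring of the new query.
        # Otherwise (e.g. the user deleted a letter) fall back to all tags.
        if self._query in query:
            pool = self._matches
        else:
            pool = self._tags
        self.last_scanned = len(pool)
        self._matches = [t for t in pool if query in t]
        self._query = query
        return self._matches


session = IncrementalSearch(["monad", "money", "lemon", "mongo", "moon", "java"])
print(session.update("mo"))   # scans all six tags
print(session.update("mon"))  # scans only the previous matches
```

The same idea carries over to the database version: the rows cached for "mo" can be filtered to build the rows for "mon".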

I hope it is more comprehensible now.

gorn
Could you elaborate on the approach without regexes?
takeshin
I have elaborated the answer a bit.
gorn
Thank you for the explanation.
takeshin