First of all, I know:

Premature optimization is the root of all evil

But I think a badly implemented autocomplete can really blow up your site.

I would like to know if there are any libraries out there which can do autocomplete efficiently (server-side), preferably ones that fit into RAM (for best performance). So no browser-side JavaScript autocomplete (YUI/jQuery/Dojo); there are enough topics about those on Stack Overflow already. But I could not find a good thread about server-side autocomplete (maybe I did not look hard enough).

For example, autocompleting names:

names:[alfred, miathe, .., ..]

What I can think of:

  • A simple SQL LIKE query, for example: SELECT name FROM users WHERE name LIKE 'al%'.
    • I think this implementation will blow up with a lot of simultaneous users or a large data set, but maybe I am wrong, so numbers (what load it can handle) would be cool.
  • Something like Solr's terms component, for example: http://localhost:8983/solr/terms?terms.fl=name&terms.sort=index&terms.prefix=al&wt=json&omitHeader=true.
    • I don't know the performance of this, so users with big sites, please tell me.
  • Maybe something like an in-memory Redis trie, whose performance I also haven't tested (see the sketch after this list).
  • I also read in this thread about how to implement this in Java (Lucene and a library created by Shilad).
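To make the Redis option concrete, here is a minimal sketch of the sorted-set prefix-scan approach using redis-py and ZRANGEBYLEX (the key name and sample data are assumptions, every member gets score 0 so the set orders lexicographically, and I have not load-tested this either):

    # Sketch: Redis prefix autocomplete with one sorted set held in RAM.
    # All members share score 0, so ZRANGEBYLEX can return a prefix range;
    # the key name "names:autocomplete" is an assumption.
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)
    r.zadd("names:autocomplete",
           {name: 0 for name in ["alfred", "alan", "miathe"]})

    def autocomplete(prefix, limit=10):
        # '[' marks an inclusive bound; appending '\xff' closes the range
        # so only members starting with the prefix are returned.
        return r.zrangebylex("names:autocomplete",
                             "[" + prefix, "[" + prefix + "\xff",
                             start=0, num=limit)

    print(autocomplete("al"))  # ['alan', 'alfred']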

What I would like to hear about are implementations used by real sites and numbers on how well they handle load, preferably with:

  • A link to the implementation or code.
  • Numbers for the scale you know it can handle.
  • It would be nice if it could be accessed over HTTP or sockets.

Many thanks,
Alfred

+4  A: 

Optimising for Auto-complete

Unfortunately, the resolution of this issue will depend heavily on the data you are hoping to query.

LIKE queries will not put too much strain on your database, as long as you spend time using 'EXPLAIN' or the profiler to show you how the query optimiser plans to perform your query.

Some basics to keep in mind:

  • Indexes: Ensure that you have indexes set up; see the sketch after this list. (Yes, in many cases LIKE does use indexes. There is an excellent article on the topic at myitforum: SQL Performance - Indexes and the LIKE clause.)

  • Joins: Ensure your JOINs are in place and optimized by the query planner. SQL Server Profiler can help with this. Look out for full index or full table scans.
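As an illustration of checking the plan (a sketch only; it uses SQLite for portability where the answer mentions SQL Server Profiler, and the users/name schema is assumed):

    # Sketch: create an index, then ask the planner whether a LIKE prefix
    # query will use it. Whether LIKE can use an index depends on the
    # engine's collation and case-sensitivity rules, which is exactly why
    # EXPLAIN beats guessing. Schema and data are assumptions.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (name TEXT COLLATE NOCASE)")
    con.execute("CREATE INDEX idx_users_name ON users (name)")
    con.executemany("INSERT INTO users VALUES (?)",
                    [("alfred",), ("alan",), ("miathe",)])

    for row in con.execute("EXPLAIN QUERY PLAN "
                           "SELECT name FROM users WHERE name LIKE 'al%'"):
        print(row)  # expect a SEARCH using idx_users_name, not a full SCAN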

Auto-complete sub-sets

Auto-complete queries are a special case, in that they usually work as ever-decreasing subsets.

  • 'name' LIKE 'a%' (may return 10000 records)
  • 'name' LIKE 'al%' (may return 500 records)
  • 'name' LIKE 'ala%' (may return 75 records)
  • 'name' LIKE 'alan%' (may return 20 records)

If you return the entire result set for query 1, there is no need to hit the database again for the following queries, as their results are subsets of your original result set.
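A sketch of that idea (the names are hypothetical, and it assumes the first query really did return the complete result set):

    # Sketch of the narrowing-subset trick: one database hit for the first
    # prefix, then filter the cached rows locally as the prefix grows.
    # fetch_from_db stands in for the real LIKE query and is assumed to
    # return ALL matches, not a truncated page.
    def fetch_from_db(prefix):
        names = ["alan", "alfred", "miathe"]          # stand-in data
        return [n for n in names if n.startswith(prefix)]

    class SubsetCompleter:
        def __init__(self):
            self.prefix = None   # prefix the cache was fetched for
            self.rows = []       # complete result set for that prefix

        def complete(self, term):
            if self.prefix is None or not term.startswith(self.prefix):
                self.prefix, self.rows = term, fetch_from_db(term)  # one DB hit
            # Every longer term is a subset of the cached rows.
            return [r for r in self.rows if r.startswith(term)]

    c = SubsetCompleter()
    c.complete("a"); c.complete("al")  # the second call never touches the DB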

Depending on your data, this may open a further opportunity for optimisation.

Jon Winstanley
Thanks for your answer :)
Alfred
Right now I think this is the best answer for me, but I hope other people will share their experience.
Alfred
+5  A: 

I will not comply with your requirements: the numbers for scale will obviously depend on hardware, size of the DB, architecture of the app, and several other factors. You must test it yourself.

But I will tell you the method I've used with success:

  • Use a simple SQL query, for example SELECT TOP 100 name FROM users WHERE name LIKE 'al%', using TOP 100 to limit the number of results.
  • Cache the results and maintain a list of the terms that are cached.
  • When a new request comes in, first check the list to see whether you have the term (or a prefix of it) cached.
  • Keep in mind that your cached results are limited to 100 rows, so you may still need to do a SQL query when the term remains valid at the end of the cached result (I mean: if the last cached row still matches the term, more matches may have been cut off by the limit). See the sketch after this list.
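A sketch of that cache logic (the limit of 100, the helper names, and the stand-in data are assumptions):

    # Sketch of the cache-with-limit idea. query_db stands in for
    # SELECT TOP 100 name FROM users WHERE name LIKE '<term>%' ORDER BY name.
    LIMIT = 100
    NAMES = sorted(["alan", "alfred", "miathe"])    # stand-in data
    cache = {}                                      # term -> sorted rows

    def query_db(term):
        return [n for n in NAMES if n.startswith(term)][:LIMIT]

    def autocomplete(term):
        for prefix, rows in cache.items():
            if term.startswith(prefix):
                # The cached entry is usable if it never hit the limit, or
                # if its last (alphabetically latest) row no longer matches
                # the term, so no matching rows were cut off.
                if len(rows) < LIMIT or not rows[-1].startswith(term):
                    return [r for r in rows if r.startswith(term)]
                break  # truncated and still valid at the end: re-query
        cache[term] = query_db(term)
        return cache[term]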

Hope it helps.

Eduardo Molteni
Thanks it helps :-)
Alfred
Some other random advice, not worthy of a full-fledged answer: index the search column; start autocompletion at the second or third character; beware of flushing the cache when the data set changes, or update it somehow; if the output list is big enough, consider compressing it; if you aim for a really big site, consider a caching reverse proxy for this kind of semi-static content (and remember to set the response header regarding cache timeout to something sane; see the sketch below).
Lorenzo Boccaccia
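A sketch of that last point from the comment above (the 60-second lifetime and the bare WSGI setup are assumptions; any framework can set the same header):

    # Sketch: mark the autocomplete response cacheable so a reverse proxy
    # (Varnish, Squid, nginx) can serve repeated prefixes without hitting
    # the application. The 60-second max-age is an assumed "sane" value.
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Cache-Control", "public, max-age=60"),
        ])
        return [b'["alan", "alfred"]']   # stand-in autocomplete payload

    make_server("", 8000, app).serve_forever()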
+1  A: 

Using SQL versus Solr's terms component is really not a comparison. At their core they solve the problem the same way by making an index and then making simple calls to it.

What I would want to know is: what are you trying to auto-complete?

Ultimately, the easiest and most surefire way to scale a system is to build a simple solution and then scale it by replicating data. Trying to cache calls or predict results just makes things complicated and doesn't get to the root of the problem (i.e. it only takes you so far; consider the case where every request misses the cache).

Perhaps a little more info about how your data is structured and how you want to see it extracted would be helpful.

mlathe
Say, for example, autocompleting names: [alfred, miathe, .., ..]. What is your favorite solution?
Alfred
If you have only a few hundred records, why not just do it on the client side: load them all and filter with JS. If you are talking about millions of records, then using something like LIKE on a DB is probably the easiest way to get something working that will scale. But you are paying for a lot of functionality in the DB that you aren't using.
mlathe