views:

80

answers:

2

What options exist for creating a scalable, full text search with results that need to be sorted on a per user basis? This is for PHP/MySQL (Symfony/Doctrine as well, if relevant).

In our case, we have a database of workouts that have been performed by users. The workouts that the user has done before should appear at the top of the results. The more frequently they've done the workout, the higher it should appear in search matches. If it helps, you can assume we know the number of times a user has done a workout in advance.

Possible Solutions

Sphinx - Use Sphinx to implement full text search, do all the querying and sorting in MySQL. This seems promising (and there's a Symfony Plugin!) but I don't know much about it.

Lucene - Use Lucene to perform full text search and put the users' completions into the query. As is suggested in this Stack Overflow thread. Alternatively, use Lucene to retrieve the results, then reorder them in PHP. However, both solutions seem clunky and potentially unscalable as a user may have completed hundreds of workouts.

Mysql - No native full text support (InnoDB), so we'd have use LIKE or REGEX, which isn't scalable.

+1  A: 

MySQL does have a native FULLTEXT support, though only in MyISAM tables.

For most real-world tasks, Sphinx is the fastest engine. However, it is an external index, so it can only be updated on a timely basis with a cron script.

By using SphinxSE (a pluggable MySQL interface to Sphinx), you can join MySQL tables and Sphinx indexes in one query. Updating, though, will still require an external script.

Since the number of workouts performed seems to change frequently, keeping it in Sphinx would require too much effort on rebuilding the index.

With SphinxSE, you can write a query similar to that:

SELECT  *
FROM    workouts w
JOIN    user_workouts uw
ON      uw.workout = w.id
WHERE   w.query = 'query query query;filter=user_id,$user_id'
        AND uw.user = $user_id
ORDER BY
        uw.times_performed DESC
Quassnoi
A: 

I'm not sure why you're assuming using Lucene would be unscalable. Hundreds of workouts per user is not a lot of data to deal with.

Try using Solr/Lucene for the search backend. It has a JSON/XML interface which will play nicely with your PHP frontend. Store a user's completed workout # in a database table. When a query is issued, take the results from Solr, and you can select from the database table and resort in PHP code. Should be plenty fast and scalable. With Solr, maintaining your index is dirt simple; just issue add/update/delete requests to your Solr server.

bajafresh4life
The user has only completed a few hundred, but in total we need to support hundreds of thousands of workouts. Are you suggesting we use Solr/Lucene to search the millions of workouts and then reorder the results? Some searches can return a large portion of the records.
jeremy