I have authenticated users in my application who have access to a shared database of up to 500,000 items. Each user has their own public-facing website and needs to be able to prioritize the items displayed on it (think upvotes).

Out of the 500,000 items, each user will prioritize at most 200; the order of the remaining items matters much less.

Each of the users will prioritize the items differently.
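To pin down the requirement, here is a minimal sketch of the ordering each user's site needs. It assumes a hypothetical in-memory list of item ids in some shared default order and a per-user mapping from item id to rank; it only illustrates the semantics, since re-sorting 500,000 ids on every request is exactly the cost an index-level solution is meant to avoid.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def order_for_user(default_order: Sequence[int],
                   priorities: Dict[int, int]) -> List[int]:
    """Return item ids with one user's prioritized items first.

    default_order: item ids in the shared, site-wide order (hypothetical input).
    priorities: item_id -> rank for this user, 1 = highest (at most ~200 entries).
    """
    def sort_key(item_id: int) -> Tuple[bool, int]:
        rank: Optional[int] = priorities.get(item_id)
        # False sorts before True, so prioritized items come first, by rank.
        return (rank is None, rank if rank is not None else 0)

    # sorted() is stable: unprioritized items keep their default relative order.
    return sorted(default_order, key=sort_key)


# Two users share the same items but prioritize them differently.
items = [1, 2, 3, 4, 5]
print(order_for_user(items, {4: 1, 2: 2}))  # [4, 2, 1, 3, 5]
print(order_for_user(items, {5: 1}))        # [5, 1, 2, 3, 4]
```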

I initially asked a similar MySQL question here http://stackoverflow.com/questions/1281484/mysql-results-sorted-by-list-which-is-unique-for-each-user and got a good answer, but I believe a better option may be a non-SQL, index-based solution.

Can this be done in Lucene? Is there another search technology that would be better suited to this?

P.S. Google implements a similar setup for its search results: if you are logged in, you can prioritize and exclude results yourself.

Update: re-tagged with sphinx, as I have been reading the documentation and believe it may be able to do what I am looking for with "per-document attribute values" stored in memory. I'd be interested to hear any feedback on this from Sphinx gurus.

A:

You'll definitely want to store the item's id in each document when building your index. There are a few ways to do the next step, but an easy one is to take the prioritized items and add them to your search query, with something like this for each special item:

"OR item_id:%d^X"

where %d is the item's id and X is the boost factor you want to apply. You'll probably need to tune this number empirically so that being "upvoted" alone doesn't push an item to the top of a search for something totally unrelated.
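A minimal sketch of building that query string, assuming each document stores its id in an untokenized field named `item_id` (a hypothetical name) and that the string is passed to Lucene's classic QueryParser with its default OR operator. The `require_match` flag is an optional variant, not part of the answer: it makes the user's own query mandatory and lets the id clauses only add score, which keeps upvoted but unrelated items out of the results.

```python
from typing import Iterable


def boosted_query(user_query: str, prioritized_ids: Iterable[int],
                  boost: float = 5.0, require_match: bool = False) -> str:
    """Add one boosted clause per prioritized item to a Lucene query string.

    Assumes an untokenized item_id field (hypothetical) and the classic
    QueryParser with default OR operator. `boost` is the X above; tune it.
    """
    clauses = [f"item_id:{item_id}^{boost:g}" for item_id in prioritized_ids]
    if not clauses:
        return user_query
    if require_match:
        # "+" makes the user's query mandatory; the id clauses only add score,
        # so an upvoted item that doesn't match the query is not returned.
        return f"+({user_query}) " + " ".join(clauses)
    # As in the answer: a plain OR, so prioritized items can match on their own.
    return f"({user_query}) OR " + " OR ".join(clauses)


print(boosted_query("solar panel", [42, 7]))
# (solar panel) OR item_id:42^5 OR item_id:7^5
print(boosted_query("solar panel", [42, 7], boost=2.5, require_match=True))
# +(solar panel) item_id:42^2.5 item_id:7^2.5
```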

Doing it this way will at least spare you a lot of annoying post-processing steps that would require iterating over the whole result set; hopefully the proper ordering will come straight out of the index query.

Robert Elwell
OK, so I'm guessing I would store the prioritized list in MySQL or similar, and select it by user_id ordered by priority. With this list I would then form the Lucene query string as you have suggested. Will this still scale and stay fast if there are, say, 200 items in a user's priority list?
ADAM
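On the clause count: 200 optional clauses is well under the default limit of 1024 clauses on Lucene's BooleanQuery, though actual speed should be measured. Below is a minimal sketch of the lookup described in the comment, using Python's built-in sqlite3 as a stand-in for MySQL. The table `user_priorities(user_id, item_id, priority)`, the index on `(user_id, priority)`, and the linear rank-to-boost mapping are all assumptions, not something from the thread.

```python
import sqlite3
from typing import List, Tuple


def priority_boosts(conn: sqlite3.Connection, user_id: int,
                    max_boost: float = 10.0, min_boost: float = 2.0,
                    limit: int = 200) -> List[Tuple[int, float]]:
    """Fetch a user's prioritized item ids and give each a rank-based boost.

    Assumes a hypothetical table user_priorities(user_id, item_id, priority)
    where a lower priority number means a more important item.
    """
    rows = conn.execute(
        "SELECT item_id FROM user_priorities WHERE user_id = ? "
        "ORDER BY priority LIMIT ?",
        (user_id, limit),
    ).fetchall()
    span = max(len(rows) - 1, 1)
    # Boost falls linearly from max_boost (top item) to min_boost (last item).
    return [(item_id, round(max_boost - (max_boost - min_boost) * i / span, 2))
            for i, (item_id,) in enumerate(rows)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_priorities "
             "(user_id INTEGER, item_id INTEGER, priority INTEGER)")
conn.execute("CREATE INDEX idx_user_priority ON user_priorities (user_id, priority)")
conn.executemany("INSERT INTO user_priorities VALUES (?, ?, ?)",
                 [(1, 42, 2), (1, 7, 1), (1, 99, 3), (2, 5, 1)])
print(priority_boosts(conn, 1))  # [(7, 10.0), (42, 6.0), (99, 2.0)]
```

Each `(item_id, boost)` pair would then become one `item_id:<id>^<boost>` clause. Unlike a single flat X, the decreasing boosts let the user's own ordering of the prioritized items carry through to the ranking.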