I am trying to figure out how I can accomplish the following, and none of the answers I have found so far seem to fit:

I have a fairly static and large set of resources I need to have indexed and searchable. Solr seems to be a perfect fit for that. In addition I need to have the ability for my users to add resources from the main data set to a 'Favourites' folder (which can include a few more tags added by them). The Favourites needs to be searchable in the same manner as the main data set, across all the same fields plus the additional ones.

My first thought was to have two separate schemas - the first for the main data set and its metadata - the second for the Favourites folder with all of the metadata from the main set copied over and then adding the additional fields.

Then I thought that would probably waste quite a bit of space (the number of users is much larger than the number of main resources).

So then I thought I could have the main data set with its metadata (Core0), same as above, with the resourceId as the unique identifier. Then there would be a second one (Core1) for the Favourites folder, whose unique id is the resourceId, userId, grade and folder all concatenated. The resourceId would also be a separate field. In addition, I would create another schema/core (Core3) with all the fields from the other two and have a request handler defined on it that searches across the other 2 cores and returns the results through this core.

This third core would have searches run against it where the results are expected to be returned for a single user only. For example, a user searches their Favourites folder for all the items with Foo. The result is only those items the user has added to their Favourites with Foo somewhere in their main data set metadata. I guess the request handler from Core3 would break the search up into a search for all documents with Foo in Core0 and a search across Core1 for userId and folder, then match up the resourceIds from both and eliminate those not in both. Alternatively, it could run a search on Core1 with the userId and folder and then, having gotten that result set back, extract all the resourceIds and append an AND onto the search query to Core0 like: AND (resourceId = 1232232312 OR resourceId = 838388383 OR resourceId = 8637626491).
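To make the second variant concrete, here is a minimal sketch of how the combined Core0 query could be assembled once the resourceIds have come back from Core1. The function name `build_favourites_query` is hypothetical, and it assumes resourceId is an indexed field on Core0 that can be matched with the usual `field:value` syntax.

```python
def build_favourites_query(user_query, resource_ids):
    """Combine the user's search with an OR list of favourited resourceIds.

    Hypothetical helper: assumes `resourceId` is an indexed field on Core0.
    Returns None when the user has no favourites, so the caller can skip
    the Core0 search entirely instead of sending an empty clause.
    """
    if not resource_ids:
        return None
    id_clause = " OR ".join(f"resourceId:{rid}" for rid in resource_ids)
    # Parenthesise both halves so the AND binds as intended.
    return f"({user_query}) AND ({id_clause})"


query = build_favourites_query("Foo", [1232232312, 838388383, 8637626491])
print(query)
```

The string it returns would be sent as the `q` parameter to Core0; the ids themselves would come from a prior search on Core1 filtered by userId and folder.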

Could this be made to work? Or is there some simpler mechanism in Solr to merge 2 searches across 2 cores and only return the results that match on a (not necessarily unique) field in both?

Thanks.

A: 

The problem looks like a database join of 2 tables with the resource id as the foreign key. Ignore this post if I have misunderstood.

First, I would probably do it with a single core, with a userid field (indexed, but not stored), and reindex a document every time a new user favourites it, appending their user id (delimited by something that the analyzer ignores). Searching then gets easier: userId:"kaka's id" will fetch all my favourites. I think it takes some work to do this, and if the number of users who can like a document grows, the userid field gets really long.
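A rough sketch of the bookkeeping this single-core idea implies is below. The helper names are hypothetical, and it assumes the user ids are joined with spaces (so a whitespace-based analyzer would split them) and that the full document, including its current user list, is available from the original data source, since the field is not stored in Solr.

```python
def add_favourite(doc, user_id):
    """Return a copy of doc with user_id appended to its userid field.

    Hypothetical helper: the userid field is a space-delimited string,
    and doc comes from the original data source, not from Solr.
    """
    users = doc.get("userid", "").split()
    if user_id not in users:
        users.append(user_id)
    updated = dict(doc)  # leave the caller's document untouched
    updated["userid"] = " ".join(users)
    return updated


def favourites_filter(user_id):
    """Build the query clause that fetches one user's favourites."""
    return f'userid:"{user_id}"'


doc = {"resourceId": "1232232312", "title": "Foo", "userid": "u1"}
reindexed = add_favourite(doc, "u2")
print(reindexed["userid"])
print(favourites_filter("u2"))
```

The updated document would then be sent back to Solr for reindexing, which is the extra work mentioned above.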

So in that case, I would move on to my next idea, which is similar to yours: have a second core with (userid, resource id). Write a wrapper which first searches this core for all the favourites, then searches the other core with those resource ids as a where condition. But again, if a user favourites many resources, the query might exceed the GET method's size limit.

If neither seems to work, it's time to think of something more scalable, which leaves us with the same space-wasting option.

Am I missing something?

kaka