Using Solr for indexing different types of data

Both are legitimate approaches, but there are tradeoffs. First, how big is your dataset? If it is large enough that you may want to partition it across multiple servers, it probably makes sense to have different indexes.

Second, how important is performance? Indexing everything together will likely result in worse performance, but the degree depends on how much data there is and how complex the queries can get.

Third, do you need to query for multiple data types in the same search? If so, indexing everything together can be a convenient way to allow this. Technically this could be achieved with separate indexes, but merging the results so that the most relevant ones come first could be a challenge (not that relevance ranking isn't a challenge already).
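To illustrate the single-index case, here is a minimal sketch of how one search could cover several data types. It assumes every document carries a type field, here called `doc_type`, and that Solr is reachable at a base URL like `http://localhost:8983/solr`; both are made up for the example. `q`, `fq` and `rows` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text, doc_types=None, rows=10):
    """Build a Solr select URL, optionally restricted to some document types.

    `doc_type` is a hypothetical field name; use whatever field your
    schema stores the object type in.
    """
    params = [("q", text), ("rows", rows)]
    if doc_types:
        # A filter query narrows the result set without affecting scoring.
        clause = " OR ".join(doc_types)
        params.append(("fq", f"doc_type:({clause})"))
    return f"{base_url}/select?{urlencode(params)}"


# Search books and articles together in one request.
url = build_select_url("http://localhost:8983/solr", "lucene", ["book", "article"])
```

Leaving `doc_types` empty searches across all types at once, which is the convenience you give up with separate indexes.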

Fourth, a single index with a single schema and configuration can simplify the life of whoever will be deploying and maintaining the system.

One other thing to consider is IDs: do all of the different objects have an identifier that is unique across all types? If not, you will probably need to generate one if you want to index them together.
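One common way to handle the ID question is to prefix each object's own ID with its type name. This is only a sketch under assumptions: the function names, the `doc_type` field and the `:` separator are all invented for illustration, and it presumes each object already has an ID that is unique within its own type.

```python
def make_solr_id(doc_type, local_id):
    """Combine a type name and a per-type ID into one cross-type unique ID."""
    if not doc_type or ":" in doc_type:
        # Keep the separator unambiguous so the ID can be split again later.
        raise ValueError("doc_type must be non-empty and must not contain ':'")
    return f"{doc_type}:{local_id}"


def to_solr_doc(doc_type, record, id_field="id"):
    """Return a copy of `record` with a composite ID and a type field added."""
    doc = dict(record)
    doc["id"] = make_solr_id(doc_type, record[id_field])
    doc["doc_type"] = doc_type  # hypothetical field, handy for filtering by type
    return doc


# A book and an author can share the local ID 42 without colliding.
book = to_solr_doc("book", {"id": 42, "title": "Lucene in Action"})
author = to_solr_doc("author", {"id": 42, "name": "Someone"})
```

If the source objects have no stable ID at all, this does not help, and separate indexes may be the simpler route.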

Thanks for your answer. I guess I really have to stick with multiple indexes, since generating unique identifiers for a single index would be a mess in my case. I played around with Solr index distribution and shards, but they were apparently made for speeding up queries on huge datasets. I don't think running five or more cores is how Solr is meant to be used. So I'm currently leaning towards just using Lucene without Solr.

Markus Lux 2009-06-16 15:05:54
