Hi.

I'm considering using Apache Solr to index data in a new project. The data consists of several different, independent types that need to be indexed, for example:

  • botanicals
  • animals
  • cars
  • computers

Should I use a separate index for each type, or does it make more sense to use a single index? How does using many indexes affect performance? Or is there another way to achieve this?

Thanks.

A:

Both are legitimate approaches, but there are tradeoffs. First, how big is your dataset? If it is large enough that you may want to partition it across multiple servers, it probably makes sense to have different indexes.
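If the data is split into one index (Solr core) per type, possibly on different servers, Solr's classic distributed search can still query the cores together by passing a shards parameter that lists each core as host:port/path. The sketch below only builds such a request URL; the host names, port, and core names are hypothetical placeholders. It also assumes the cores have compatible schemas and that every document's unique key is unique across all of them, which distributed search expects.

```python
from urllib.parse import urlencode

# Hypothetical per-type cores spread over two servers; replace with your own.
TYPE_SHARDS = [
    "server1:8983/solr/botanicals",
    "server1:8983/solr/animals",
    "server2:8983/solr/cars",
    "server2:8983/solr/computers",
]


def distributed_query_url(base_url, query, shards, rows=10):
    """Build a Solr select URL that fans one query out to several cores.

    The shards value is a comma-separated list of host:port/path entries
    (no scheme). The core receiving the request queries each shard and
    merges the partial results into one response.
    """
    params = [("q", query), ("shards", ",".join(shards)), ("rows", str(rows))]
    return f"{base_url}/select?{urlencode(params)}"
```

For example, distributed_query_url("http://server1:8983/solr/botanicals", "red", TYPE_SHARDS) produces one request that searches all four cores.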

Second, how important is performance? Indexing everything together will likely result in worse performance, but how much worse depends on how much data there is and how complex the queries can get.

Third, do you need to query multiple data types in the same search? If so, indexing everything together is a convenient way to allow this. Technically this could be achieved with separate indexes, but getting the most relevant results for the query could be a challenge (not that it isn't already).
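With a single index, a common pattern is to store each document's data type in its own field and narrow searches with a filter query (fq), which restricts the matching documents without changing how they are scored. A minimal sketch, assuming a hypothetical string field named type in the schema and a hypothetical core URL; leaving out the filter searches all types at once.

```python
from urllib.parse import urlencode


def single_index_query_url(base_url, query, types=None, rows=10):
    """Build a Solr select URL against one shared index.

    types is an optional list of values of a hypothetical "type" field;
    when given, they become a filter query such as type:(animals OR cars).
    """
    params = [("q", query), ("rows", str(rows))]
    if types:
        # fq limits which documents match but does not affect relevance scores
        params.append(("fq", "type:(" + " OR ".join(types) + ")"))
    return f"{base_url}/select?{urlencode(params)}"
```

Each indexed document would then carry that field, e.g. {"id": ..., "type": "animals", ...}, alongside its type-specific fields.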

Fourth, a single index with a single schema and configuration can simplify the life of whoever will be deploying and maintaining the system.

One other thing to consider is IDs: do all of the different objects have an identifier that is unique across all types? If not, you will probably need to generate one if you want to index them together.
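One simple way to get IDs that are unique across types is to prefix each record's existing per-type ID with its type name. The helpers below are a hypothetical sketch of that idea, not a Solr feature; they assume type names never contain the ":" separator, which keeps the combined ID reversible.

```python
SEPARATOR = ":"


def make_global_id(doc_type, local_id):
    """Combine a type name and a per-type ID into one ID unique across all types."""
    if SEPARATOR in doc_type:
        raise ValueError(f"type name must not contain {SEPARATOR!r}")
    return f"{doc_type}{SEPARATOR}{local_id}"


def split_global_id(global_id):
    """Recover (type, local_id) by splitting on the first separator only,
    so local IDs may themselves contain ':'."""
    doc_type, _, local_id = global_id.partition(SEPARATOR)
    return doc_type, local_id
```

Because ":" is part of Solr's query syntax, looking a document up by such an ID needs the value quoted or escaped, e.g. id:"cars:42".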

KenE
Thanks for your answer. I guess I really have to stick with multiple indexes, since generating unique identifiers for a single index would be a mess in my case. I played around with Solr index distribution and shards, but they apparently were made for speeding up queries on huge datasets. I don't think running five or more cores is how it is meant to be used. So I'm currently leaning towards using Lucene directly, without Solr.
Markus Lux