I need to partition my Solr index into two halves. As it turns out, I already have a string field that carries the partitioning information I need: one kind of record has the field present with a value, and the other kind has no value in it. All queries must specify the partition to be searched. Would it be faster to search that field with a negative field query, like -strFld:[* TO *], or would it be better/faster to create a new boolean field, say "hasStrFld", and search on that? hasStrFld would be false when strFld is not present and true when strFld contains data.

Other considerations: there is a small chance that the index will someday be split into a third partition, which would make both the boolean field and the negative query unsuitable and make a field with enumerated values worth considering. However, that is unlikely, and fast performance today matters much more than extensibility for tomorrow.

+2  A: 

In practice I would run some tests before deciding. Without them, I would most likely stick with the existing string field:

  • it is already there
  • there is no such thing as a special boolean field (as far as I understand, Solr implements it by indexing a string token).

For speed I would try:

  • using it as a filter query (i.e. fq) to make use of the associated caching mechanisms
  • boosting or demoting on the field and sorting by score (instead of partitioning completely)
  • a multi-core approach, with one partition per core (this could actually be easy to extend and easy to maintain)
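
As a minimal sketch of the fq idea, here is how the request could be built so that the partition clause goes into fq and the user's search stays in q. The field name strFld comes from the question; the helper names, the base URL, and the example query are hypothetical, and this only builds the URL without contacting a Solr server.

```python
from urllib.parse import urlencode


def partition_filter(has_value: bool, field: str = "strFld") -> str:
    """Return an fq clause that selects one of the two partitions.

    has_value=True  -> records where the field contains a value
    has_value=False -> records where the field is missing or empty
    """
    clause = f"{field}:[* TO *]"
    return clause if has_value else f"-{clause}"


def build_select_url(base_url: str, user_query: str, has_value: bool) -> str:
    """Build a /select URL with the partition expressed as a filter query.

    base_url is a hypothetical core URL, e.g. http://localhost:8983/solr/mycore
    """
    params = [
        ("q", user_query),                     # the actual search, scored
        ("fq", partition_filter(has_value)),   # the partition, unscored filter
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/solr/mycore", "title:foo", False)
print(url)
```

Since there are only two distinct fq values, each one should stay in the filter cache after its first use, which is the main reason to try this before adding a new field.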

Hope this helps.

Dieter