I have a database with a column I wish to index that has comma-delimited names, e.g.,

User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley"

I want each full name to be indexed as a single token, so that the whole name is one searchable entity. What is the best approach for this?

  1. Did I miss a simple option to set the tokenize delimiter?
  2. Do I have to subclass an existing class or write my own class to roll my own tokenizer?
  3. Something else? ;)

Or does Lucene.net not support phrases?

Or is it smart enough to handle this use case automatically?

I'm sure I'm not the first person to have to do this, but Googling turned up no obvious solutions.

* EDIT: using my example, I want to store these name phrases in a single field:

Helen Ready

Phil Collins

Brad Paisley

NOT these individual words:

Helen

Ready

Phil

Collins

Brad

Paisley

A: 

Edit: Having read your clarification, here is hopefully a more relevant answer:

  1. You did not miss an option to modify the separator character.
  2. You do need to roll your own tokenizer. I suggest you subclass CharTokenizer and define isTokenChar() according to your spec, so that every character except a comma counts as a token char.
Yuval F
Yuval, I want to index three full names rather than six individual words in a single field. I clarified my question and example above.
Pete Alvin
Pete, please see the new version of my answer.
Yuval F
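For illustration, here is a minimal stand-alone sketch of the rule described in this answer: every character except a comma is a token character. It is plain Java and does not use Lucene; the class `CommaTokenizerSketch` and its `tokenize` and `emit` helpers are hypothetical names made up for this example, and only `isTokenChar` mirrors the method name mentioned above. The trimming step is an assumption added here, because splitting on commas alone would leave a leading space on " Phil Collins".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration; this is NOT Lucene's CharTokenizer.
final class CommaTokenizerSketch {

    // The rule from the answer: anything but a comma belongs to a token.
    static boolean isTokenChar(char c) {
        return c != ',';
    }

    // Collect runs of token characters, emitting one token per run.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else {
                emit(current, tokens);
            }
        }
        emit(current, tokens); // flush the last name
        return tokens;
    }

    // Assumption: trim surrounding whitespace and skip empty tokens.
    private static void emit(StringBuilder current, List<String> tokens) {
        String token = current.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        current.setLength(0);
    }
}
```

Calling `CommaTokenizerSketch.tokenize("Helen Ready, Phil Collins, Brad Paisley")` yields three tokens, one per full name. In a real tokenizer subclass the whitespace handling would probably need a separate step, since a character-class rule by itself does not trim.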
A: 

You can split the string by comma yourself, and either:

  • Index each name using the Keyword analyzer (non-tokenized)
  • OR index each name using the standard analyzer, and wrap your searches in quotes. Make sure to index a dummy term between consecutive names so that "Ready Phil" doesn't match the document.
bajafresh4life
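To see why the dummy term in the second option matters, here is a rough plain-Java sketch that does not use Lucene at all. It splits the list on commas, lowercases and splits each name into words (a crude stand-in for what a standard analyzer might do), and treats a quoted search as a run of consecutive words. The names `NameBoundarySketch`, `wordStream`, `matchesPhrase`, and the `BOUNDARY` value are all hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the "dummy term between names" idea.
final class NameBoundarySketch {

    // Assumed placeholder; any term that never occurs in a real name would do.
    static final String BOUNDARY = "__name_boundary__";

    // Split on commas, then into lowercase words, optionally separating names.
    static List<String> wordStream(String fullNameList, boolean withBoundary) {
        List<String> words = new ArrayList<>();
        for (String name : fullNameList.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty entries such as "a,,b"
            }
            if (withBoundary && !words.isEmpty()) {
                words.add(BOUNDARY);
            }
            words.addAll(Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+")));
        }
        return words;
    }

    // A quoted phrase matches when its words appear consecutively in the stream.
    static boolean matchesPhrase(List<String> words, String phrase) {
        List<String> wanted =
                Arrays.asList(phrase.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        return Collections.indexOfSubList(words, wanted) >= 0;
    }
}
```

Without the boundary term, the phrase "Ready Phil" finds "ready" and "phil" side by side and matches; with it, the two words are no longer adjacent, while "Phil Collins" still matches. In an actual index the same effect could likely be achieved with a position gap between field values rather than a literal dummy term, but that depends on the analyzer and version in use.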