I think I still don't understand the Lucene indexing options.

The available options are

  • Store.Yes
  • Store.No

and

  • Index.Tokenized
  • Index.Un_Tokenized
  • Index.No
  • Index.No_Norms

I don't really understand the Store option. Why would you ever want to NOT store your field?
As I understand it, tokenizing splits up the content and removes the noise words/separators (like "and", "or", etc.).
I don't have a clue what norms could be. How are tokenized values stored?
What happens if I store a value "my string" in "fieldName"? Why doesn't the query

fieldName:my string

return anything?

+17  A: 

Store.Yes

Means that the value of the field will be stored in the index

Store.No

Means that the value of the field will NOT be stored in the index

Store.Yes/No does not affect indexing or searching with Lucene. It just tells Lucene whether you want it to act as a datastore for the values in the field. If you use Store.Yes, then when you search, the value of that field will be included in the Documents in your search results.

If you're storing your data in a database and only using the Lucene index for searching, then you can get away with Store.No on all of your fields. However, if you're using the index as storage as well, then you'll want Store.Yes.
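To make the stored/indexed distinction concrete, here is a toy model in plain Java. It is not the Lucene API and not how Lucene is implemented. The class ToyStore and all of its methods are made up for illustration. It only shows that the searchable structure and the stored values are kept separately.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NOT the Lucene API. All names here are hypothetical.
final class ToyStore {
    // term -> ids of the documents containing it (what searching uses)
    private final Map<String, Set<Integer>> inverted = new HashMap<>();
    // doc id -> original field value (roughly what Store.Yes keeps around)
    private final Map<Integer, String> stored = new HashMap<>();

    // Index a value; keep the original text only if 'store' is true.
    void add(int docId, String value, boolean store) {
        for (String term : value.toLowerCase().split("\\s+")) {
            inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        if (store) {
            stored.put(docId, value);
        }
    }

    // Searching consults only the inverted index, never the stored values.
    Set<Integer> search(String term) {
        return inverted.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
    }

    // Returns null when the value was indexed but not stored.
    String storedValue(int docId) {
        return stored.get(docId);
    }
}
```

With this model, a document added with store set to false is still found by search, but storedValue returns null for it. That is the situation where you would go back to your database using an id.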

Index.Tokenized

Means that the field will be tokenized when it's indexed (you got that one). This is useful for long fields with multiple words.

Index.Un_Tokenized

Means that the field will not be analyzed and will be indexed as a single term, exactly as given. This is useful for keyword/single-word fields and some short multi-word fields.

Index.No

Exactly what it says. The field will not be indexed and is therefore unsearchable. However, you can use Index.No along with Store.Yes to store a value that you don't want to be searchable.

Index.No_Norms

Same as Index.Un_Tokenized, except that a few bytes will be saved by not storing normalization data (norms). This data is what is used for index-time boosting and field-length normalization, so neither is applied to a field indexed this way.
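The difference between the Index options can be sketched by looking at which terms end up in the index. Again, this is a plain-Java toy, not Lucene code. ToyIndexModes, Mode and terms are hypothetical names, and the "analyzer" is a crude lower-case-and-split stand-in. The lengthNorm method assumes the classic 1/sqrt(number of terms) formula of Lucene's default similarity, which is the kind of value the norms hold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: NOT the Lucene API. All names here are hypothetical.
final class ToyIndexModes {
    enum Mode { TOKENIZED, UN_TOKENIZED, NO }

    // Returns the terms that would be searchable for a field value.
    static List<String> terms(String value, Mode mode) {
        switch (mode) {
            case TOKENIZED: {
                // Crude stand-in for an analyzer: lower-case, split on non-alphanumerics.
                List<String> out = new ArrayList<>();
                for (String t : value.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (!t.isEmpty()) {
                        out.add(t);
                    }
                }
                return out;
            }
            case UN_TOKENIZED:
                // The whole value becomes one term, unchanged.
                return Collections.singletonList(value);
            default:
                // Index.No: nothing is searchable.
                return Collections.emptyList();
        }
    }

    // Assumed classic length norm: shorter fields score higher per match.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }
}
```

In this sketch, "My String" indexed as TOKENIZED yields the terms my and string, while UN_TOKENIZED yields the single term "My String". A search for the term my alone can only match the first form.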

For further reading, the Lucene javadocs are priceless.

As for your last question, about why your query isn't returning anything: without knowing any more about how you're indexing that field, I'd say it's because your fieldName qualifier is attached only to the term 'my'. The term 'string' is searched in the default field instead. To search for the phrase "my string" you want:

fieldName:"my string"

To search for the words "my" and "string" in the fieldName field (with the default OR operator either word may match; write fieldName:(+my +string) to require both):

fieldName:(my string)
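The way the field qualifier binds can be sketched with a tiny splitter. This is a simplified illustration in plain Java, not Lucene's QueryParser, whose grammar is much richer. ToyQuerySplit and clauses are hypothetical names, and the default field name "contents" used in the usage note is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's QueryParser. All names here are hypothetical.
final class ToyQuerySplit {
    // Splits a query into "field:text" clauses, using defaultField
    // for any term that has no explicit field qualifier.
    static List<String> clauses(String query, String defaultField) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = query.length();
        while (i < n) {
            if (query.charAt(i) == ' ') {
                i++;
                continue;
            }
            // A qualifier is the text before a ':' within the current word.
            String field = defaultField;
            int j = i;
            while (j < n && query.charAt(j) != ' ' && query.charAt(j) != ':') {
                j++;
            }
            if (j < n && query.charAt(j) == ':') {
                field = query.substring(i, j);
                i = j + 1;
            }
            if (i < n && query.charAt(i) == '"') {
                // Quoted phrase: the qualifier covers the whole phrase.
                int end = query.indexOf('"', i + 1);
                if (end < 0) end = n;
                out.add(field + ":\"" + query.substring(i + 1, end) + "\"");
                i = end + 1;
            } else if (i < n && query.charAt(i) == '(') {
                // Group: the qualifier covers every term inside the parentheses.
                int end = query.indexOf(')', i + 1);
                if (end < 0) end = n;
                for (String t : query.substring(i + 1, end).trim().split(" +")) {
                    if (!t.isEmpty()) out.add(field + ":" + t);
                }
                i = end + 1;
            } else {
                // Single term: the qualifier covers only this one word.
                int end = query.indexOf(' ', i);
                if (end < 0) end = n;
                out.add(field + ":" + query.substring(i, end));
                i = end;
            }
        }
        return out;
    }
}
```

Calling clauses("fieldName:my string", "contents") gives fieldName:my and contents:string, which shows why the unquoted query looks for "string" in the wrong field. The quoted and parenthesized forms keep both words under fieldName.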

dustyburwell
Thanks, that clears up a thing or two. Still not sure what I'm doing wrong with my indexing/searching though. But now I got a better view at what I'm doing.
borisCallens
Are you using 2.4.1? Because those Field.Index values have been deprecated in favor of new names which are a bit clearer, IMO. See http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/document/Field.Index.html.
Jegschemesch
Well, I (and the OP, if I'm not mistaken) have been using Lucene.Net, which is quite a bit behind. I don't recall which version the port is equivalent to at this point, but those are the values that it has available.
dustyburwell
As far as I know, the Lucene.Net version numbers match the Lucene version they're ported from.
Nick
With Lucene 2.9.1, Field.Index.TOKENIZED is deprecated. The documentation says it was just renamed to ANALYZED, but I don't think the meaning has stayed the same. Does anyone know any more details about Field.Index.ANALYZED?
Flynn81
Field.Index.ANALYZED does tokenization, which is why the deprecated Field.Index.TOKENIZED now refers to it.
Steen