Each Lucene doc is a recipe, and each recipe has ingredients.

I'm working towards being able to search the ingredients and return a result that says, for example, that two out of four ingredients matched.

So how can I add the ingredients to the doc? In Solr I can just create multiple fields with the same name and it saves them all. I might be doing something wrong, because here it's only saving one ingredient.

Also this would apply to a field like 'tags'.

P.S. I'm using the Zend Framework for this, if it matters at all.

A: 

I see two possible approaches here:

  1. Denormalize your data - create a separate document for each ingredient in a recipe, giving all of the documents for a recipe a common recipe id. Then, during search, aggregate all matches of a recipe id. A bit ugly.
  2. Concatenate all your ingredients into a common field, and index it as 'Text'. Then search for ingredients using a boolean query with 'OR' (This is called 'Should' in Java Lucene terms, I do not know the PHP equivalent).
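To make the aggregation step of the first approach concrete, here is a minimal sketch in plain Python. It is not the Lucene or Zend API: the `docs` list stands in for a denormalized index with one entry per ingredient, and `count_matches_per_recipe` is a hypothetical helper invented for illustration.

```python
from collections import defaultdict

# Hypothetical denormalized "index": one (recipe_id, ingredient) entry per ingredient.
docs = [
    (1, "vanilla"), (1, "strawberry"), (1, "cream"), (1, "sugar"),
    (2, "apple"), (2, "sugar"),
]

def count_matches_per_recipe(docs, query_terms):
    """Return {recipe_id: (matched_count, total_ingredients)}."""
    terms = {t.lower() for t in query_terms}
    totals = defaultdict(int)
    matched = defaultdict(int)
    for recipe_id, ingredient in docs:
        totals[recipe_id] += 1
        if ingredient.lower() in terms:
            matched[recipe_id] += 1
    # matched[rid] is 0 for recipes where nothing matched.
    return {rid: (matched[rid], totals[rid]) for rid in totals}

print(count_matches_per_recipe(docs, ["vanilla", "sugar"]))
# prints {1: (2, 4), 2: (1, 2)}
```

In a real index the grouping would run over the search hits rather than over every document, but the counting logic would be the same.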

I suggest you try the second approach and see if it helps.

Yuval F
+1  A: 

Lucene documents support adding multiple fields with the same name. That is, you can repeatedly call:

document.add(new Field("name", value))

So were you to do :

# (pseudo-code) 
document1.add(new Field("ingredient", "vanilla"))
document1.add(new Field("ingredient", "strawberry"))
index.add(document1)

# And then search for
index.search("ingredient", "vanilla" && "strawberry")

You will get back document1. But if you search for:

index.search("ingredient", "vanilla" && "apple")

You will not get back document1.

If you searched for:

index.search("ingredient", "vanilla" || "apple")

You would also get back document1.

If you want to see which ingredients match you can simply save the fields on the document as Stored fields, and then for each matching document retrieve the list of fields and compare them to the user query.
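As a rough illustration of that comparison step, here is a minimal Python sketch. It assumes the stored `ingredient` values of a matching document have already been read back as a plain list of strings; `matched_ingredients` is a hypothetical helper, not part of Lucene or the Zend Framework, and the whitespace tokenization is a stand-in for whatever analyzer the index uses.

```python
def matched_ingredients(stored_values, query_terms):
    """Return the stored ingredient values that contain at least one query term."""
    terms = {t.lower() for t in query_terms}
    matches = []
    for value in stored_values:
        # Naive whitespace tokenization; an assumption, not the index's analyzer.
        tokens = set(value.lower().split())
        if tokens & terms:
            matches.append(value)
    return matches

stored = ["vanilla", "strawberry", "cream", "sugar"]
hits = matched_ingredients(stored, ["vanilla", "sugar", "apple"])
print(f"{len(hits)} of {len(stored)} ingredients matched: {hits}")
# prints 2 of 4 ingredients matched: ['vanilla', 'sugar']
```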

Also note, by default the PositionIncrementGap for fields with the same name that are added to a document is 0.

This means that if you added:

   document1.add(new Field("ingredient", "chocolate"))
   document1.add(new Field("ingredient", "orange"))

then it would be treated as if it were a single ingredient called "chocolate orange", which might match a phrase search such as:

index.search("ingredient", "chocolate orange")

You can avoid this by setting PositionIncrementGap to a value greater than 1, which will yield:

0 matches for:

index.search("ingredient", "chocolate orange")

and 1 match for:

index.search("ingredient", "chocolate" &&  "orange")
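The effect of the gap can be simulated in a few lines of Python. This is only a toy model of token positions, not Lucene's actual indexing code: `index_positions` and `phrase_matches` are hypothetical helpers, tokenization is plain whitespace splitting, and the phrase check is an exact match with no slop.

```python
def index_positions(values, gap=0):
    """Assign token positions across several values of one field.

    Each token advances the position by 1; `gap` extra positions are
    skipped between consecutive values (a simplified model of the gap).
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions

def phrase_matches(positions, phrase):
    """True if the phrase tokens occur at consecutive positions."""
    tokens = phrase.lower().split()
    if not tokens:
        return False
    for start in positions.get(tokens[0], []):
        if all(start + k in positions.get(tok, []) for k, tok in enumerate(tokens)):
            return True
    return False

no_gap = index_positions(["chocolate", "orange"], gap=0)
with_gap = index_positions(["chocolate", "orange"], gap=100)
print(phrase_matches(no_gap, "chocolate orange"))    # prints True
print(phrase_matches(with_gap, "chocolate orange"))  # prints False
```

With the gap in place both terms are still indexed individually, so a query requiring "chocolate" and "orange" as separate terms continues to match.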
Joel
A: 

I've decided to use Solr instead; it was less fiddly and is much, much faster too. Thanks for the help, guys.

bluedaniel
Please use comments for this sort of thing (or edit your question).
Yacoby
Solr is built on Lucene, so how can it be any faster than well-written code that interfaces directly with Lucene?
Joel
I'm not versed enough in Java to be able to answer that. I'm just (maybe incorrectly) quoting various other forums where I've been reading up on it.
bluedaniel