EDIT:

I have managed to solve the problem by using:

+"lorem ipsum" +type:photo
+"lorem ipsum" +type:video

There is another problem, though: the index returns the correct results but with the wrong id (id is a primary key). More specifically, each id returned is 1 less than the real id in the database that I use to build the index.

That's very strange.


What's wrong with these search queries:

"lorem ipsum" AND +type:photo
"lorem ipsum" AND +type:video

The first query is supposed to find only results with type = photo, and the second only videos. But both queries return photos and videos alike.

Here is how I build the index:

    // create media index
    $index = Zend_Search_Lucene::create('/data/media_index');
    // get all media
    $media = $this->_getTable('Media')->get();
    // iterate through media and build index
    foreach ($media as $m) {

        $doc = new Zend_Search_Lucene_Document();

        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('id',
                                                           $m->id));
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('thumb_path',
                                                           $m->thumb_path));
        $doc->addField(Zend_Search_Lucene_Field::Keyword('title',
                                                         $m->title));
        $doc->addField(Zend_Search_Lucene_Field::UnStored('description',
                                                          $m->description));
        $doc->addField(Zend_Search_Lucene_Field::Keyword('type',
                                                         $m->type));

        $index->addDocument($doc);

    }
    // commit the index
    $index->commit();

And here is how I search it:

    $index = Zend_Search_Lucene::open('/data/media_index');
    $this->view->photos = $index->find('"lorem ipsum" AND +type:photo');
    $this->view->videos = $index->find('"lorem ipsum" AND +type:video');

Any ideas?

+2  A: 

I just ran some tests on my own search index, and the problem seems to be in the query itself, not the code. Both "AND" and "+" are operators, and the query parser seems to get confused when the two appear back to back with no term between them. This passage from their docs is relevant:

If the AND/OR/NOT style is used, then an AND or OR operator must be present between all query terms. Each term may also be preceded by NOT operator. The AND operator has higher precedence than the OR operator. This differs from Java Lucene behavior.

Now, running your query through the parser, this was the Search_Query object:

string '"lorem ipsum" AND +type:photo' (length=29)

object(Zend_Search_Lucene_Search_Query_MultiTerm)[230]
  private '_terms' => 
    array
      0 => 
        object(Zend_Search_Lucene_Index_Term)[236]
          public 'field' => null
          public 'text' => string 'lorem' (length=5)
      1 => 
        object(Zend_Search_Lucene_Index_Term)[237]
          public 'field' => null
          public 'text' => string 'ipsum' (length=5)
      2 => 
        object(Zend_Search_Lucene_Index_Term)[238]
          public 'field' => null
          public 'text' => string 'and' (length=3)
      3 => 
        object(Zend_Search_Lucene_Index_Term)[239]
          public 'field' => null
          public 'text' => string 'type' (length=4)
      4 => 
        object(Zend_Search_Lucene_Index_Term)[240]
          public 'field' => null
          public 'text' => string 'photo' (length=5)
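
A dump like the one above can be reproduced with a short script. This is a minimal sketch, assuming Zend Framework 1 is on the include path; the exact `var_dump` formatting depends on your PHP setup, so no output is shown here.

```php
<?php
// Assumption: Zend Framework 1 is available on the include path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/QueryParser.php';

// The three query strings discussed in this thread.
$queryStrings = array(
    '"lorem ipsum" AND +type:photo', // mixes AND with +
    '"lorem ipsum" +type:photo',     // + style only
    '"lorem ipsum" AND type:photo',  // AND style only
);

foreach ($queryStrings as $queryString) {
    // parse() returns the query object that find() would execute.
    $query = Zend_Search_Lucene_Search_QueryParser::parse($queryString);

    var_dump($queryString);
    var_dump($query);
}
```

Comparing the class of each returned object (MultiTerm versus Boolean) is a quick way to see whether the parser understood the operators.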

Changing the query to use only one of the two operators, either AND or +, gives a proper boolean query:

string '"lorem ipsum" +type:photo' (length=25)
string '"lorem ipsum" AND type:photo' (length=28)

object(Zend_Search_Lucene_Search_Query_Boolean)[227]
  private '_subqueries' => 
    array
      0 => 
        object(Zend_Search_Lucene_Search_Query_Phrase)[230]
          private '_terms' => 
            array
              0 => 
                object(Zend_Search_Lucene_Index_Term)[233]
                  public 'field' => null
                  public 'text' => string 'lorem' (length=5)
              1 => 
                object(Zend_Search_Lucene_Index_Term)[234]
                  public 'field' => null
                  public 'text' => string 'ipsum' (length=5)
      1 => 
        object(Zend_Search_Lucene_Search_Query_Term)[235]
          private '_term' => 
            object(Zend_Search_Lucene_Index_Term)[232]
              public 'field' => string 'type' (length=4)
              public 'text' => string 'photo' (length=5)

The only difference between the two is the `_signs` array. With AND:

  private '_signs' => 
    array
      0 => boolean true
      1 => boolean true

With +:

  private '_signs' => 
    array
      0 => null
      1 => boolean true

The AND operator makes both subqueries required in the result, whereas + only makes the subquery to its right required.
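
The same signs can be set explicitly by building the query through the query construction API instead of a query string. This is only a sketch: the helper name `buildMediaQuery` is made up for illustration, and restricting the phrase to the `description` field is an assumption (the parsed queries above search with no field set). Terms passed this way are not run through the analyzer, so they are written in lower case.

```php
<?php
// Assumption: Zend Framework 1 is available on the include path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Index/Term.php';
require_once 'Zend/Search/Lucene/Search/Query/Boolean.php';
require_once 'Zend/Search/Lucene/Search/Query/Phrase.php';
require_once 'Zend/Search/Lucene/Search/Query/Term.php';

/**
 * Hypothetical helper: builds "lorem ipsum" plus a required type filter.
 *
 * @param string $type           'photo' or 'video'
 * @param bool   $phraseRequired true mimics AND (both required),
 *                               false mimics + (only the type is required)
 */
function buildMediaQuery($type, $phraseRequired)
{
    // Phrase restricted to the 'description' field (an assumption).
    $phrase = new Zend_Search_Lucene_Search_Query_Phrase(
        array('lorem', 'ipsum'),
        null,
        'description'
    );

    // Exact match on the Keyword field 'type'.
    $typeQuery = new Zend_Search_Lucene_Search_Query_Term(
        new Zend_Search_Lucene_Index_Term($type, 'type')
    );

    $query = new Zend_Search_Lucene_Search_Query_Boolean();
    // Sign: true = required, null = optional, false = prohibited.
    $query->addSubquery($phrase, $phraseRequired ? true : null);
    $query->addSubquery($typeQuery, true);

    return $query;
}

$index  = Zend_Search_Lucene::open('/data/media_index');
$photos = $index->find(buildMediaQuery('photo', true));
$videos = $index->find(buildMediaQuery('video', true));
```

With `$phraseRequired` set to `false`, documents of the right type can match even without the phrase, which corresponds to the `null, true` signs shown above.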

So just change the query to

"lorem ipsum" AND type:photo

And you should get the results you are looking for.

Jesta
I think that did it. There's one more problem though: the search returns results with ids starting at 0, but id is a primary key and starts at 1 in the database. It seems like the index is returning id - 1.
Richard Knop
Why is that? I am creating the index correctly as far as I can see.
Richard Knop
I have edited my first post.
Richard Knop
Check out this post: http://stackoverflow.com/questions/674817/zend-framework-lucene-boolean-google-like-search I didn't realize it either, but it appears that `id` is a reserved name in Lucene. In my link builder I use slugs (a URL-encoded version of the title) to link to results, so I never ran into it. As the answer below says, `id` must already be used by the Lucene index; try renaming the field in your index to `media_id` or something along those lines.
Jesta
+2  A: 

About the "id problem": I would guess that "id" is an internal variable used to access each result. So I would recommend renaming the field to something like "entryId" and then using $resultItem->entryId.
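
A sketch of what the rename looks like on the search side, assuming the index has been rebuilt with the field stored as `media_id` (a name chosen here for illustration) instead of `id`. In Zend_Search_Lucene, each hit object has its own `id` and `score` properties, where `id` is the document's number inside the index, which would explain values that start at 0.

```php
<?php
// Assumption: Zend Framework 1 is available on the include path, and the
// index was rebuilt with the field added as, for example:
//   Zend_Search_Lucene_Field::UnIndexed('media_id', $m->id)
require_once 'Zend/Search/Lucene.php';

$index = Zend_Search_Lucene::open('/data/media_index');
$hits  = $index->find('"lorem ipsum" AND type:photo');

foreach ($hits as $hit) {
    // Built-in hit properties, not stored fields:
    $internalNumber = $hit->id;    // document number inside the index
    $score          = $hit->score; // relevance score

    // Stored field from the document, under its new name:
    $mediaId = $hit->media_id;

    // Alternative if the field is still named 'id': read it from the
    // document itself rather than from the hit:
    //   $mediaId = $hit->getDocument()->getFieldValue('id');

    echo $mediaId, ' (internal ', $internalNumber, ', score ', $score, ")\n";
}
```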

Tomáš Fejfar
Yeah that was it :) Thanks
Richard Knop