Here's the setup: I have a Lucene index, and it works well with the 2,000 documents I have indexed. I have been using Luke (Lucene Index Toolbox, v0.9.2) to debug queries, and I am using ZF 1.9.

The layout for my Lucene Index is as follows:

I = Indexed
T = Tokenized
S = Stored

Fields:
author - ITS
category - ITS
publication - ITS
publicationdate - IS
summary - ITS
title - ITS

Basically, I have a form that lets you search on any combination of the above fields, and it parses the input into a Zend_Search_Lucene query. That part is not the problem. The problem is that when I start combining terms, the "optimize" method that runs inside find() causes the query to simply disappear.
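A minimal sketch of how form input like this could be turned into a query string, assuming the form posts an array keyed by index field name. The function names buildLuceneQuery and escapeLuceneTerm are hypothetical, not part of Zend Framework. Two details are worth noting. First, characters such as ":" or "(" in user input have a meaning in Lucene query syntax, so they are backslash-escaped. Second, in Lucene query syntax a field prefix such as "title:" applies only to the term directly after it, so in "+(title:test title)" only "test" is restricted to the title field; the sketch therefore prefixes every word with its field.

```php
<?php
// Hypothetical helper: backslash-escape characters that have a meaning in
// Lucene query syntax, so user input is treated as plain text.
function escapeLuceneTerm($term)
{
    return addcslashes($term, '+-!(){}[]^"~*?:\\&|/');
}

// Hypothetical helper: turn form input (field => text) into a query string.
// Every word gets its own field prefix, because a "field:" prefix only
// applies to the term directly after it.
function buildLuceneQuery(array $form)
{
    $clauses = [];
    foreach ($form as $field => $value) {
        // Split the input on whitespace and drop empty pieces.
        $words = preg_split('/\s+/', trim((string) $value), -1, PREG_SPLIT_NO_EMPTY);
        if (empty($words)) {
            continue; // field left blank in the form
        }
        $terms = [];
        foreach ($words as $word) {
            $terms[] = $field . ':' . escapeLuceneTerm($word);
        }
        // The group is required (+); the terms inside it carry no sign, so the
        // query parser's default operator decides how they combine.
        $clauses[] = '+(' . implode(' ', $terms) . ')';
    }
    return implode(' ', $clauses);
}

// Prints: +(title:test title:title) +(publication:publication publication:name)
echo buildLuceneQuery([
    'title'       => 'test title',
    'publication' => 'publication name',
    'author'      => '',
]), PHP_EOL;
```

Blank form fields are skipped entirely, so the user can mix and match fields without producing empty "+()" groups.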

Here is an example search I am running right now:

Form Version:

Title: test title
Publication: publication name

Lucene Query Parse:

+(title:test title) +(publication:publication name)

Now if I take this query string, paste it into Luke and hit "Search", it returns results just fine. When I use the index's find() method, it bombs out. So I did a little research into how find() works and found what I believe is the problem.

First off, here are the actual lines of code that do the searching:

$searchQuery = "+(title:test title) +(publication:publication name)";
$hits = new ArrayObject($this->index->find($searchQuery));

It's a simplified version of the actual code, but that's what it generates.

Here's what I noticed after some debugging: the "optimize" method destroys the query itself. I wrote the following code:

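// find() accepts a string, but rewrite() and optimize() are methods of a
// query object, so parse the string into one first
$query = Zend_Search_Lucene_Search_QueryParser::parse($searchQuery);
$rewrite = $query->rewrite($this->index);
$optimize = $rewrite->optimize($this->index);
echo "======<br/>";
echo "Original: " . $query . "<br/>";
echo "Rewrite: " . $rewrite . "<br/>";
echo "Optimized + Rewrite: " . $optimize . "<br/>";
echo "======<br/>";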

This outputs the following text:

======
Original: +(title:test title) +(publication:publication name)
Rewrite: +(title:test title) +(publication:publication name)
Optimized + Rewrite: 
======

Notice that the third line is completely empty. It appears that running rewrite() followed by optimize() on the query causes the query to empty itself.

Does anyone have any idea why the optimize method seems to be removing my query altogether? Am I missing a filter or some sort of interface that needs to be set up? All of the queries work perfectly when I paste them into Luke and run them against the index by hand, so something odd is going on with the way Zend parses the query to do the search.

Any help is appreciated.

+5  A: 

I will be quite frank: Zend_Search_Lucene (ZSL) is buggy and has not been maintained for a long time.

It is also conceptually flawed. Let me explain why: search engines exist to answer queries fast, but ZSL is implemented in pure PHP, and a PHP script keeps nothing in memory between requests. That means the index files are read and loaded again on every single query. It can't be fast.

There is nothing wrong with Lucene itself, and there is a very good alternative built on it: Solr, a search server written in Java that can index your documents and answer all your Lucene queries. Because Solr runs as a long-lived server, you don't suffer the poor performance of reloading the Lucene files again and again.
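For comparison, a minimal sketch of sending a Lucene-syntax query to Solr over HTTP from PHP. It assumes a Solr instance at http://localhost:8983/solr with the standard /select request handler; the host, port and path depend on your installation (newer Solr versions put the core name in the path). The function name solrSelectUrl is hypothetical. The q parameter carries the query in Lucene syntax, wt=json asks for a JSON response and rows limits the number of documents returned.

```php
<?php
// Hypothetical helper: build a Solr /select URL for a Lucene-syntax query.
// The base URL is an assumption; newer Solr versions expect
// http://localhost:8983/solr/<core>/select instead.
function solrSelectUrl($baseUrl, $luceneQuery, $rows = 10)
{
    $params = [
        'q'    => $luceneQuery, // standard Lucene query syntax
        'wt'   => 'json',       // response format
        'rows' => $rows,        // maximum number of documents to return
    ];
    // http_build_query() URL-encodes the values ("+" becomes %2B, ":" becomes %3A).
    return rtrim($baseUrl, '/') . '/select?' . http_build_query($params);
}

$url = solrSelectUrl('http://localhost:8983/solr', '+title:test +publication:name');
echo $url, PHP_EOL;

// Against a running Solr server, the matching documents could then be read with:
// $response = json_decode(file_get_contents($url), true);
// $docs = $response['response']['docs'];
```

The index stays loaded inside the Solr process, so each PHP request only pays for one HTTP round trip rather than reopening the index files.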

This is somewhat different from what you asked, but I waited two years for my ZSL bugs to be fixed, and with Solr they finally are :)

Patrick Allaert
After fighting Zend Lucene a lot recently, I have to +1 this!
benlumley