Why does the first search example below return no results? Any ideas on how to modify the code below to make number searches possible would be much appreciated.

Create the index

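// Second argument true: create a new index at this path
$index = new Zend_Search_Lucene('/myindex', true);
// The document has to exist before fields can be added to it
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('ssn', '123-12-1234'));
$doc->addField(Zend_Search_Lucene_Field::Text('cats', 'Fluffy'));
$index->addDocument($doc);
$index->commit();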

Search - NO RESULTS

// Open the existing index; passing true as the second constructor argument would recreate it empty
$index = Zend_Search_Lucene::open('/myindex');
$results = $index->find('123-12-1234');

Search - WITH RESULTS

// Open the existing index; passing true as the second constructor argument would recreate it empty
$index = Zend_Search_Lucene::open('/myindex');
$results = $index->find('Fluffy');
A: 

This is an effect of which Analyzer you have chosen.

I believe the default Analyzer will only index terms that match /[a-zA-Z]+/. This means that your SSN isn't being added to the index as a term.

Even if you switched to the text+numeric case-insensitive Analyzer, what you want still would not work. The expression for a term is /[a-zA-Z0-9]+/, which means the terms added to the index would be 123, 12, and 1234.
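To make the splitting concrete, here is a rough sketch in plain PHP. It is not the Zend_Search_Lucene code: the function name approximateTokens is made up for illustration, and the two patterns only approximate the analyzers described above.

```php
<?php
// Rough stand-in for analyzer tokenization: lowercase the text and
// collect every run of characters matching the given pattern.
// This is NOT the Zend_Search_Lucene implementation, only an illustration.
function approximateTokens(string $text, string $pattern): array
{
    preg_match_all($pattern, strtolower($text), $matches);
    return $matches[0];
}

$lettersOnly   = '/[a-z]+/';    // approximates the letters-only analyzer
$lettersDigits = '/[a-z0-9]+/'; // approximates the text+numeric analyzer

print_r(approximateTokens('123-12-1234', $lettersOnly));   // empty array: no terms at all
print_r(approximateTokens('123-12-1234', $lettersDigits)); // 123, 12, 1234
print_r(approximateTokens('Fluffy', $lettersOnly));        // fluffy
```

With the letters-only pattern the SSN yields no terms, and with the letters-and-digits pattern it yields three separate terms, none of which is the full 123-12-1234.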

If you need 123-12-1234 to be seen as a valid term, you are probably going to need to extend Zend_Search_Lucene_Analysis_Analyzer_Common and make it so that 123-12-1234 is a term.
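As a sketch of the tokenizing rule such a custom analyzer might use, the plain-PHP helper below treats runs of letters, digits, and hyphens as one token and records start and end offsets, which is roughly the information an analyzer's token needs. The name hyphenAwareTokens and the pattern are assumptions for illustration; this is standalone code, not a Zend class.

```php
<?php
// Hypothetical helper: returns [text, start, end] triples for runs of
// letters, digits and hyphens, so '123-12-1234' stays in one piece.
function hyphenAwareTokens(string $input): array
{
    $tokens = [];
    preg_match_all('/[a-zA-Z0-9-]+/', $input, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$text, $start]) {
        // Lowercase for case-insensitive matching; the end offset is exclusive.
        $tokens[] = [strtolower($text), $start, $start + strlen($text)];
    }
    return $tokens;
}

print_r(hyphenAwareTokens('SSN 123-12-1234 belongs to Fluffy'));
```

One trade-off of this pattern: any hyphenated word (for example "well-known") also becomes a single term, so a search for "known" alone would no longer match it.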

See http://framework.zend.com/manual/en/zend.search.lucene.extending.html#zend.search.lucene.extending.analysis

Your other choice is to store the SSN as a Zend_Search_Lucene_Field::Keyword, since a keyword is not broken up into terms.

http://framework.zend.com/manual/en/zend.search.lucene.html#zend.search.lucene.index-creation.understanding-field-types
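A minimal sketch of the Keyword approach follows, assuming Zend Framework 1 is on the include path. The temporary index path is made up for the demo. The query is built as a term object rather than passed as a string because, as far as I understand, a string given to find() goes through the query parser and the analyzer, which may split or drop the digits again.

```php
<?php
// Assumes Zend Framework 1 is available on the include path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';
require_once 'Zend/Search/Lucene/Index/Term.php';
require_once 'Zend/Search/Lucene/Search/Query/Term.php';

// Hypothetical scratch location for the demo index.
$path = sys_get_temp_dir() . '/ssn_index_' . uniqid();

$index = Zend_Search_Lucene::create($path);

$doc = new Zend_Search_Lucene_Document();
// Keyword: stored and indexed as one untokenized term.
$doc->addField(Zend_Search_Lucene_Field::Keyword('ssn', '123-12-1234'));
$doc->addField(Zend_Search_Lucene_Field::Text('cats', 'Fluffy'));
$index->addDocument($doc);
$index->commit();

// Build the term query directly so the value is not run through the analyzer.
$term  = new Zend_Search_Lucene_Index_Term('123-12-1234', 'ssn');
$query = new Zend_Search_Lucene_Search_Query_Term($term);
$hits  = $index->find($query);

foreach ($hits as $hit) {
    echo $hit->ssn, ' / ', $hit->cats, PHP_EOL;
}
```

Because a keyword is matched as an exact string, the search value has to be identical to the stored value, hyphens included.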

Zoredache
A: 

First you need to change your text analyzer to include numbers:

Zend_Search_Lucene_Analysis_Analyzer::setDefault( new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum() );

Then, for fields with numbers, use Zend_Search_Lucene_Field::Keyword instead of Zend_Search_Lucene_Field::Text. This skips the creation of tokens and saves the value 'as is' into the index, so you can then search by it. I don't know how it behaves with floats (it is probably not going to work, since 3.0 is not going to match 3), but for natural numbers (like ids) it works like a charm.

Mon Villalon