I'm using Zend_Search_Lucene to create an index of articles so they can be searched on my website. Whenever an administrator updates, creates, or deletes an article in the admin area, the index is rebuilt:
$config = Zend_Registry::get("config");
$cache = $config->lucene->cache;
$path = $cache . "/articles";

try
{
    $index = Zend_Search_Lucene::open($path);
}
catch (Zend_Search_Lucene_Exception $e)
{
    $index = Zend_Search_Lucene::create($path);
}

$model = new Default_Model_Articles();
$select = $model->select();
$articles = $model->fetchAll($select);

foreach ($articles as $article)
{
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Text("title", $article->title));
    $index->addDocument($doc);
}

$index->commit();
My question is this: since I am reindexing all the articles anyway (which also takes care of deleted ones), why would I not just use create() every time instead of open() and update? With the code above, the articles get re-added with addDocument() on every rebuild, so I would end up with duplicates. How would I prevent that? Is there a way to check whether a document already exists in the index?
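For reference, this is the kind of per-article delete-then-add update I was considering as an alternative to a full rebuild. It assumes each document also stores the article's database id in a Keyword field named "articleId", which my current code does not yet do, so treat this as a sketch rather than working code:

```php
<?php
// Sketch: update one article in place instead of rebuilding everything.
// Assumes every document was indexed with a Keyword field "articleId".
$index = Zend_Search_Lucene::open($path);

// Find any existing documents for this article and delete them first,
// so re-adding the article does not create duplicates.
$term = new Zend_Search_Lucene_Index_Term($article->id, 'articleId');
foreach ($index->termDocs($term) as $docId) {
    $index->delete($docId);
}

// Re-add the article, storing its id as a non-tokenized Keyword field
// so it can be matched exactly on the next update.
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Keyword('articleId', $article->id));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $article->title));
$index->addDocument($doc);

$index->commit();
```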
Also, I don't think I fully understand how the indexing works when you open() and update the index. It seems to create a new #.cfs file every time (so I end up with _0.cfs, _1.cfs, _2.cfs in the index folder), whereas when I use create(), those files are replaced by a single #.cfs file with the number incremented (so, for example, just _2.cfs). Can you please explain what these segment files are?
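In case it's relevant, I did notice that calling optimize() after updating appears to merge the accumulated _#.cfs files back into a single one, which looks similar to what create() leaves behind. This is just what I observed, not something I understand yet:

```php
<?php
// Observation: after several open()-and-update cycles the index folder
// contains multiple segment files (_0.cfs, _1.cfs, _2.cfs, ...).
$index = Zend_Search_Lucene::open($path);

// Merging the segments seems to collapse them into one file again.
$index->optimize();
$index->commit();
```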