I am creating a tagging system for my site

I got the basics of adding a document to Lucene, but I can't seem to figure out how to delete a document or update one when the user changes the tags of something. I found pages that say to use the document index and that I need to optimize before it takes effect, but how do I get the document index? I also saw another that said to use IndexWriter to delete, but I couldn't figure out how to do it with that either.

I am using C# ASP.NET and I don't have Java installed on that machine.

+2  A: 

You need an IndexReader to delete a document. I'm not sure about the .NET version, but the Java and C++ versions of the Lucene API have an IndexModifier class that hides the differences between the IndexReader and IndexWriter classes and simply uses the appropriate one as you call addDocument() and removeDocument().

Also, there is no concept of updating a document in Lucene; you have to remove it and then re-add it. To do this, you will need to make sure that every document has a unique stored ID in the index.
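As a rough illustration of the "remove" half, here is a minimal sketch that deletes by a unique ID term instead of by document number. It assumes Lucene.Net 2.9.x, and that each document was indexed with an un-analyzed field named "id"; the class name TagIndex, the method name, the field name, and the index path parameter are all hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class TagIndex
{
    // Deletes every document whose "id" field exactly matches mediaId.
    // Assumes the "id" field was indexed without analysis (hypothetical field name).
    public static void DeleteById(string indexPath, string mediaId)
    {
        // Fully qualified to avoid a clash with System.IO.Directory.
        Lucene.Net.Store.Directory dir = FSDirectory.Open(new DirectoryInfo(indexPath));

        // create: false opens the existing index instead of overwriting it.
        IndexWriter writer = new IndexWriter(
            dir,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
            false,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            writer.DeleteDocuments(new Term("id", mediaId));
            writer.Commit();
        }
        finally
        {
            writer.Close();
        }
    }
}
```

Deleting by term means you never have to look up a docNum yourself, which is why the unique ID field matters.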

Dan Head
Great to know about updates. I don't see removeDocument or IndexModifier (maybe .NET is using an older version of Lucene). I see a DeleteDocument in IndexReader. It accepts 'int docNum'. I have no idea what to do with it. There's no docNum or docId in Document.
acidzombie24
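The docnum is the enumerator key, for example:

IndexReader rdr = IndexReader.Open(@"Myindex");
int N = rdr.MaxDoc();
for (int n = 0; n < N; n++)
{
    // Skip slots of documents that have already been deleted.
    if (rdr.IsDeleted(n)) continue;
    Document doc = rdr.Document(n);
    // do something with this doc
}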
Mikos
+2  A: 

What version of Lucene are you using? The IndexWriter class has an updateDocument method (UpdateDocument in Lucene.Net) that lets you update a document (BTW, an update under the hood is really a delete followed by an add). You will need some identifier (such as a document ID) to update by. When you index the document, add a unique document identifier such as a URL, a counter, etc. The "Term" will then be the ID of the document you wish to update. For example, using a URL you can update thus:

IndexWriter writer = ...
writer.updateDocument(new Term("id", "http://somedomain.org/somedoc.htm"), doc);
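For the C# side, a sketch of the same idea might look like the following. It assumes Lucene.Net 2.9.x and an already open IndexWriter; the class name TagUpdater, the method name, and the field names "id" and "tags" are hypothetical and not taken from the question.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class TagUpdater
{
    // Replaces the document stored under mediaId with a fresh one carrying the new tags.
    // If no document has that id yet, the new one is simply added.
    public static void ReplaceTags(IndexWriter writer, string mediaId, string tags)
    {
        Document doc = new Document();

        // Store the id un-analyzed so the Term below matches it exactly.
        doc.Add(new Field("id", mediaId, Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Tags are analyzed so that individual tags can be searched.
        doc.Add(new Field("tags", tags, Field.Store.YES, Field.Index.ANALYZED));

        // Delete-then-add: removes any document matching the term, then adds doc.
        writer.UpdateDocument(new Term("id", mediaId), doc);
        writer.Commit();
    }
}
```

The new document carries the same "id" value as the term, so a later update can find and replace it again.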
Mikos
Incubating-Apache-Lucene.Net-2.0-004-11Mar07.bin.zip. So perhaps Lucene 2.0.
acidzombie24
Yikes, I just realized the date. I found an svn tag using Lucene.Net_2_9_1.
acidzombie24
Just so I am clear: doc is the new document filled with the data I want, and the term is the ID of the old document I want to update/replace? -edit- update looks like a DeleteAdd. doc doesn't need to hold the same ID or term as the older one.
acidzombie24
Yes to your first point. Unsure what you mean by the second; it is very advisable to have an ID term for your doc (much like having a primary key for a DB table).
Mikos
Perfect, I just tested it. It does create a doc even if the term doesn't exist. My ID will be the same as the media or doc ID (which is a PK in my DB).
acidzombie24