I am having trouble searching for an exact phrase using Lucene.NET 2.0.0.4.

For example, I am searching for "scope attribute sets the variable" (including the quotes) but receive no matches, even though I have confirmed that the phrase exists in the document.

Can anyone suggest where I am going wrong? Is this even supported in Lucene.NET? As usual, the API documentation is not very helpful, and the few CodeProject articles I've read don't specifically touch on this.

I am using the following code to create the index:

    Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", true);

    Analyzer analyzer = new Lucene.Net.Analysis.SimpleAnalyzer();

    IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer, true);

    //create a document, add in a single field
    Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();

    Lucene.Net.Documents.Field fldContent = 
       new Lucene.Net.Documents.Field("content", File.ReadAllText(@"Documents\100.txt"),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.TOKENIZED);

    doc.Add(fldContent);

    //write the document to the index
    indexWriter.AddDocument(doc);

I then search for a phrase using:

    //state the file location of the index

    Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", false);

    //create an index searcher that will perform the search
    IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);

    QueryParser qp = new QueryParser("content", new SimpleAnalyzer());

    // txtSearch.Text contains a quoted phrase such as "this is a phrase"
    Query q = qp.Parse(txtSearch.Text);


    //execute the query
    Lucene.Net.Search.Hits hits = searcher.Search(q);

The target document is about 7 MB plain text.

I have seen this previous question; however, I don't want a proximity search, just an exact phrase search.

+2  A: 

You have not enabled term positions. Creating the field as follows should solve your problem.

    Lucene.Net.Documents.Field fldContent =
        new Lucene.Net.Documents.Field("content", File.ReadAllText(@"Documents\100.txt"),
            Lucene.Net.Documents.Field.Store.YES,
            Lucene.Net.Documents.Field.Index.TOKENIZED,
            Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);

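
To illustrate why token positions matter for an exact phrase match, here is a minimal, self-contained sketch that does not use Lucene.NET at all. The `PositionalIndex` class and its `ContainsPhrase` method are hypothetical names made up for this illustration. The tokenizer lowercases and splits at non-letters, which is roughly what `SimpleAnalyzer` does. It only shows the general idea of matching consecutive positions and says nothing about how Lucene implements it internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical toy index (not a Lucene.NET type): maps each term to the
// token positions at which it occurs in one document.
public class PositionalIndex
{
    private readonly Dictionary<string, List<int>> positions =
        new Dictionary<string, List<int>>();

    public PositionalIndex(string text)
    {
        List<string> tokens = Tokenize(text);
        for (int i = 0; i < tokens.Count; i++)
        {
            List<int> list;
            if (!positions.TryGetValue(tokens[i], out list))
            {
                list = new List<int>();
                positions[tokens[i]] = list;
            }
            list.Add(i); // remember where this term occurred
        }
    }

    // Lowercase the text and split it at every non-letter character.
    private static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                sb.Append(char.ToLowerInvariant(c));
            }
            else if (sb.Length > 0)
            {
                tokens.Add(sb.ToString());
                sb.Clear();
            }
        }
        if (sb.Length > 0) tokens.Add(sb.ToString());
        return tokens;
    }

    // True only if the phrase terms occur at consecutive positions.
    public bool ContainsPhrase(string phrase)
    {
        List<string> terms = Tokenize(phrase);
        if (terms.Count == 0) return false;

        List<int> starts;
        if (!positions.TryGetValue(terms[0], out starts)) return false;

        foreach (int start in starts)
        {
            bool match = true;
            for (int i = 1; i < terms.Count && match; i++)
            {
                List<int> p;
                match = positions.TryGetValue(terms[i], out p) && p.Contains(start + i);
            }
            if (match) return true;
        }
        return false;
    }
}

public static class PhraseDemo
{
    public static void Main()
    {
        var index = new PositionalIndex("The scope attribute sets the variable.");
        Console.WriteLine(index.ContainsPhrase("\"scope attribute sets the variable\"")); // True
        Console.WriteLine(index.ContainsPhrase("attribute scope"));                       // False
    }
}
```

Both terms of the second query occur in the text, but not at adjacent positions in that order, so only a position-aware lookup can reject it.
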
Shashikant Kore
+7  A: 

Shashikant is correct: you need to enable term positions.

However, I would recommend not storing the text of the document in the field unless you absolutely need it returned to you in the search results. Setting the store option to `NO` might help reduce the size of your index a bit.

    Lucene.Net.Documents.Field fldContent =
        new Lucene.Net.Documents.Field("content", File.ReadAllText(@"Documents\100.txt"),
            Lucene.Net.Documents.Field.Store.NO,
            Lucene.Net.Documents.Field.Index.TOKENIZED,
            Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);

I would have added this as a comment, but my rep is still not high enough.

josefresno
A: 

I want to index the content of a group of files in a directory using Lucene.NET. Can anyone help me?

mahesh
Is that a question or an answer?
mP