lucene

Prevent "Too Many Clauses" on lucene query

In my tests I suddenly ran into a Too Many Clauses exception when trying to get the hits from a boolean query that consisted of a term query and a wildcard query. I searched around the net, and the resources I found suggest raising the limit with BooleanQuery.SetMaxClauseCount(). This sounds fishy to me. To what should I raise it? How can...

Faster way to get distinct values from Lucene Query

Currently I do it like this:
    IndexSearcher searcher = new IndexSearcher(lucenePath);
    Hits hits = searcher.Search(query);
    Document doc;
    List<string> companyNames = new List<string>();
    for (int i = 0; i < hits.Length(); i++)
    {
        doc = hits.Doc(i);
        companyNames.Add(doc.Get("companyName"));
    }
    searcher.Close();
    companyNames = companyNam...
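For illustration only: the loop in the excerpt adds one value per hit, so repeated company names end up in the list. A minimal Python sketch of deduplicating during the same pass, assuming the hits are available as an iterable of dict-like documents (the `hits` list and the `distinct_field_values` helper are hypothetical stand-ins, not Lucene.Net API):

```python
def distinct_field_values(hits, field):
    """Return the distinct values of `field`, in first-seen order."""
    seen = set()
    ordered = []
    for doc in hits:
        value = doc.get(field)
        # Skip documents without the field and values already collected.
        if value is not None and value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered


# Hypothetical hits standing in for search results.
hits = [{"companyName": "Acme"}, {"companyName": "Globex"}, {"companyName": "Acme"}]
print(distinct_field_values(hits, "companyName"))
```

This still visits every hit once; it only avoids building the full list before removing duplicates.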

Get all lucene values that have a certain fieldName

To solve this problem I created a new Lucene index where all possible distinct values of each field are indexed separately. So it's an index with a few thousand docs that have a single Term. I want to extract all the values for a certain term. For example, I would like all values that have the fieldName "companyName". Defining a Wildca...

Lucene Search Error Stack

I am seeing the following error when trying to search using Lucene (version 1.4.3). Any ideas as to why I could be seeing this and how to fix it?
Caused by: java.io.IOException: read past EOF
    at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43...

Deleting and updating documents in Lucene index

Hi, I am using the Lucene.Net DLL, version 2.0.0.4. It looks like its IndexWriter class does not have DeleteDocument and UpdateDocument methods. Am I missing something here? How do I achieve delete and update functionality in this version of the DLL? The version 2.1 Lucene DLL seems to support deleting and updating documents: public virtual void Dele...

How to correctly boost results in Solr Dismax query

I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. I have implemented my search using a dismax query so it searches predetermined fields. However, my results are coming back sorted by score which appears to be calculated by keyword relevancy only. I would ...

using OR and NOT in solr query

I'm working on a solr query similar to the following: ((myField:superneat AND myOtherField:somethingElse) OR NOT myField:superneat) When running this, no results are returned. Using criteria on either side of the OR NOT returns results that I'd expect - they are just not working well together. In the case that myField matches supern...
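One commonly cited explanation, offered here as an assumption rather than a confirmed diagnosis of this query: the classic Lucene query parser treats `NOT` as a "prohibited" flag on a clause of the enclosing boolean query, so `X OR NOT Y` behaves like "X, minus anything matching Y" instead of "X, or everything that is not Y". The Python sketch below mimics that with sets; the document ids and the `boolean_query` helper are made up for illustration.

```python
def boolean_query(should, must_not):
    """Union the optional clauses, then remove anything a prohibited clause matches."""
    matched = set().union(*should) if should else set()
    for excluded in must_not:
        matched -= excluded
    return matched


all_docs = {1, 2, 3, 4}
superneat = {1, 2}        # docs where myField:superneat
something_else = {1}      # docs where myOtherField:somethingElse
left = superneat & something_else

# How "(A AND B) OR NOT A" is assumed to behave: NOT prohibits within the same query.
flat = boolean_query(should=[left], must_not=[superneat])

# Anchoring the negation to a match-all clause, as in (*:* -myField:superneat).
not_superneat = boolean_query(should=[all_docs], must_not=[superneat])
anchored = boolean_query(should=[left, not_superneat], must_not=[])

print(flat, anchored)
```

Under these assumptions the flat form matches nothing, while the anchored form matches documents 1, 3 and 4. The corresponding Solr query would be `(myField:superneat AND myOtherField:somethingElse) OR (*:* -myField:superneat)`.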

What are some good resources on using Lucene.Net?

Does anyone know where I can find more information on Lucene.Net? I am looking for a tutorial or videos on how to use Lucene.Net that Stack Overflow users can personally recommend. ...

Get field names from a lucene query string

If I have a Lucene query string "field1:value1 myField:aValue", is there a way to let Lucene parse this so I can get term queries? I ultimately want to be able to get the field names and their values back to my viewdata so I can fill them into my textboxes on postback. ...

Lucene indexing: Store and indexing modes explained

I think I still don't understand the Lucene indexing options. The options are Store.Yes and Store.No for storage, and Index.Tokenized, Index.Un_Tokenized, Index.No and Index.No_Norms for indexing. I don't really understand the store option. Why would you ever want to NOT store your field? Tokenizing is splitting up the content and removing the noise w...

Using MultiFieldQueryParser

Hi, I am using MultiFieldQueryParser for parsing strings like a.a., b.b., etc. But after parsing, it removes the dots from the string. What am I missing here? Thanks. ...

Using Highlighter for highlighting Phrase query

I am using this version of the Lucene highlighter.net API. I want a phrase to be highlighted only when ALL of its words are present in the search results, but I am not able to do so. For example, if my input search string is "Leading telecom company", then the API only highlights "telecom" in the results if the result does not contain the wo...

DuplicateFilter for lucene.net?

Today I found this document about the DuplicateFilter. That would be exactly what I need right now, but I can't seem to find it in the .NET port. Is it there at all? ...

Indexing token bigrams in Lucene

Hi, I need to index bigrams of words (tokens) in Lucene. I can produce the n-grams myself and then index them, but I am wondering if there is something in Lucene which will do this for me. I found out that Lucene only indexes character n-grams. Any ideas? ...
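To make the distinction concrete, here is a minimal Python sketch of word-level bigrams (sometimes called shingles), as opposed to character n-grams. It assumes whitespace tokenization, and `word_bigrams` is a hypothetical helper, not a Lucene call. Lucene's analysis package does include a `ShingleFilter` for word n-grams, though whether it is available in the asker's version is not confirmed here.

```python
def word_bigrams(tokens):
    """Pair each token with its successor to form word-level bigrams."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


tokens = "please divide this sentence".split()
print(word_bigrams(tokens))
```

A single token or an empty input yields no bigrams, since there is no successor to pair with.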

Problem with Lucene: search not indexing numeric values?

I am using Lucene in PHP (using the Zend Framework implementation). My problem is that I cannot search on a field which contains a number. Here is the data in the index:
    ts         | contents
    -----------+-----------------
    1236917100 | dog cat gerbil
    1236630752 | cow pig goat
    1235680249 | lion tiger bear
n...

High level explanation of Similarity Class for Lucene?

Do you know where I can find a high-level explanation of the algorithm behind Lucene's Similarity class? I would like to understand it without having to decipher all the math and terms involved in searching and indexing. ...

MultiFieldQueryParser is removing dots from the acronym

I am posting this question again as my query was not answered. I am working on a book search API using Lucene. A user can search for a book whose title or description field contains C.F.A... I am using StandardAnalyzer along with a list of stop words. I am using MultiFieldQueryParser to parse the above string. But after parsing, it removes the dot...

Lucene query to exclude docs, but not docs with partial matches

Say I have a set of docs that go like this:
    mercedes
    mercedes trucks
Is there a way to create a query that will filter out the mercedes, but not the mercedes trucks? ...

Lucene search with complex query

Here's what I want to do, using pseudo-code: lucene.Find((someField == "bar" || someField == "baz") && anotherField == "foo"); Or in English, "find all documents where someField is 'bar' or 'baz', and where anotherField is 'foo'". How can I do a query like this with Lucene? ...
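In Lucene's classic query-parser syntax, the pseudo-code above is usually written as `(someField:bar OR someField:baz) AND anotherField:foo`. The Python sketch below only builds that query string and checks the intended semantics against in-memory dictionaries; `build_query`, `matches` and the sample documents are hypothetical and do not call Lucene.

```python
def build_query(field, values, required_field, required_value):
    """Build '(field:v1 OR field:v2) AND required_field:required_value'."""
    alternatives = " OR ".join(f"{field}:{value}" for value in values)
    return f"({alternatives}) AND {required_field}:{required_value}"


def matches(doc):
    """Reference semantics: someField is 'bar' or 'baz', and anotherField is 'foo'."""
    return doc.get("someField") in ("bar", "baz") and doc.get("anotherField") == "foo"


query = build_query("someField", ["bar", "baz"], "anotherField", "foo")
print(query)

# Made-up documents to show which ones the condition accepts.
docs = [
    {"someField": "bar", "anotherField": "foo"},
    {"someField": "baz", "anotherField": "nope"},
    {"someField": "qux", "anotherField": "foo"},
]
print([matches(doc) for doc in docs])
```

The same structure can also be built programmatically as a boolean query with one required clause per group, but that API varies by Lucene version and is not shown here.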

Get term frequencies in Lucene

Is there a fast and easy way of getting term frequencies from a Lucene index, without doing it through the TermVectorFrequencies class, since that takes an awful lot of time for large collections? What I mean is, is there something like TermEnum which has not just the document frequency but term frequency as well? UPDATE: Using TermDoc...
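The two statistics contrasted in the question can be illustrated with a small Python sketch over tokenized documents. The sample `docs` are made up, and this only shows the definitions (document frequency versus total term frequency); it says nothing about how a Lucene index stores or exposes them.

```python
from collections import Counter

# Made-up tokenized documents.
docs = [
    ["lucene", "index", "lucene"],
    ["index", "search"],
    ["lucene"],
]

doc_freq = Counter()         # number of documents containing the term
total_term_freq = Counter()  # number of occurrences across all documents

for tokens in docs:
    total_term_freq.update(tokens)
    doc_freq.update(set(tokens))  # count each term once per document

print(doc_freq["lucene"], total_term_freq["lucene"])
```

Here "lucene" appears in two documents but occurs three times in total, which is the gap between what a document-frequency lookup reports and what the question is asking for.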