lucene

Lucene Score results

In Lucene, suppose you have multiple indexes that each cover only one partition. Why does the same search on different indexes return results with different scores? The results from the different servers match exactly. For example, if I searched for: Name - John Smith DOB - 11/11/1934 Partition 0 would return a score of 0.345 Partition 1 would ...

Lucene exact ordering

I've had a long-standing problem understanding how to implement a decent Lucene sort or ranking. Say I have a list of cities and their populations. If someone searches "new" or "london" I want the list of prefix matches ordered by population, and I have that working with a prefix search and a reversed sort by field, where the...

With Lucene: Why do I get a Too Many Clauses error if I do a prefix search?

I've had an app doing prefix searches for a while. Recently the index size was increased and it turned out that some prefixes were too darned numerous for Lucene to handle. It kept throwing a Too Many Clauses error, which was very frustrating as I kept looking at my JARs and confirming that none of the included code actually used a b...

WildcardQuery error in Solr

I use Solr to search for documents, and when I search using the query "id:*", I get a query parser exception saying that it cannot parse a query with * or ? as the first character. HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character i...

Using Lucene to search for email addresses

I want to use Lucene (in particular, Lucene.NET) to search for email address domains. E.g. I want to search for "@gmail.com" to find all emails sent to a Gmail address. Running a Lucene query for "*@gmail.com" results in an error: asterisks cannot be at the start of queries. Running a query for "@gmail.com" doesn't return any matches, ...

How to get facet ranges in Solr results?

Assume that I have a field called price for the documents in Solr and I have that field faceted. I want to get the facets as ranges of values (e.g. 0-100, 100-500, 500-1000, etc.). How do I do this? I can specify the ranges beforehand, but I also want to know whether it is possible to calculate the ranges (say for 5 values) automatically base...

Strategies for keeping a Lucene Index up to date with domain model changes

I was looking to get people's thoughts on keeping a Lucene index up to date as changes are made to the domain model objects of an application. The application in question is a Java/J2EE-based web app that uses Hibernate. The way I currently have things working is that the Hibernate mapped model objects all implement a common "Indexable"...

Best full-text search alternative to MS SQL, C++ solution

What is the best full-text search alternative to MS SQL (one that works with MS SQL)? I'm looking for something similar to Lucene and Lucene.NET but without the .NET and Java requirements. I would also like to find a solution that is usable in commercial applications. ...

Troubleshoot Java Lucene ignoring Field

We're currently using Lucene 2.1.0 for our site search and we've hit a difficult problem: one of our index fields is being ignored during a targeted search. Here is the code for adding the field to a document in our index: // Add market_local to index contactDocument.add( new Field( "market_local" , StringUtils.objec...

In Lucene how do terms get used in calculating scores, can I override it with a CustomScoreQuery?

Has someone successfully overridden the scoring of documents in a query so that the "relevancy" of a term to the field contents can be determined through one's own function? If so, was it by implementing a CustomScoreQuery and overriding the customScore(int, float, float)? I cannot seem to find a way to build either a custom sort or a cu...

Analyzer for the Russian language in Lucene and Lucene.Net

Lucene has quite poor support for the Russian language. RussianAnalyzer (part of lucene-contrib) is of very low quality. The RussianStemmer module for Snowball is even worse. It does not recognize Russian text in Unicode strings, apparently assuming that some bizarre mix of Unicode and KOI8-R must be used instead. Do you know any better solut...

How do I estimate the size of a Lucene index?

Is there a known math formula that I can use to estimate the size of a new Lucene index? I know how many fields I want to have indexed, and the size of each field. And, I know how many items will be indexed. So, once these are processed by Lucene, how does it translate into bytes? ...

How to sort by Lucene.Net field and ignore common stop words such as 'a' and 'the'?

I've found how to sort query results by a given field in a Lucene.Net index instead of by score; all it takes is a field that is indexed but not tokenized. However, what I haven't been able to figure out is how to sort that field while ignoring stop words such as "a" and "the", so that the following book titles, for example, would sort i...

Is there a fast, accurate Highlighter for Lucene?

I've been using the (Java) Highlighter for Lucene (in the Sandbox package) for some time. However, this isn't really very accurate when it comes to matching the correct terms in search results - it works well for simple queries, for example searching for two separate words will highlight both code fragments in the results. However, it d...

How to best search against a DB with Lucene?

I am looking into mechanisms for better search capabilities against our database. It is currently a huge bottleneck (causing long-lasting queries that are hurting our database performance). My boss wanted me to look into Solr, but on closer inspection, it seems we actually want some kind of DB integration mechanism with Lucene itself. ...

What is the best search approach using Lucene?

I'm using Lucene in my project. Here is my question: should I use Lucene to replace the whole search module, which is currently implemented in SQL using a large number of LIKE statements and exact lookups by id and so on, or should I use Lucene only for fuzzy search (I mean full-text search)? ...

Should an index be optimised after incremental indexes in Lucene?

We run full re-indexes every 7 days (i.e. creating the index from scratch) on our Lucene index and incremental indexes every 2 hours or so. Our index has around 700,000 documents and a full index takes around 17 hours (which isn't a problem). When we do incremental indexes, we only index content that has changed in the past two hours, s...

How to do query auto-completion/suggestions in Lucene?

I'm looking for a way to do query auto-completion/suggestions in Lucene. I've Googled around a bit and played around a bit, but all of the examples I've seen seem to be setting up filters in Solr. We don't use Solr and aren't planning to move to using Solr in the near future, and Solr is obviously just wrapping around Lucene anyway, so I...

Which search technology to use with ASP.NET?

What's your preferred method of providing a search facility on a website? Currently I prefer to use Lucene.NET over Indexing Service / SQL Server full-text search (as there's nothing to set up server-side), but what other ways are being used out there? ...

Using Lucene to count results in categories

I am trying to use Lucene Java 2.3.2 to implement search on a catalog of products. Apart from the regular fields for a product, there is a field called 'Category'. A product can fall in multiple categories. Currently, I use FilteredQuery to search for the same search term with every Category to get the number of results per category. This...