lucene

Lucene search - score higher if word or similar are in a Field

Hi, I need to know when a word or words are inside a field in my index, and give those documents a greater score. My problem is that if I search for "Sherton Hotel" I get these as the top results: Petit Hotel Crzy cow Simmonss. And I would like these to be the top results: Maui Sheraton Hotel near the moon A fantastic ho...

Solr Highlighting Problem

Hi all, I have a problem: when I query Solr it matches results, but when I enable highlighting on the results of this query, the highlighting does not work. My query is +Contents:"item 503". Contents is of type text, and one important thing: in the text, item 503 appears as "item 503(c)". Can the open parenthesis at the end create a problem? pl...

Zend_Search_Lucene, how to share an index storage folder over network

Hi, I am running a web application on two different servers with load balancing, and using Zend_Search_Lucene for indexing documents. Now I am facing an indexing issue: when a user comes to the site through server #1 and stores information, Zend_Search_Lucene stores the index only on server #1. So once another user comes to the site through serv...

lucene query size- does this scale? query for '1 OR 2 OR 3 .. OR N'

Suppose I have a Lucene query 'id1 OR id2 OR id3 ... OR idN'. How well does that scale as N increases? The situation I'm looking at would be similar to someone doing a text search on products in their shopping cart, but they may have hundreds or thousands of items in their shopping cart. The user wants to do a text search across all product...
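One concern with very large OR lists is the clause limit: Lucene's BooleanQuery refuses queries with more clauses than its configured maximum (1024 by default in the 3.x line, as far as I know). A minimal sketch, in plain Python rather than the Lucene API, of splitting a long ID list into several query strings that each stay under such a limit. The function name `build_or_queries`, the field name `id`, and the constant are hypothetical, and the IDs are assumed to need no query-syntax escaping.

```python
from typing import Iterable, List

# Assumption: the clause limit in effect is 1024 (Lucene's default);
# check the value configured in your own setup.
ASSUMED_MAX_CLAUSES = 1024


def build_or_queries(ids: Iterable[str], field: str = "id",
                     max_clauses: int = ASSUMED_MAX_CLAUSES) -> List[str]:
    """Split ids into query strings of at most max_clauses OR-ed terms each."""
    if max_clauses < 1:
        raise ValueError("max_clauses must be positive")
    id_list = list(ids)
    queries = []
    # Walk the list in fixed-size chunks; each chunk becomes one query string.
    for start in range(0, len(id_list), max_clauses):
        chunk = id_list[start:start + max_clauses]
        queries.append(" OR ".join(f"{field}:{doc_id}" for doc_id in chunk))
    return queries
```

Each returned string would be run as its own query and the hits merged by the caller. Whether that beats raising the clause limit or using a filter depends on the index and is not something this sketch measures.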

What is the biggest size / number of documents of index - java lucene 3.0.2 on 32 bit OS

Hi, I am playing around with Lucene and 40 GB of data (~500M tuples, 2 fields behaving like key-value). I have created (surprise) a 35 GB index which does not work. Therefore I want to create a set of smaller indices, but for that I need information about the maximum size. ...

Can we tell Solr/Lucene max chars to analyze for a search?

Hi, I have a problem: in my Lucene index files one document can have huge text. When I search for one of these huge text documents, Lucene/Solr does not return any results even though the search term exists in the document text. I think the reason might be the large number of characters in the document text. If so, how could we tell sol...

Search term suggestions

This question has been asked in various ways before, but I'm wondering if people who have experience with automatic search term suggestion could offer advice on the most useful and efficient approaches. Here's the scenario: I'm just starting on a website for a book that is a dictionary of terms (roughly 1,000 entries, with 300 word exp...

Zend Search Lucene HTTP 500 Internal Server Error while bulk indexing on small tables

I am just getting started with Zend Search Lucene and am testing on a GoDaddy shared Linux account. Everything is working: I can create and search Lucene Documents. The problem is that when I try to index my whole table for the first time, I get an HTTP 500 Internal Server Error after about 30 seconds. If I rewrite my query so that I only s...

Problems with hyphen in Jackrabbit XPath query

Firstly, let me just say that I'm very new to JSR-170 and Jackrabbit/Lucene in general. I have the following XPath query: //*[@sling:resourceType="users/user-profile" and jcr:contains(*/*/*,'sophie\-a')] order by @jcr:score descending I have a user named Sophie-Allen and a user named Sophie-Anne. Searching using the above query retu...

How to count the number of terms for each document in lucene index?

I want to know the number of terms for each document in a Lucene index. I've searched the API and the internet with no result. Can you help me? ...

lucene vs solr scoring

Can someone explain (or quote a reference) comparing the scoring mechanisms used by Solr and Lucene in simple words? Is there any difference between them? I am not that good at Solr/Lucene, but my findings suggest they are different. P.S.: I just tried a simple query like "+Contents:risk" and didn't use any filters or other stuff. ...

Frequencies of lucene unigrams and bigrams

Hi! I am storing n-grams up to level 3 in a Lucene index. When I read the index and calculate the scoring of terms and n-grams, I obtain results like this:

TERM              FREQUENCY   TFIDF
minority          25          16.512926
minority report   24          16.179296
report            27          13.559037
cruise ...

How do I tell lucene to search a complete document?

I have Lucene running and I query it via Solr. The indexes are built, and I have a document that contains lots of words. Now how do I tell Lucene that it has to search the index for the document I provide? What would be the query syntax? ...

How to search a complete document in Lucene via Solr?

I have a question regarding searching a complete document. 1 - I have indexed a lot of documents in Lucene. 2 - Each document has a single word per line; suppose 200 words, which becomes 200 lines. 3 - I know how to search Lucene via Solr, but suppose that I indexed the document mydoc.txt in Lucene containing 200 words along with o...

not query in lucene

Hi, I need to do NOT queries on my Lucene index. Lucene currently allows NOT only when we have two or more terms in the query. So I can do something like: country:canada not sweden but I can't run a query like: country:not sweden. Could you please let me know if there is an efficient solution for this problem? Thanks ...
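The restriction described above follows from how negation works on an inverted index: a NOT clause only removes documents from a set that something else selected, so a purely negative query has nothing to subtract from. A minimal sketch in plain Python (not the Lucene API; `build_index` and `not_query` are hypothetical names) showing negation as a set difference against all document IDs. In Lucene query syntax the same idea is commonly written as `*:* -country:sweden`, which is worth verifying against your parser version.

```python
from typing import Dict, Iterable, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each field value (here: a country) to the set of doc ids holding it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, country in docs.items():
        index.setdefault(country, set()).add(doc_id)
    return index


def not_query(index: Dict[str, Set[int]], all_ids: Iterable[int],
              excluded: str) -> Set[int]:
    """Return every doc id except those whose value equals `excluded`."""
    # The "match all" set plays the role of the positive clause.
    return set(all_ids) - index.get(excluded, set())
```

For example, with `docs = {1: "canada", 2: "sweden", 3: "norway"}`, calling `not_query(build_index(docs), docs.keys(), "sweden")` yields the IDs of the two non-Swedish documents.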

Can I search Solr documents by member of a multi-value field?

I have a set of Solr documents containing (among other fields) multi-value fields with percentage data or -1 if the value is null, e.g. <doc> ... <arr name="alpha"> <float>0.23</float> <float>0.23</float> <float>0.43</float> </arr> <arr name="beta"> <float>0.52</float> <float>-1.0<...

Complex search query in lucene (querying fields which are indexed as numeric, analyzed or not-analyzed using a simple analyzer)

Hi, I am building a search application using Lucene. Some of my queries are complex. For example, my documents contain the fields location and population, where location is a not-analyzed field and population is a numeric field. Now I need to return all the documents that have location as "san-francisco" and population between 10000 and 20...

Searching hyphenated words with Lucene

Hi, I want Lucene to search for hyphenated words, e.g. energy-efficient or "energy-efficient", as one single word. So if the input is energy-efficient, the tokenizer generates terms like energy or efficient or energy efficient or energy-efficient. Therefore Lucene returns pages containing both "energy efficient" and "energy-effici...

Please recommend best practice for integrating apache lucene with a MVC/spring web app

I've been working on a web application using Spring/MVC which is coming along nicely. We'd like to now integrate apache lucene to index a lot of the domain objects for a user search facility. I'm undecided if I should create an indexing service that's registered within spring or do it the traditional servlet way and implement a ServletC...

how to configure solr / lucene to perform levenshtein edit distance searching?

I have a long list of words that I put into a very simple Solr/Lucene database. My goal is to find 'similar' words from the list for single-term queries, where 'similarity' is specifically understood as (Damerau-)Levenshtein edit distance. I understand Solr provides such a distance for spelling suggestions. In my Solr schema.xml, I ha...
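For reference, the distance the question refers to can be sketched outside Solr entirely. The following is a minimal in-memory version in plain Python, assuming the "optimal string alignment" variant of Damerau-Levenshtein (insert, delete, substitute, and transposition of adjacent characters). The names `osa_distance` and `similar_words` and the default threshold are hypothetical; this says nothing about how Solr computes or exposes its own distance.

```python
from typing import Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similar_words(query: str, words: Iterable[str],
                  max_distance: int = 2) -> List[str]:
    """Return words within max_distance of query, closest first."""
    scored = sorted((osa_distance(query, w), w) for w in words)
    return [w for dist, w in scored if dist <= max_distance]
```

This compares the query against every word, so it is a linear scan. It is useful for checking expected results on a small list, not a substitute for an index-backed fuzzy search.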