lucene

Why does my unthrottled tomcat/solr performance look like it's being throttled?

I've been benchmarking our Solr response times in relation to a variable number of concurrent queries. With maxThreads=150, I've tried running between 20 and 100 queries concurrently against our Solr instance and have noted that for all n-way (>20) queries, performance flatlines at 20-30 requests/second. ...

Querying Solr without specifying field names

I'm new to using Solr, and I must be missing something. I didn't touch much in the example schema yet, and I imported some sample data. I also set up LocalSolr, and that seems to be working well. My issue is just with querying Solr in general. I have a document where the "name" field is set to "tom." I keep looking at the config fil...

use compass-lucene as caching technique

Are there any example scenarios, other than search, for which I could use "compass"? Let's say we have a page that lists the top 10 most viewed articles. How can I use compass to show this kind of result? Is there any demo/sample project to refer to? Jira would definitely be a good example, but its source code is not available. I want to know how to ma...

locking a lucene folder

I am writing a wrapper around Zend's lucene implementation and wanted to add a function rebuildIndex() which reads all relevant fields from the database and re-creates the index file in a temporary folder. When the operation is finished, I want to replace the original folder with the new one. How can I lock the original lucene folder whi...

Lucene Index | Problem with words having apostrophe!

When I search for words like Ballantine's, the index gives me documents that contain only "'s" as some of the search results. I would like to see only those documents that contain the full word Ballantine's as it appears in the document. How could I change my search query? Changing the index is very difficult for me now. As I've already indexed...

Hosted full text search solutions?

Does anyone know of companies offering SaaS full text search? I'm looking for something that uses Lucene, solr, or sphinx on the backend, and provides a REST API for submitting documents to index, and running searches. I could build my own EC2 AMI, but I'd have to configure EBS and other stuff, monitor it, etc. Curious if someone has ...

Building search to the website

I have a website which has about 200 to 300 static public pages. I am required to bring about some kind of search functionality on the website which will search all of its public pages. I don't want to use external tools like Google site search, etc. Is there a tool or open source code that will index the content and then display the sea...

How to write a Lucene query that returns all words containing the letter "t" ?

I tried this Lucene code example, which worked: http://snippets.dzone.com/posts/show/8965 However changing: Query query = parser.parse("st."); to Query query = parser.parse("t"); returned zero hits. How to write a Lucene query that returns all words containing the letter "t" ? (max nbr of hits to return = 20) ...

Hibernate Search with index in a different database

I have a database which is readonly (I only have the access to view), but I have to index this database for search. The DAO layer to this table is now using a generic DAO approach with Hibernate+JPA. Is it possible to add hibernate search to this view and store the index in a separate database? I am aware that I may lose the capability ...

Very basic dude with Solr/Lucene

Hello, I am working on a project that has a large amount of data in Lucene. We need to show a faceted search, and the time required for it is unacceptable when trying to simulate it using regular Lucene access. I have been reading about Solr, but tutorials are not very clear about this basic point: Is the data stored in the same way usin...

how to install lucene 3.0.0 in ubuntu 8.10

Hi, I have downloaded Lucene 3.0.0, and when I ran the command java -jar lucene-core-3.0.0.jar in the directory where Lucene is present, I got this message: Failed to load Main-Class manifest attribute from lucene-core-3.0.0.jar. How do I proceed? Please help me. Thanks in advance, ...

string matching algorithms used by lucene

I want to know the string matching algorithms used by Apache Lucene. I have been going through the index file format used by Lucene given here. It seems that Lucene stores all words occurring in the text as is, with their frequency of occurrence in each document. But as far as I know, for efficient string matching it would need to pre...

hit highlighting in lucene

I am searching for strings indexed in Lucene as documents. Now I give it a long string to match. Example search string: "iamrohitbanga is a stackoverflow user". Documents: document 1: field value: rohit; document 2: field value: banga. Now I use fuzzy matching to find the search strings in the documents. The 2 documents match. I want...

Spatial lucene simple query doesn't work?

Hi, does anyone have any experience using Lucene's spatial search component (Lucene 3.0)? I tried a very simple example but could not get the search to return anything. See below for all the code: import java.io.IOException; import org.apache.lucene.analysis.WhitespaceAnalyzer; import org.apache.lucene.document.Document; import o...

Lucene query permutation

I have a question regarding performing a lucene query involving permutation. Say I have two fields: "name" and "keyword" and the user searches for "joes pizza restaurant". I want some part of that search to match the full contents of the "name" field and some part to match the full content of the keyword field. It should match all the...

ToTitleCase in solr to stop SCREAMING CAPS in Solr

Hi, I'm using Solr's faceting and I've run into a problem that I was hoping I could get around using filters. Basically, sometimes a town name will come through to Solr as "CAMBRIDGE" and sometimes it will come through as "Cambridge". I wanted to use a filter in Solr to stop the SCREAMING CAPS version of the town name. It seems th...

How do I detect if there is already a similar document stored in Lucene index.

Hi. I need to exclude duplicates in my database. The problem is that the duplicates are not exact matches but rather similar documents. For this purpose I decided to use FuzzyQuery as follows: var fuzzyQuery = new global::Lucene.Net.Search.FuzzyQuery( new Term("text", queryText), 0.8f, ...

Grails Searchable Plugin - termFreqs for a subset of domain instances?

I'm trying to utilize the termFreqs method provided by the Searchable plugin to generate a keyword cloud for the most popular terms found in an indexed domain class. The problem is, I only want to get the term frequencies for a subset of the records in the database. I have the following classes: class Text { String title String cont...

What is the real difference between INDEX.TOKENIZER vs INDEX.ANALYZER in Lucene 2.9?

With lucene 2.9.1, INDEX.TOKENIZED is deprecated. The documentation says it is just renamed to ANALYZER, but I don't think the meaning has stayed the same. I have an existing app based on 2.3 and I'm upgrading to 2.9, but the expected behavior seems to have changed. Anyone know any more details about INDEX.TOKENIZER vs INDEX.ANALYZER?...

Lucene search where a field MUST start with certain letters

I'm trying to search for results within a range, e.g. A TO C. However, the results that come back contain letters within the range, but I only want results that START with letters within the range. ...