lucene

Lucene - open a closed IndexWriter

Hi everyone, here's my issue: I call add() to add documents to my index and then close() it. That works great! Now I have a new requirement: every time I save something in my DB, I need to update my index. I can't recreate the IndexWriter because that takes more than 4 minutes, so I just need to update() or add() a document to t...

Optimal lucene query options for doing auto completion

I have Lucene acting as my data provider for querying a list of countries to do auto-completion from a text box, which works fine. My question is what type of query string I should send to get the results a user would most expect. Currently I have something along the lines of var query = string.Format("*{0}*~0.5", txt...

Lucene Standard Analyzer vs Snowball

Just getting started with Lucene.Net. I indexed 100,000 rows using the standard analyzer, ran some test queries, and noticed that plural queries don't return results if the original term was singular. I understand the snowball analyzer adds stemming support, which sounds nice. However, I'm wondering if there are any drawbacks to going with snowball...

How to optimize Lucene.Net indexing

I need to index around 10 GB of data. Each of my "documents" is pretty small: think basic info about a product, about 20 fields of data, most only a few words. Only 1 column is indexed; the rest are stored. I'm grabbing the data from text files, so that part is pretty fast. Current indexing speed is only about 40 MB per hour. I've hea...

How to maintain lucene indexes in azure cloud-app

Hi, I just started playing with the Azure Library for Lucene.NET (http://code.msdn.microsoft.com/AzureDirectory). Until now, I had been using my own custom code to write Lucene indexes to Azure blob storage: I copied the blob to the local storage of the Azure web/worker role and read/wrote docs to the index. I was using my custom loc...

Indexing Lucene with Parallel Extensions (TPL)

I'd like to speed up the indexing of 10 GB of data into a Lucene index. Would TPL be a good way to do this? Would I need to divide the data up into chunks and then have each thread start indexing chunks? To keep the UI responsive, would BackgroundWorker be the best approach, or Task, or something else? Does SOLR already do something l...

How can I group Lucene's results?

My application indexes discussion threads. Each entry in the discussion is indexed as a separate Lucene document with a common_id field which can be used to group search hits into one discussion. Currently when the search is performed, if a thread has 3 entries, then 3 separate hits are returned. Even though this is correct, from the us...

How to define a boost factor to each term in each document during indexing?

I want to insert another score factor into Lucene's similarity equation. The problem is that I can't just override the Similarity class, as it is unaware of the document and terms it is computing scores for. For example, in a document with the text below: The cat is in the top of the tree, and he is going to stay there. I have an algorithm of ...

How do I get solr term frequency?

Hi all, I have a question: how can somebody get the term frequency using Solr/SolrNet, the way we get it in Lucene with the following method: DocFreq(new Term("Field", "value")); ...

Choosing a solr/lucene commit strategy

I have 120k db records to commit into a Solr index. My question is: should I commit after submitting every 10k records, or only commit once after submitting all the 120k records? Is there any difference between these two options? ...

Is it mandatory to optimize the Lucene index after a write?

Hi, currently I am calling the optimize method of the IndexWriter after the write completes. Since my data set is huge, optimizing the index takes a long time (and needs more space, 2 * the actual size). I am very concerned about this because a lot of documents are added to the index frequently. So is it OK to turn off opt...

How to integrate RAMDirectory into FSDirectory in Lucene

I have another question, this one regarding Lucene. I am trying to write Lucene code that indexes documents, stores the index in memory first using RAMDirectory, and then flushes that in-memory index to disk using FSDirectory. I have made some modifications to this code, but to no avail. Maybe some of you can help me out a bit. so w...

searching in solr for specific values with dismax

Hi, I'm using the dismax handler to perform solr search over records (boosting some fields). In my index, I have a RetailerId for each document, as well as other fields. My query needs to search for documents that have this RetailerId as well as keywords: localhost:8983/solr/select?qt=dismax&q=RetailerId:(27 OR 92) AND socks What is...

Lucene: search within search using FuzzyQuery

I need to make a FuzzyQuery using an index that contains around 8 million lines. That kind of query is pretty slow, needing about 20 seconds for every match. The fact is that I can narrow down the results using another field to about 5000 hits before doing the fuzzy search. For this to work, I should be able to make a search by the "narr...

How do I set up Lucene so that I can search ignoring whitespace characters?

For example, a list of part numbers includes: JRB-1000, JRB 1000, JRB1000, JRB100-0, -JRB1000. If a user searches on 'JRB1000' or 'JRB 1000', I would like to return a match for all the part numbers above. ...

List of "tokens" on Lucene 3

Hi there, I'm new to Lucene. I started learning the version 3 branch and there's one thing I don't understand (obviously because I'm not experienced in the subject). In Lucene 2.9, if I wanted a list of tokens I would create an ArrayList of the Token class, ArrayList<Token> for example. That's pretty intuitive for me and the concept of token is...

Search by field in Lucene

I'm a total newbie, so maybe this question is pretty naive. I want to search my index based on the index. So I tried creating a document with just one index, Name, and then want to search on that particular field. I am doing this while trying to find out whether I can update the fields of a document without actually deleti...

"boosting" different instances of the same field in a lucene document

I want to use a single field to index the document's title and body, in an effort to improve performance. The idea was to do something like this: Field title = new Field("text", "alpha bravo charlie", Field.Store.NO, Field.Index.ANALYZED); title.setBoost(3) Field body = new Field("text", "delta echo foxtrot", Field.Store.NO, Field.Ind...

Example using WikipediaTokenizer in Lucene

Hi, I want to use WikipediaTokenizer in a Lucene project - http://lucene.apache.org/java/3_0_2/api/contrib-wikipedia/org/apache/lucene/wikipedia/analysis/WikipediaTokenizer.html But I have never used Lucene. I just want to convert a Wikipedia string into a list of tokens. But I see that there are only four methods available in this class, end...

Question related to phrase search in lucene/solr?

Hi all, I have a question: is it possible to perform a phrase search with wildcards in Solr/Lucene? I have two queries that both return exactly the same results: one is +Contents:"change market" and the other is +Contents:"change* market". I think the second should match "changes market" as well, but it does not. Any help would be...