lucene

In Lucene, using a Standard Analyzer, I want to make fields with spaces and special characters searchable.

In Lucene, using a Standard Analyzer, I want to make fields with spaces and special characters (underscore, !, @, #, ...) searchable. I set IndexField to NOT_ANALYZED_NO_NORMS and Field.Store.YES. When I look at my index in Luke, the fields are as I expected, with a value such as 'SKU Number', yet when I search for 'SKU' or 'SKU*' nothing come...

How do I search for 'and' with Lucene?

I am looking at the query syntax, and I could not figure out how to search for 'and'. I tried "a sentence with and and words after it"; I also tried +and and \and. It was always ignored. How can I search for 'and'? I am using Lucene.NET ...

Is MongoDB a valid alternative to a relational DB + Lucene?

On a new project I need to make heavy use of Lucene for a search implementation. This search component will be a very important (and big) piece of the project. Is it valid or practical to replace a relational database + Lucene with MongoDB? Edit: OK, I will clarify: I'm not asking about risk; I can pay that price in this project. My point is: Is MongoDB ...

Highlighting in Solr 1.4 - requireFieldMatch

I have an object with Title: foo; Summary: foo bar; Body: this is a published story about a foo and a bar. All three are set up as fields with stored=true. When the user searches across my system for the word "foo", I would like to highlight foo in all three places. When the user searches for the word foo in the title ("title:foo"), I o...

Lucene.NET (strings fuzzy matching)

Good day. The question is: could anyone give me an example of how to do fuzzy matching of two strings using Lucene.NET (or the Java version of Lucene, or any other language that has a port of Lucene)? ...
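Fuzzy matching in Lucene is based on edit distance between terms. As a rough illustration of that idea only, here is a minimal sketch in plain Python that does not use Lucene or Lucene.NET at all; the function names `levenshtein` and `similarity` are hypothetical, and the normalization by the longer string's length is an assumption, not Lucene's exact scoring.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string in the outer loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Two strings could then be treated as a fuzzy match when `similarity(a, b)` is above a threshold of your choosing.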

Tokenizing Twitter Posts in Lucene

Hello, My question in a nutshell: Does anyone know of a TwitterAnalyzer or TwitterTokenizer for Lucene? More detailed version: I want to index a number of tweets in Lucene and keep the terms like @user or #hashtag intact. StandardTokenizer does not work because it discards the punctuation (but it does other useful stuff like keeping d...

Lucene numDocs and docFreq on custom similarity class

Hi all, I'm building an application with Lucene (I'm a noob with it) and I'm facing some problems. My application uses the Lucene 2.4.0 library with a custom similarity implementation (the jar is imported). In my app I'm calculating docFreq and numDocs manually (I'm adding the values of all indexes and then I calculate a global value in order to u...

Lucene.NET - What is the Version parameter in MultiFieldQueryParser constructor?

We're running into a serious bug with the Lucene.NET 2.3 codebase. We're upgrading to Lucene 2.9 in hopes the bug is fixed. Upgrading to the latest version, we see that the MultiFieldQueryParser constructor is [Obsolete]: [Obsolete("Use the ctor with Version param instead.")] public MultiFieldQueryParser(string[] fields, Analyzer analyz...

How to index PDF, PPT, and Excel files in Lucene (Java, Python, or PHP; any of these is fine)?

Also, I want to know how to add metadata while indexing so that I can boost some parameters ...

Couple o' quick questions on Apache Lucene

-- I don't want to start any religious wars, but a quick Google search indicates that Apache Lucene is the preferred open source tool for indexing and searching. Are there others? -- What file format does Lucene use to store its index file(s)? Thanks in advance. Doug ...

Building dictionary of words from large text

I have a text file containing posts in English/Italian. I would like to read the posts into a data matrix so that each row represents a post and each column a word. The cells in the matrix are the counts of how many times each word appears in the post. The dictionary should consist of all the words in the whole file or a non exhaustive E...
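The matrix described above (one row per post, one column per word, cells holding counts) can be sketched in plain Python. This is a minimal sketch under assumptions: posts arrive as a list of strings, words are lowercased runs of `\w` characters (which also keeps digits and underscores), and the name `build_term_matrix` is hypothetical.

```python
import re
from collections import Counter


def build_term_matrix(posts):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in post i."""
    # Tokenize each post; \w is Unicode-aware, so accented Italian letters are kept.
    counts = [Counter(re.findall(r"\w+", post.lower())) for post in posts]
    # The dictionary is every distinct word seen in any post, in sorted order.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for missing words, so absent terms become zero cells.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = build_term_matrix([
    "the cat and the dog",
    "il gatto e il cane",
])
```

For a large file a dense list of lists grows quickly, so keeping the per-post `Counter` objects (a sparse representation) may be the more practical choice.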

Lucene Fuzzy Match on Phrase instead of Single Word

I'm trying to do a fuzzy match on the phrase "Grand Prarie" (deliberately misspelled) using Apache Lucene. Part of my issue is that the ~ operator only does fuzzy matches on single-word terms and behaves as a proximity match for phrases. Is there a way to do a fuzzy match on a phrase with Lucene? ...

Hyphens in Lucene

Hi, I'm playing around with Lucene and noticed that the use of a hyphen (e.g. "semi-final") will result in two words ("semi" and "final") in the index. How is this supposed to match if the user searches for "semifinal", in one word? Edit: I'm just playing around with the StandardTokenizer class actually, maybe that is why? Am I missi...

How does Lucene index documents?

Hello, I read some documentation about Lucene; I also read the document at this link (http://lucene.sourceforge.net/talks/pisa). I don't really understand how Lucene indexes documents, or which algorithms Lucene uses for indexing. The above link says Lucene uses this algorithm for indexing: incremental algorithm: ...

Zend_Search_Lucene and range search

I have a bunch of int key fields in my index and am trying to do a simple range search like this: `gender:1 AND height:[120 TO 180]` This should give me males in the height range 120 to 180. But for some reason I get this exception: `At least one range query boundary term must be non-empty term` How would I debug this? Is it just Zend_...

ASP.NET Lucene Performance Improvements question

I have coded up an ASP.NET website that runs on Windows Server 2008 (remotely hosted). The application queries 11 very large Lucene indexes (each ~100GB). I open IndexSearchers on Page_load() and keep them open for the duration of the user session. My questions: The queries take ~5 seconds to complete - understandable, as these are very large inde...

What is the advantage of Lucene searching and indexing?

I want to know: what is the advantage of Lucene searching and indexing? Is searching with Lucene as fast as other search algorithms like Quick Search? What about indexing? I want to know more about the advantages of Lucene over the others. Thanks. ...

Why does Lucene merge indexes?

I want to know why Lucene merges indexes. Or, better put, why doesn't Lucene merge all indexes into one index? What is the advantage of this merging method? ...

How can I search on a list of values using Solr/Lucene?

Given the following query: (field:value1 OR field:value2 OR field:value3 OR ... OR field:value50) Can this be broken down into something less verbose? Basically I have hundreds of category IDs, and I need to search for items under large groups of category IDs (20-50 at a time). In MySQL, I'd just use field IN(value1, value2, value3)...
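Lucene's classic query syntax supports field grouping, so the repeated `field:` prefix can be written once as `field:(value1 OR value2 OR value3)`. Below is a minimal sketch of a helper that builds such a string; the name `build_in_query` is hypothetical, and it assumes the values are simple tokens such as numeric IDs that need no escaping or quoting.

```python
def build_in_query(field, values):
    """Build a grouped query string like 'field:(v1 OR v2 OR v3)'."""
    terms = [str(v) for v in values]
    if not terms:
        raise ValueError("at least one value is required")
    # Assumes plain tokens (e.g. numeric IDs); no query-syntax escaping is done.
    return f"{field}:({' OR '.join(terms)})"


query = build_in_query("category_id", [12, 47, 301])
```

Here `query` holds `category_id:(12 OR 47 OR 301)`, which is equivalent to the longer form with one `category_id:` clause per value.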

How do I get a document index so I can delete with Lucene?

Basically, this is what I am doing: I think I'll set the document ID as the thread ID on my site (even if some types of thread won't be searched), so I can search by thread ID, but I am clueless about how to delete. I found pages that say to use the document index, and that I need to optimize or close before changes take effect, but I don't know how to get the ...