lucene

Solr Incremental backup on real-time system with heavy index

Hi all! I am implementing a search engine with Solr that imports at least 2 million documents per day. Users must be able to search the imported documents as soon as possible (near real-time). I am using 2 dedicated Windows x64 servers with Tomcat 6 (Solr shard mode). Each server indexes about 120 million documents, about 220 GB (total 500 GB). I want to get incremental backups from Solr in...
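Lucene writes each index file once and does not modify it afterwards, which is what makes an incremental backup feasible: only files missing from the backup need to be copied. The sketch below assumes the index is not being committed to while it runs (in practice you would copy from a held commit point, e.g. one protected by Lucene's SnapshotDeletionPolicy, or use Solr's replication handler backup command). The class and method names are hypothetical, and the special handling of segments.gen and write.lock assumes an older (3.x-era) index layout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: copies only the index files that the backup does not have yet.
// Assumes indexDir is a stable commit point (nothing is written while this runs).
class IncrementalIndexBackup {

    static List<String> backup(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path src : files) {
                String name = src.getFileName().toString();
                // The lock file is not index data; skip it and anything that is not a plain file.
                if (name.equals("write.lock") || !Files.isRegularFile(src)) {
                    continue;
                }
                Path dst = backupDir.resolve(name);
                // Assumption: older Lucene versions rewrite segments.gen in place, so refresh it every time.
                boolean rewrittenInPlace = name.equals("segments.gen");
                if (rewrittenInPlace || !Files.exists(dst)) {
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    copied.add(name);
                }
            }
        }
        Collections.sort(copied);
        return copied;
    }
}
```

Files that merges have since removed from the live index stay in the backup; pruning them is left out of this sketch.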

Order by field with SQLite

Hello, I'm currently working on a Symfony project at work and we are using Lucene for our search engine. I was trying to use an SQLite in-memory database for unit tests (we are using MySQL), but I stumbled upon something. The search engine part of the project uses Lucene indexing. Basically, you query it and you get an ordered list of ids, w...
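In MySQL, ordering rows by an id list is typically done with ORDER BY FIELD(id, ...), and FIELD() is a MySQL-specific function that SQLite does not have. One portable alternative is to fetch the rows by id and restore the search ranking in application code. The sketch below (Java, with a hypothetical orderByRank helper) keeps only rows whose id appears in the ranking and sorts them by rank position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RankOrder {
    // Reorders database rows to follow the id order returned by the search engine.
    // Rows whose id is not in rankedIds are dropped.
    static <T> List<T> orderByRank(List<Integer> rankedIds, List<T> rows, Function<T, Integer> idOf) {
        Map<Integer, Integer> position = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) {
            position.putIfAbsent(rankedIds.get(i), i); // first occurrence wins
        }
        List<T> result = new ArrayList<>();
        for (T row : rows) {
            if (position.containsKey(idOf.apply(row))) {
                result.add(row);
            }
        }
        result.sort(Comparator.comparingInt(row -> position.get(idOf.apply(row))));
        return result;
    }
}
```

On the SQL side, a CASE expression (ORDER BY CASE id WHEN 3 THEN 0 WHEN 1 THEN 1 ... END) is another option that both MySQL and SQLite accept.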

Storing users' data in lucene or querying rdbms?

Hi all. I'm struggling with Lucene and not sure what the better approach is: I've got users' profile data, and some of it (3-4 fields) is stored in Lucene. But in the query results I also need to show the user's age/name/etc. I don't think it's reasonable to store all of these fields (the additional ones, which do not participate in the search proces...

What is faster for radial location based search? Lucene or Sphinx?

I'm part of a development team working on a job board, and we're considering both Lucene and Sphinx for our search base. Does anyone have experience working with either of these open-source tools for location-based search? ...
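Whichever engine is chosen, a radial search comes down to the same computation: discard candidates outside a cheap latitude/longitude bounding box, then check the exact great-circle distance of the rest. The sketch below shows that two-step filter in plain Java with the haversine formula. It assumes a spherical Earth of radius 6371 km and ignores wrap-around at the antimeridian; the class name is hypothetical, and it illustrates the computation, not how Lucene or Sphinx implement it internally.

```java
import java.util.ArrayList;
import java.util.List;

class RadialSearch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical-Earth assumption

    // Great-circle distance between two lat/lon points (degrees) using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Returns the points (each {lat, lon}) within radiusKm of the center.
    static List<double[]> withinRadius(double lat, double lon, double radiusKm, List<double[]> points) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle in radians
        double latDelta = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(lat));
        // If the circle reaches a pole, every longitude is possible.
        double lonDelta = ratio >= 1.0 ? 180.0 : Math.toDegrees(Math.asin(ratio));
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            // Cheap bounding-box test first (the part an index can answer with range queries).
            if (Math.abs(p[0] - lat) > latDelta || Math.abs(p[1] - lon) > lonDelta) {
                continue;
            }
            // Exact distance check for the remaining candidates.
            if (haversineKm(lat, lon, p[0], p[1]) <= radiusKm) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

The bounding box is what keeps the search fast: it can be expressed as two numeric range queries, so only a small candidate set needs the trigonometry.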

Solr: fieldNorm different per document, with no document boost

I want my search results to be ordered by score, which they are, but the score is being calculated improperly. That is to say, not necessarily improperly, but differently than I expect, and I'm not sure why. My goal is to remove whatever is changing the score. If I perform a search that matches on two objects (where ObjectA is expecte...
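One common source of per-document fieldNorm differences, even without any boost, is length normalization. Assuming the classic DefaultSimilarity of Lucene 3.x, the norm stored for a field is 1/sqrt(number of terms in that field) multiplied by any boosts, and it is encoded lossily into a single byte. The sketch below only reproduces the float formula (the class name is hypothetical). In Solr, setting omitNorms="true" on a field in schema.xml disables norms for that field.

```java
class LengthNormSketch {
    // Mirrors the classic DefaultSimilarity length norm (assumption: Lucene 3.x-era scoring).
    // Shorter fields get a larger norm, so a match in a short field scores higher.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

For example, a match in a 1-term field gets a norm of 1.0, while the same match in a 4-term field gets 0.5, which is enough to reorder otherwise identical hits.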

Lucene Boolean Query on Not Analyzed Fields

Using RavenDB to do a query on a Lucene index. This query parses okay: X:[[a]] AND Y:[[b]] AND Z:[[c]] However, this query gives me a parse exception: X:[[a]] AND Y:[[b]] AND Z:[[c]] AND P:[[d]] "Lucene.Net.QueryParsers.ParseException: Cannot parse '( AND )': Encountered \" \"AND" I tried this on a complex index and simple reproduce ...

Handling different non-accented versions of Umlaut characters

The German accented Umlaut characters “ö”, “ä” and “ü” are often replaced with non-accented versions when users type, often for convenience when they do not have the correct keyboard. With most accented characters there is a particular non-accented version that most people use. The accented “è”, for instance, is always replaced with a s...
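One way to make both spellings match is to index each German token in all common variants: as written, with the umlaut expanded ("ö" to "oe"), and with the dots dropped ("ö" to "o"). In a Lucene analyzer this would be a token filter that emits the extra forms at the same position, much like synonyms; the plain-Java sketch below (hypothetical class name) only shows the variant generation and assumes lowercase matching.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class UmlautVariants {
    // Returns the token as written plus its two common umlaut-free spellings.
    // Unicode escapes: \u00e4 = ä, \u00f6 = ö, \u00fc = ü.
    static Set<String> variants(String token) {
        String lower = token.toLowerCase(Locale.GERMAN);
        Set<String> forms = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        forms.add(lower);
        // Expanded spelling, e.g. "müller" -> "mueller"
        forms.add(lower.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue"));
        // Dots dropped, e.g. "müller" -> "muller"
        forms.add(lower.replace("\u00e4", "a").replace("\u00f6", "o").replace("\u00fc", "u"));
        return forms;
    }
}
```

Because the query side is left untouched, a user typing "mueller", "muller" or "müller" reaches the same document. The cost is a larger index and occasional false matches between words that differ only by an umlaut.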

Lucene ChainedFilter vs. BooleanFilter

I want to combine several filters in a Lucene search. It seems that there are two classes that can help with this: ChainedFilter and BooleanFilter. ChainedFilter is a contributed class, and has been around for longer. It supports AND, OR, NOT, and XOR. BooleanFilter is a newer, main-line class. It supports the unusual "Should", "MustN...
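Both classes ultimately combine per-filter sets of matching document numbers. The sketch below models each filter's result as a java.util.BitSet: and, or, andNot and xor correspond to ChainedFilter's AND, OR, NOT and XOR, and combine() shows a BooleanFilter-style combination of SHOULD, MUST and MUST_NOT clauses. The class is hypothetical, and starting from all documents when there are no SHOULD clauses is an assumption, not necessarily what either Lucene class does.

```java
import java.util.BitSet;
import java.util.List;

class FilterCombination {
    // Builds a document-id set from explicit document numbers.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // BooleanFilter-style sketch: OR the SHOULD sets, AND the MUST sets, remove the MUST_NOT sets.
    static BitSet combine(int maxDoc, List<BitSet> should, List<BitSet> must, List<BitSet> mustNot) {
        BitSet result = new BitSet(maxDoc);
        if (should.isEmpty()) {
            result.set(0, maxDoc); // no SHOULD clauses: start from all documents
        } else {
            for (BitSet s : should) result.or(s);
        }
        for (BitSet m : must) result.and(m);
        for (BitSet n : mustNot) result.andNot(n);
        return result;
    }
}
```

Note that BitSet's and/or/andNot/xor modify the receiver in place, so clone a filter's set before combining it if it is cached.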

Is Lucene fuzzy search lazy?

I would like to use Lucene's fuzzy search, which I understand is based on some sort of Levenshtein-like algorithm. If I use a fairly high threshold (e.g., "new york~0.9"), will it first compute the edit distance and then check whether it is less than whatever 0.9 corresponds to, or will it cut off the algorithm once it becomes apparent that the d...
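Whether a given Lucene version stops early is an implementation detail, but the technique the question describes does exist: in the dynamic-programming Levenshtein table, the smallest value in a row never decreases from one row to the next, so once an entire row exceeds the allowed number of edits the computation can stop. The sketch below shows that bounded computation. It takes a maximum edit count directly rather than Lucene's 0.9-style similarity (whose conversion to edits is version-specific), and the class name is hypothetical.

```java
class BoundedLevenshtein {
    // Returns the edit distance if it is <= maxEdits, otherwise -1, stopping as early as possible.
    static int distance(String a, String b, int maxEdits) {
        if (Math.abs(a.length() - b.length()) > maxEdits) {
            return -1; // the length difference alone already exceeds the bound
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > maxEdits) {
                return -1; // every alignment already needs more edits than allowed
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        int d = prev[b.length()];
        return d <= maxEdits ? d : -1;
    }
}
```

A high threshold (few allowed edits) therefore makes rejecting a non-matching term cheap, since most candidates are discarded after a few rows.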

Lucene return result in RSS

Hi, does anyone know how to return Lucene results as an RSS or Atom feed? I understand that's the step necessary to return results via OpenSearch. Thanks much, Jack ...
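Lucene itself only returns hits, so the feed has to be rendered from them by the application. The sketch below writes search hits as an RSS 2.0 document that also carries the OpenSearch 1.1 totalResults element. The Hit class, the method names and the channel values are hypothetical, and a real application would more likely use an XML or feed library than string building.

```java
import java.util.List;

class RssWriter {
    // Hypothetical representation of one search hit.
    static final class Hit {
        final String title;
        final String link;
        final String summary;

        Hit(String title, String link, String summary) {
            this.title = title;
            this.link = link;
            this.summary = summary;
        }
    }

    // Minimal escaping for XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toRss(String channelTitle, String channelLink, int totalResults, List<Hit> hits) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\" xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\">\n");
        xml.append("<channel>\n");
        xml.append("<title>").append(escape(channelTitle)).append("</title>\n");
        xml.append("<link>").append(escape(channelLink)).append("</link>\n");
        xml.append("<description>Search results</description>\n");
        // Total number of hits for the query, not just the ones on this page.
        xml.append("<opensearch:totalResults>").append(totalResults).append("</opensearch:totalResults>\n");
        for (Hit h : hits) {
            xml.append("<item>");
            xml.append("<title>").append(escape(h.title)).append("</title>");
            xml.append("<link>").append(escape(h.link)).append("</link>");
            xml.append("<description>").append(escape(h.summary)).append("</description>");
            xml.append("</item>\n");
        }
        xml.append("</channel>\n</rss>\n");
        return xml.toString();
    }
}
```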

Denormalizing relational data for lucene/solr

I have an architectural question about using Apache Solr/Lucene. I'm building a Solr index for searching a CV database. Basically, every CV in it will have some fields like rate of pay, address and title; these fields are straightforward. The area I need advice on is skills and job history. For skills, someone might add an entry l...
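A common approach is to flatten each CV into one document and turn the repeating parts (skills, job history) into multi-valued fields. The sketch below does that with plain Java maps; the field names and the Job class are hypothetical. Flattening loses which values belonged together (for example, which company went with which job title), so queries that need that pairing require another design, such as one document per job entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CvFlattener {
    // Hypothetical job-history entry.
    static final class Job {
        final String title;
        final String company;

        Job(String title, String company) {
            this.title = title;
            this.company = company;
        }
    }

    // Builds one flat document: single-valued fields plus multi-valued fields for repeating data.
    static Map<String, List<String>> flatten(String cvTitle, String rateOfPay, List<String> skills, List<Job> jobs) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title", List.of(cvTitle));
        doc.put("rate", List.of(rateOfPay));
        doc.put("skill", new ArrayList<>(skills)); // multi-valued: one value per skill
        List<String> jobTitles = new ArrayList<>();
        List<String> companies = new ArrayList<>();
        for (Job job : jobs) {
            jobTitles.add(job.title);
            companies.add(job.company);
        }
        doc.put("job_title", jobTitles);   // multi-valued
        doc.put("job_company", companies); // multi-valued; pairing with job_title is lost
        return doc;
    }
}
```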

How to create nested boolean query with lucene API (a AND (b OR c))?

I have an indexed object with three fields (userId, title, description). I want to find all objects of a specific user where the title OR the description contains a given keyword. I have something like this (but that's obviously wrong): WildcardQuery nameQuery = new WildcardQuery(new Term("name", filter.getSearch())); WildcardQuery des...
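The usual way to express a AND (b OR c) is to nest one boolean query inside another: the userId clause is required, and the title and description clauses go into an inner query as optional (SHOULD) clauses; the inner query is then added to the outer one as required. In Lucene 3.x terms, that is an outer BooleanQuery with two BooleanClause.Occur.MUST clauses, one of which is an inner BooleanQuery of SHOULD clauses. So that the example runs without Lucene on the classpath, the sketch below models only this matching logic with a tiny hypothetical Query interface; it does not use or reproduce Lucene's API or scoring.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

class NestedBooleanDemo {
    // Hypothetical mini query model: a query decides whether a document (field -> text) matches.
    interface Query {
        boolean matches(Map<String, String> doc);
    }

    static Query term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    static Query contains(String field, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        return doc -> doc.getOrDefault(field, "").toLowerCase(Locale.ROOT).contains(needle);
    }

    // Matching rule modeled on a boolean query: every MUST clause has to match;
    // with no MUST clauses, at least one SHOULD clause has to match.
    static Query bool(List<Query> must, List<Query> should) {
        return doc -> {
            for (Query q : must) {
                if (!q.matches(doc)) return false;
            }
            if (!must.isEmpty()) return true; // SHOULD clauses would only affect scoring here
            for (Query q : should) {
                if (q.matches(doc)) return true;
            }
            return false;
        };
    }

    // userId AND (title OR description)
    static Query userKeyword(String userId, String keyword) {
        Query titleOrDescription = bool(List.of(), List.of(contains("title", keyword), contains("description", keyword)));
        return bool(List.of(term("userId", userId), titleOrDescription), List.of());
    }
}
```

The rule in bool() is why the flat version goes wrong: putting userId as MUST next to title and description as SHOULD in a single query makes the SHOULD clauses optional, so every document of that user matches.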

Symfony and Lucene search

Hello, we are using sf 1.4 and Doctrine. I installed Lucene according to the Jobeet tutorial and have been running into some problems with it. When I do a search without any values, I get the complete table that Lucene is working with. If I search for a value that was previously inserted into the table, it returns nothing. But W...

How to get frequently occurring phrases with Lucene

Hi! As the question says, I would like to get frequently occurring phrases with Lucene. I am extracting information from txt files and am losing a lot of context by not having information about phrases, e.g. "information retrieval" is indexed as two separate words. How can I get phrases like this? I can not find anyth...
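One simple way to surface candidate phrases is to count adjacent word pairs (bigrams) and keep the most frequent ones. Lucene's analysis module has a ShingleFilter that emits such word n-grams as tokens at index time; the plain-Java sketch below (hypothetical class name) only illustrates the counting, uses a naive tokenizer, and does not stop at sentence boundaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class PhraseCounter {
    // Counts adjacent word pairs across all texts.
    static Map<String, Integer> bigramCounts(List<String> texts) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            List<String> words = new ArrayList<>();
            for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!w.isEmpty()) words.add(w); // split() can yield a leading empty string
            }
            for (int i = 0; i + 1 < words.size(); i++) {
                counts.merge(words.get(i) + " " + words.get(i + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // The n most frequent bigrams, ties broken alphabetically.
    static List<String> top(Map<String, Integer> counts, int n) {
        List<String> phrases = new ArrayList<>(counts.keySet());
        phrases.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return phrases.subList(0, Math.min(n, phrases.size()));
    }
}
```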

Searching for multiple words in one field in Lucene index

I'm having a problem with Zend_Search_Lucene. I have a few documents with a "tags" field in the index. The documents' "tags" fields have the following values: tag1 tag2 tag3 tag1 tag4 I would like to find only the documents with tag1 AND tag4, so I use the query "+tags:tag1 +tags:tag2". I can't figure out why I get 0 hits from the index. ...

Indexing and searching French text with diacritics in Lucene

I am using Lucene Search. I have uploaded a French file with the following content. french.txt multimédia francophone pour l'enseignement du français langue étrangère If I search for francophone, it shows the file in the search results. But when I search for multimédia, français or étrangère, it does not show any results. I have tried to ...
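A common remedy is to fold accents the same way at index time and at query time, so that "étrangère", "etrangere" and "Étrangère" all become the same term. Lucene ships an ASCIIFoldingFilter for this; the plain-Java sketch below (hypothetical class name) shows the same idea with java.text.Normalizer: decompose to NFD, drop the combining marks, lowercase. It assumes the text reaches the analyzer intact; a UTF-8 file read with a different charset would already be garbled before any folding.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class AccentFolding {
    // "étrangère" -> "etrangere": split letters from their accents, then drop the accents.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Naive tokenizer: fold, then split on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : fold(text).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }
}
```

The same fold() must run on the user's query; folding only the indexed text would make "étrangère" typed with its accents stop matching.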

Lucene Search with French words

I am using Lucene Search. I have uploaded a French file with the following French content. french.txt multimédia francophone pour l'enseignement du français langue étrangère If I search for francophone, it shows the file in the search results. But when I search for multimédia, français or étrangère, it does not show any results. I have tr...

Unit test for Lucene indices

I'm working on legacy code that builds an index of popular terms in another index. There are no unit tests in place, and the indexing process is a pain to wait for because the first index takes so long to build. I want to structure the second (popular term) index differently. Is there a best practice for testing to see if a Lucene inde...

Encrypted, Compressible, Cross Platform, File system in a file

We wish to make a desktop application that searches a locally packaged text database that will be a few GB in size. We are thinking of using lucene. So basically the user will search for a few words and the local lucene database will give back a result. However, we want to prevent the user from taking a full text dump of the lucene inde...

Adding a document to a Lucene index causes a crash

I'm trying to index an MP3 file with only one ID3 frame, using CLucene and TagLib. The following code works fine: ... TagLib::MPEG::File file("/home/user/Depeche Mode - Personal Jesus.mp3"); if (file.ID3v2Tag()) { TagLib::ID3v2::FrameList frameList = file.ID3v2Tag()->frameList(); lucene::document::Document *document = new lucene...