lucene

What are the advantages and disadvantages of using a search engine as a key-value store?

Given a search engine like Lucene and a set of XML documents which need to be fully preserved, what are the advantages and disadvantages of using the search engine as a key-value store for returning XML documents, given a unique primary key that each document contains? ...
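
A minimal sketch of the key-value pattern the question describes, in plain Python rather than Lucene: each XML document is kept byte-for-byte as the stored value and looked up by the unique key it contains. The `XmlStore` class and the `id` element name are hypothetical. In Lucene the rough analogue would be a stored field holding the raw XML plus an untokenized key field queried with an exact term lookup.

```python
import xml.etree.ElementTree as ET


class XmlStore:
    """Hypothetical key-value store: unique key -> raw, unmodified XML text."""

    def __init__(self, key_element="id"):
        self.key_element = key_element  # assumed name of the primary-key element
        self._docs = {}

    def put(self, xml_text):
        # Parse only to extract the key; the original string is stored untouched,
        # so whitespace, comments and attribute order are fully preserved.
        root = ET.fromstring(xml_text)
        key = root.findtext(self.key_element)
        if key is None:
            raise ValueError("document has no <%s> element" % self.key_element)
        key = key.strip()
        self._docs[key] = xml_text
        return key

    def get(self, key):
        # Exact-match lookup on the key, like a term query on an untokenized field.
        return self._docs.get(key)
```

Returning the stored string rather than re-serializing the parsed tree is what keeps the document "fully preserved".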

Localsolr wt=json and fl compatible?

We've got Localsolr (2.9.1 lucene-spatial library) running on Solr 1.4 with Tomcat 1.6. Everything's looking good, except for a couple of small issues. If we specify fl=id (or fl= anything) and wt=json, it seems that the fl parameter is ignored (thus we get a lot more detail in our results than we'd like). If we specify fl=id and leave o...
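
One way to confirm the symptom is to check, on the client side, whether each returned document contains only the fields listed in `fl`. This sketch assumes the standard Solr JSON response layout (`response` containing `docs`); the helper name `unexpected_fields` is hypothetical, and any extra keys you legitimately expect would need to be whitelisted.

```python
import json


def unexpected_fields(response_text, fl):
    """Return, per returned document, the sorted fields not requested via fl.

    fl is the comma-separated field list sent to Solr (e.g. "id" or "id,name").
    Special values such as "*" or "score" are not handled in this sketch.
    """
    requested = {f.strip() for f in fl.split(",") if f.strip()}
    docs = json.loads(response_text)["response"]["docs"]
    return [sorted(set(doc) - requested) for doc in docs]
```

An empty list for every document means `fl` was honoured; non-empty lists reproduce the issue.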

Use Compass rather than Hibernate second-level cache

Looking forward to hearing your opinions. Do you think just using Compass with its caching enabled is enough, so there is no need to use the Hibernate second-level cache? I even heard Compass supports memcached; in that case there is no point in using the Hibernate second-level cache. Do you all use Compass and still enable the Hibernate 2nd-level cache on...

Lucene index file randomly crashes and needs reindexing

How do you all deal with the issue of occasionally needing to reindex? What do you recommend to minimize this? ...

Is it possible to use Lucene (on Linux) and ASP.NET (on Windows) at the same time?

Hi, I want to start a new project where I need performance as well as a neat and robust GUI. As for performance, I have around 2 million documents which I'd like to index with Lucene installed on Linux, due to its performance and security. As for the GUI, I'd like a flexible, professional-looking website, and since I'm expe...

Counting the number of regex query matches in a document field

Hi, using Lucene, I can figure out how to create a document, put values in the respective fields and then use a searcher to search the indexed documents for matches. However, I am now more concerned with the number of matches in a particular field of each document. Just knowing there is a match is fine, but I would like to know ho...
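
As a rough illustration of the per-field count the question asks about, this sketch re-applies the regex to each document's stored field text after retrieval. It is plain Python, not a Lucene API; the `documents` structure (doc id mapped to a dict of field text) and the field names are hypothetical.

```python
import re


def count_field_matches(documents, field, pattern):
    """Map each document id to the number of non-overlapping regex matches
    in the given field's text (documents lacking the field count as 0)."""
    regex = re.compile(pattern)
    return {doc_id: len(regex.findall(doc.get(field, "")))
            for doc_id, doc in documents.items()}
```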

Hints on implementing XQuery full-text search using Lucene

I've used Lucene on a previous project, so I am somewhat familiar with the API. However, I've never had to do anything "fancy" (where "fancy" means things like using filters, different analyzers, boosting, payloads, etc). I'm about to embark on implementing the full-text search feature of XQuery: http://www.w3.org/TR/xpath-full-text-10...

Do documents in Lucene have to contain the same fields?

I'm considering / working on implementing a search engine for our company's various content types, and am attempting to wrap my head around Lucene (specifically the .net flavor). For the moment, my primary question is whether or not documents one indexes have to contain the same fields. For instance: Document1: Title: "I'm a documen...
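
A toy model of the question's setup, assuming nothing about Lucene.NET's API: each document is just its own set of field/value pairs, and a per-field inverted index only receives entries from documents that actually have that field. All field names other than `Title` are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """field -> term -> set of doc ids; documents may have different fields."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[field][term].add(doc_id)
    return index


def search(index, field, term):
    # A field that no document defines simply yields no hits; it is not an error.
    return sorted(index.get(field, {}).get(term.lower(), set()))
```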

Search Lucene with precise edit distances

I would like to search a Lucene index with edit distances. For example, say, there is a document with a field FIRST_NAME; I want all documents with first names that are 1 edit distance away from, say, 'john'. I know that Lucene supports fuzzy searches (FIRST_NAME:john~) and takes a number between 0 and 1 to control the fuzziness. The p...
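
For reference, the exact edit distance the question wants can be computed with the classic Levenshtein dynamic program. This sketch filters a list of names in plain Python; it is not how Lucene's fuzzy query is implemented, and how a distance maps onto Lucene's 0-1 fuzziness value depends on the Lucene version, so treat that mapping as something to verify separately.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row dimension
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def within_distance(names, target, max_edits=1):
    # Case-insensitive filter: keep names at most max_edits away from target.
    return [n for n in names if levenshtein(n.lower(), target.lower()) <= max_edits]
```

For example, `within_distance(["John", "Jon", "Joan", "Johnny", "Jane"], "john")` keeps `John`, `Jon` and `Joan`.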

Prevent KeywordTokenizer from creating multiple key-value pairs

I use the Lucene Java QueryParser with KeywordAnalyzer. A query like topic:(hello world) is broken up into multiple parts by the KeywordTokenizer, so the resulting Query object looks like this: topic:(hello) topic:(world). That is, instead of one, I now have two key-value pairs. I would like the QueryParser to interpret hello world as one valu...
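
The behaviour described can be modelled in a few lines: if the query text is split on whitespace before the analyzer runs, a keyword-style analyzer only ever sees the fragments. This is a toy simulation, not Lucene's QueryParser; whether a quoted value such as topic:"hello world" reaches KeywordAnalyzer in one piece in your Lucene version is an assumption worth checking.

```python
def keyword_analyze(text):
    # Like KeywordAnalyzer: the whole input becomes a single token.
    return [text]


def parse_unquoted(field, text):
    # Models pre-splitting on whitespace before analysis: each chunk is
    # analyzed separately, so one value turns into several field:term pairs.
    return [(field, tok) for chunk in text.split() for tok in keyword_analyze(chunk)]


def parse_as_one_value(field, text):
    # Models handing the whole value to the analyzer in one piece.
    return [(field, tok) for tok in keyword_analyze(text)]
```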

Question on Entity Framework and full-text search

Both Entity Framework and NHibernate are O-R mapping frameworks. Hibernate can use Lucene as a full-text solution. Is there any solution that combines Entity Framework and Lucene for searching? Where can I find examples/resources for this solution? ...

Wildcards in Lucene

Why does the wildcard query "dog#V*" fail to retrieve a document that contains "dog#VVP"? The following code written in Jython for Lucene 3.0.0 fails to retrieve the indexed document. Am I missing something? analyzer = WhitespaceAnalyzer() directory = FSDirectory.open(java.io.File("testindex")) iwriter = IndexWriter(directory, anal...
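
One case worth ruling out is letter case: WhitespaceAnalyzer does not lowercase, so the indexed token stays `dog#VVP`, and a pattern that gets lowercased somewhere on the query side would no longer match it. This sketch simulates that with Python's `fnmatch.fnmatchcase`; it illustrates case-sensitive wildcard matching and is not a diagnosis of the truncated code above. Whether your query path lowercases wildcard terms is an assumption to verify.

```python
from fnmatch import fnmatchcase


def whitespace_tokens(text):
    # Like WhitespaceAnalyzer: split on whitespace only, no lowercasing.
    return text.split()


def wildcard_hits(text, pattern):
    # '*' matches any run of characters; '#' has no special meaning here.
    return [tok for tok in whitespace_tokens(text) if fnmatchcase(tok, pattern)]


sample = "the dog#VVP barked"
print(wildcard_hits(sample, "dog#V*"))          # ['dog#VVP']
print(wildcard_hits(sample, "dog#V*".lower()))  # []
```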

Collect all hits for a search in Lucene / Optimization

Summary: I collect the doc ids of all hits for a given search by using a custom Collector (it populates a BitSet with the ids). The searching and getting doc ids are quite fast according to my needs but when it comes to actually fetching the documents from disk, things get very slow. Is there a way to optimize Lucene for faster document ...
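
A sketch of the collector pattern described, with a Python integer standing in for `java.util.BitSet`. It also illustrates one commonly suggested point: iterating the set bits yields doc ids in ascending order, which is the order to fetch stored documents in if you want reads to move forward through the index rather than jump around. `BitSetCollector` and the `fetch` callable are hypothetical.

```python
class BitSetCollector:
    """Records every matching doc id as a bit in an int (a BitSet stand-in)."""

    def __init__(self):
        self.bits = 0

    def collect(self, doc_id):
        self.bits |= 1 << doc_id

    def doc_ids(self):
        # Walk set bits from lowest to highest: ascending doc id order.
        bits, doc_id = self.bits, 0
        while bits:
            if bits & 1:
                yield doc_id
            bits >>= 1
            doc_id += 1


def fetch_all(collector, fetch):
    # fetch is a hypothetical callable that loads one stored document.
    return [fetch(doc_id) for doc_id in collector.doc_ids()]
```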

Logging Search Keywords in Solr / Lucene

I'm new to Solr and am looking for a way to record searches (or keywords) to a log file or database so that I can then analyse them for data visualisation. Can Solr do this already? Is this data accessible via a Solr query? Thanks. Update 1: I'm starting to think I might need to write my own Solr analyzer? ...
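
Before writing an analyzer, it may be enough to mine Solr's own request log. This sketch assumes request lines containing a `params={...}` section with URL-encoded parameters; check the exact format in your own logs before relying on it. The sample lines and the `extract_queries` helper are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

PARAMS_RE = re.compile(r"params=\{([^}]*)\}")


def extract_queries(log_lines):
    """Yield the q parameter of each logged request that has one."""
    for line in log_lines:
        m = PARAMS_RE.search(line)
        if not m:
            continue
        params = parse_qs(m.group(1))  # decodes %XX escapes and '+' as space
        for q in params.get("q", []):
            yield q


lines = [
    "INFO: [] webapp=/solr path=/select params={q=lucene+index&rows=10} hits=3 status=0 QTime=1",
    "INFO: [] webapp=/solr path=/select params={q=solr&rows=10} hits=7 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=lucene+index} hits=3 status=0 QTime=1",
]
print(Counter(extract_queries(lines)).most_common(1))  # [('lucene index', 2)]
```

The counted keywords can then be written to a file or database for visualisation.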

Lucene - Zend_Search_Lucene - how to build an index for "tagged" content

Hello all, I have the following problem: I need to build a Lucene index for articles which are tagged. Here is a simplified data structure and Lucene proposal: article_id -> unindexed, article_title -> UnStored, article_content -> UnStored, article_tags -> ????? (here is the problem). So an article can have multiple tags. Let's say we have an artic...
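
One common approach is to treat each tag as a single untokenized value, so a multi-word tag stays intact. This toy inverted index shows the idea in plain Python, independent of Zend_Search_Lucene; mapping it onto Zend field types is not shown here and should be checked against the Zend documentation. The article ids and tags are hypothetical.

```python
from collections import defaultdict


def index_tags(articles):
    """tag -> set of article ids. Each tag is kept whole (normalized to
    lowercase), so a multi-word tag like "open source" is one entry."""
    by_tag = defaultdict(set)
    for article_id, tags in articles.items():
        for tag in tags:
            by_tag[tag.strip().lower()].add(article_id)
    return by_tag


def articles_with_all(by_tag, *tags):
    # AND query over tags: intersect the posting sets.
    sets = [by_tag.get(t.strip().lower(), set()) for t in tags]
    return sorted(set.intersection(*sets)) if sets else []
```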

Solr or NHibernate Search

I'm a bit confused here. How are Solr or SolrNet any different from NHibernate Search? Does Solr offer anything more on top of Lucene.net than NHibernate Search does? ...

How to use wildcards and fuzzy search with Solr?

I use Solr for searching my data, and I have now noticed that some of the Solr query language features do not work for me. I am missing these capabilities: fuzzy search; wildcards (* ?) - I do not have stemming set up so far, so this would be useful temporarily; searching with field specification - currently I cannot tell se...

Solr paging performance

I have read (http://old.nabble.com/using-q%3D--,-adding-fq%3D-to26753938.html#a26805204): FWIW: limiting the number of rows per request to 50, but not limiting the start doesn't make much sense -- the same amount of work is needed to handle start=0&rows=5050 and start=5000&rows=50. Then he adds: There are very ...
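
The quoted point is that a top-N search must rank `start + rows` hits before it can drop the first `start` of them. A small Python sketch of that reasoning using a bounded heap; the scores are made up, and this is an analogy for priority-queue based hit collection, not Lucene's or Solr's code.

```python
import heapq


def page(scores, start, rows):
    """Return (page_of_doc_ids, hits_ranked) for one page of results.

    Every page must rank start + rows hits, so the work grows with start,
    not just with rows.
    """
    needed = start + rows
    # nlargest keeps a heap of size `needed` while scanning all scores.
    top = heapq.nlargest(needed, range(len(scores)), key=lambda d: scores[d])
    return top[start:needed], len(top)
```

With 10,000 scored documents, `page(scores, 0, 5050)` and `page(scores, 5000, 50)` both rank 5,050 hits; the second call merely returns fewer of them.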

Determine which value produced a hit in a Solr multiValued field

If I have a multiValued field of type text and I put the values [cat,dog,green,blue] in it, is there a way to tell, when I execute a query against that field for dog, that it was in the 1st element position of that multiValued field? Assumption: the client does not have any prior knowledge of what the field type of the field being queried is. ...
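
One idea sometimes discussed for this is to use token positions: the values of a multiValued field are indexed one after another with a position gap between them (Solr's example schema uses `positionIncrementGap="100"` for text fields), so a hit's position identifies its value. This sketch roughly mirrors that layout in plain Python; it assumes whitespace tokenization, that you can obtain the hit's position, and that the stored values are available, and the function names are hypothetical.

```python
import bisect


def value_starts(values, gap=100):
    """Start position of each value's first token when the values of a
    multiValued field are indexed one after another with a position gap."""
    starts, pos = [], 0
    for value in values:
        starts.append(pos)
        pos += len(value.split()) + gap
    return starts


def value_index_for_position(starts, position):
    # The hit belongs to the last value starting at or before `position`.
    return bisect.bisect_right(starts, position) - 1
```

For `[cat,dog,green,blue]` the values start at positions 0, 101, 202 and 303, so a hit on `dog` at position 101 maps back to index 1.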

Extend JackRabbit or build up from Lucene?

I've been working on a site idea: the general concept is a full-text search of documents that also allows user ratings. Based on these ratings, I wanted to boost the item's value in the Lucene index. But I'm trying to figure out whether I should extend JackRabbit or just build from the Lucene base. Is there any good way to extend JackRabbit in this way...