lucene

Lucene index with multiple fields of the same nature

Each Lucene doc is a recipe, and each of these recipes has ingredients. I'm working towards being able to search the ingredients and return a result that says, for example, that two ingredients matched out of four. So how can I add the ingredients to the doc? In Solr I can just create multiple fields of and it would save them all, I might be...

BooleanQuery Too Many Clauses in Lucene

For a bit of background on what I am doing: http://stackoverflow.com/questions/2409870/using-hit-highlighter-in-lucene Now, to solve this problem, I am setting the max clause count to 50000. It works. Can there be any problems by increasing the number of clauses ...

Boost Solr results based on the field that contained the hit

Hi, I was browsing the web looking for an indexing and search framework and stumbled upon Solr. A functionality that we absolutely need is to boost results based on what field contained the hit. A small example: Consider a record like this: <movie> <title>The Dark Knight</title> <alternative_title>Batman Begins 2</alternative_title...

How can I get DocId when adding a document in Lucene index?

I am indexing a row of data from a database in Lucene.Net. A row is the equivalent of a Document. I want to update my database with the DocId, so that I can use the DocId in the results to be able to retrieve rows quickly. I currently first retrieve the PK from the result docs, which I think should be slower than retrieving directly from the data...

Lucene wildcard queries

Hello, I have this question relating to Lucene. I have a form and I get a text from it and I want to perform a full text search in several fields. Suppose I get from the input the text "textToLook". I have a Lucene Analyzer with several filters. One of them is lowerCaseFilter, so when I create the index, words will be lowercased. Ima...

Can documents indexed with Solr on JDK6 be retrieved using only the Lucene API on JDK1.4?

My runtime environment is still on JDK1.4 but I like the Solr features related to how documents are ingested and indexed. Would I be able to index my documents using Solr offline on a recent version of the JDK, copy the index over and use it in my runtime environment with an older version of the JDK? Version wise, Solr 1.4.0 uses Apach...

How do I implement tag searching with Lucene?

I haven't used Lucene. Last time I asked (many months ago, maybe a year), people suggested Lucene. If I shouldn't use Lucene, what should I use? As an example, say there are items tagged like this: apples carrots apples carrots apple banana. If a user searches apples, I don't care if there is any preference among 1, 2 and 4. However, I have seen many for...

Lucene.Net support phrases?: What is best approach to tokenize comma-delimited data (atomically) in fields during indexing?

I have a database with a column I wish to index that has comma-delimited names, e.g., User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley" I prefer to tokenize each name atomically (name as a whole searchable entity). What is the best approach for this? Did I miss a simple option to set the tokenize delimiter? Do I have to ...

Number of hits per document in Lucene.

I am able to find the total number of hits, but I want to find out the number of hits per document. Thanks. ...

Lucene's nested query evaluation regarding negation

Hi, I am adding Apache Lucene support to Querydsl (which offers type-safe queries for Java) and I am having problems understanding how Lucene evaluates queries, especially regarding negation in nested queries. For instance, the following two queries are in my opinion semantically the same, but only the first one returns results. +year:1...

How to handle search term concatenations in SOLR

We are currently replacing our product search from MySQL to a SOLR backend. Our customers often search for terms like 'startrek online', 'starwars', 'redsteel' or even 'grandtheftauto'. Is there a method in SOLR to either expand or spellcheck these searches into syllables, e.g. 'star trek online', 'star wars', 'red steel', 'grand theft auto'...

How to sort by a field that has an alternative value if null in Lucene?

Hi folks. I want to sort my Lucene(.NET) search results by a date field (date1), but if date1 is not set, I'd like to use date2. The traditional sort method is to sort by date1, and then sort the values that are the same by date2. This would mean that whenever I did fall back to date2, these values would be at the top (or bottom) of th...

Lucene case sensitive & insensitive search

I have a Lucene index which is currently case sensitive. I want to add the option of having a case insensitive search as a fall-back. This means that results that match the case will get more weight and will appear first. For example, if the number of results is limited to 10, and there are 10 matches which match my case, this is enough....

Faceted Search w/Lucene.NET & NHibernate.Search

Hi, does anyone know if it is possible to perform faceted searches with NHibernate.Search and Lucene.NET, or do you need to implement something like Solr as well to get this functionality? I haven't been able to find anything regarding this in the docs. Thanks! ...

Lucene - querying with long strings

I have an index, with a field "Affiliation", some example values are: "Stanford University School of Medicine, Palo Alto, CA USA", "Institute of Neurobiology, School of Medicine, Stanford University, Palo Alto, CA", "School of Medicine, Harvard University, Boston MA", "Brigham & Women's, Harvard University School of Medicine, Boston, M...

Intersecting boundaries with lucene

I'm using Lucene, and I'm trying to find a way to index and retrieve documents that have a ranged property. For example I have: Document 1: Price:[30 TO 50] Document 2: Price:[45 TO 60] Document 3: Price:[60 TO 70] And I would like to search for all the documents whose ranges intersect a specific interval, in the above example, if I ...

Recommended way to perform Lucene search without limit

The Lucene documents tell me that "Hits" will be removed from the API in Lucene 3.0. Deprecated. Hits will be removed in Lucene 3.0. Use search(Query, Filter, int) instead. The proposed overload limits the number of documents returned to the value of the int. So my question is: what is the recommended way to perform a search...

Best way to retrieve a certain field of all documents returned by a Lucene search

Hi, I was wondering what the best way is to retrieve a certain field of all documents returned by a Searcher of Lucene. Background: each document has a date field (written on) and I would like to show a timeline of all found documents, so I need to extract the date (day) field of all the documents I find with the search. I currently r...

Lucene search and underscores

When I use Luke to search my Lucene index using a standard analyzer, I can see the field I am searching for contains values of the form MY_VALUE. When I search for field:"MY_VALUE" however, the query is parsed as field:"my value" Is there a simple way to escape the underscore (_) character so that it will search for it? EDIT: 4/1/2010 ...

Nutch - how to crawl by small patches?

Hi everyone! I am stuck! I can't get Nutch to crawl for me by small patches. I start it with the bin/nutch crawl command with parameters -depth 7 and -topN 10000. And it never ends. It ends only when my HDD is empty. What I need to do: Start to crawl my seeds with the possibility to go further on outlinks. Crawl 20000 pages, then index them. C...