lucene

Lucene: how to get the line where a query occurs

Hi all, I have a number of text files. Each text file has data like this: <text> Big data... big data... </text> <text> another big data </text> <text> some other data </text> Now I have to write code with Lucene that can retrieve the entire line when a search query matches, like if I search for some data the entire third line...

Lucene delete record, deprecated?

When doing research on deleting documents in Lucene, I was shown to use the IndexReader's delete() method, passing in the document id. Now that I actually need to do this, it appears that Lucene currently does not support this method, and I have had very little luck in finding the current way to do this. Any ideas? ...

Lucene/Compass pagination, going past the "last page"

What happens when you search, using the CompassSearchHelper, and you search past the last page? As it stands, I have an application which searches in groups of 10, and for results where there are only (for example) 2 pages (10 on the first, 0 < x < 10 on the second), clicking "next" again will send me to pages of seemingly random results, ...
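
One way to avoid landing on an undefined page is to clamp the requested page to the valid range before running the search. The following is only a minimal sketch of that idea in Python; it does not use the CompassSearchHelper API, and the function name `clamp_page` and its parameters are hypothetical.

```python
import math


def clamp_page(requested_page, total_hits, page_size=10):
    """Return (page, start, end) for a zero-based page clamped to the valid range.

    Hypothetical helper for illustration only; not part of Compass or Lucene.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Index of the last page that actually holds results (0 when there are no hits).
    last_page = max(0, math.ceil(total_hits / page_size) - 1)
    # Pull out-of-range requests back into [0, last_page].
    page = min(max(0, requested_page), last_page)
    start = page * page_size
    end = min(start + page_size, total_hits)
    return page, start, end
```

With 14 hits and a page size of 10, a request for the third page (index 2) is pulled back to the second page, which covers hits 10 to 13.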

How to install the Extended Dismax (edismax) plugin for Solr 1.4 (or 1.4.1) and how to configure it?

Hello everybody, I'm using Solr 1.4 with the dismax SearchHandler. I'm new to Solr ;) and it seems not to support Lucene syntax; it does not even match lowercase and uppercase terms (if you know how to do this, it would be helpful). I want to try edismax (Extended Dismax) with Solr 1.4 or 1.4.1. I found it in the Solr 4.0 dev version, the...

How to use an analyzer in compass-lucene search.

How do I add a Compass analyzer while indexing and searching data in Compass? I am using schema-based configuration for Compass. I want to use StandardAnalyzer with no stopwords, because I want to index data as it is, without ignoring search terms like AND, OR, IN. The default analyzer will ignore AND, OR, IN from the data I give for inde...

Storing data in Lucene or database

Hello, I'm a Lucene newbie and am thinking of using it to index the words in the title and description elements of RSS feeds so that I can record counts of the most popular words in the feeds. Various search options are needed: some will have keywords entered manually by users, whereas in other cases popular terms would be generated au...

search with a combination of structured criteria and freetext keyword/phrase - NOSQL vs Lucene/Sphinx

Hi all, we have an eMall application based mainly around a ~500k-row MySQL master table (with detail tables storing non-searchable fields and other related tables with shop info etc). Users can today search based on specific structured product data (e.g. brand, category, price, specific shop etc). We would also like to support keyword...

Lucene .NET fixed filenames in index directory possible?

When building a Lucene .NET index it creates several randomly named files under the root index directory. My question is, is there a way to have these files have a static or fixed name and just overwrite upon re-index, or all be in one file? ...

Calculate the score based only on documents having more occurrences of a term in Lucene

Hi, I have started working on a resume (document) retrieval component based on the Lucene.NET engine. It works great: it fetches the documents and scores them based on the idea behind the VSM, which is that the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collectio...
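
The weighting described in the excerpt above can be illustrated with a textbook tf-idf sum. This is a minimal Python sketch under stated assumptions: it is not Lucene.NET's actual scoring formula, the `+ 1.0` smoothing on the idf is an arbitrary choice, and the function name `tf_idf_scores` is hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Score each tokenized document against the query with a textbook tf-idf sum.

    Illustrative only; Lucene's own similarity adds normalization and boosts.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts for this document
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term is absent from the whole collection
            # Rarer terms across the collection get a larger idf weight.
            idf = math.log(n_docs / df[term]) + 1.0
            score += tf[term] * idf
        scores.append(score)
    return scores
```

Because the raw term count multiplies the idf, a document that repeats the query term more often scores higher than one that mentions it once, which is the behaviour the question asks about.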

ASP.NET search index building strategy

This is what I'm planning to do and I'd appreciate anyone's input: I've built a forum in ASP.NET MVC and now want to add Lucene.Net for search. My plan is to run an index builder thread every 5-10 minutes to update the search index with the changes made to each discussion. The way it will work is I keep the date and time for the l...

How to index forum discussions for search?

For a discussion forum, does it work better to index each entry inside a discussion thread as a separate Lucene document, or simply concatenate all entries within a discussion into one big block of text and index the whole discussion thread as a single Lucene document? ...

Lucene.NET results using SQL-like LIMIT? + 1k question

I agree with this answer and like it, but I really would rather have this solved before going live. So I am starting a bounty in hopes my ass isn't bitten later ;). With Lucene.NET 2.9.x, any version using .NET, how might I search and limit/page the results similar to the LIMIT keyword in SQLite and MySQL? I'd like to find the top 20 doc...

Lucene indexing for structured documents where each text line has meta-data

Hi, I have a document structure where each text line in the document has some meta-data associated with it. The search result must show the line and the meta-data for the line. Currently I am storing each such line as a Lucene document and storing the meta-data as one of the non-indexed fields. That is, I create and add a Lucene Docu...

Lucene security search ASP.NET C#

Hi, I'm hoping this will be a really easy question for someone... Basically we are indexing security information against our documents in Lucene.NET. The information is stored in 2 document fields called viewuserids and viewroleids, so when we construct a query, only documents which the user has view access to are returned. The requir...

Lucene wildcard matching fails on chemical notations(?)

Hi all, using Hibernate Search annotations (mostly just @Field(index = Index.TOKENIZED)) I've indexed a number of fields related to a persisted class of mine called Compound. I've set up text search over all the indexed fields, using the MultiFieldQueryParser, which has so far worked fine. Among the fields indexed and searchable is a fi...

Lucene Numeric range query doesn't return all of the hits I expect.

I have a set of documents that all have a "timestamp" field which is stored as a long integer number. The field is indexed in my Lucene index as a number using NumericField with a precision step of 8: NumericField("timestamp", 8). This is done so I can do numeric range queries to retrieve all documents that fall within a specific time ...

Is there any Lucene highlighter that does not require the original text, but rather can work on term positions etc.?

I have been reading the new 2nd edition of Lucene in Action and they give an example of doing highlighting, but unfortunately it requires the original text so it can get the position of terms etc. The highlighter is the official one in contrib, so that implies it's the sponsored or official highlighter. Does anyone know of another hi...

Lucene SpanNearQuery

Hi all, I am trying to understand Lucene SpanNearQuery and wrote up a dummy example. I am looking for "not" followed by "fox" within 5 of each other. I would expect document 3 to be returned as the only hit. However, I end up getting no hits. Any thoughts on what I might be doing wrong will be appreciated. Here is the code: //indexing...

Indexing n-word expressions as a single term in Lucene

I want to index a "compound word" like "New York" as a single term in Lucene, not as "new", "york", in such a way that if someone searches for "new place", documents containing "new york" won't match. I think this is not a case for N-grams (actually NGramTokenizer), because I won't index just any n-gram; I want to index only some spe...
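
The idea of emitting only selected multi-word expressions as single terms can be sketched outside Lucene as follows. This is a minimal Python illustration, not a Lucene Tokenizer or TokenFilter; the function name `tokenize_with_compounds`, the whitespace splitting, and the underscore-joined token form are all assumptions made for the example.

```python
def tokenize_with_compounds(text, compounds):
    """Lowercase and split on whitespace, merging listed expressions into one token.

    Hypothetical helper for illustration; a real Lucene setup would do this
    inside the analysis chain used at both index and query time.
    """
    words = text.lower().split()
    # Try longer expressions first so greedy matching prefers them.
    phrases = sorted(
        (tuple(c.lower().split()) for c in compounds), key=len, reverse=True
    )
    tokens = []
    i = 0
    while i < len(words):
        for phrase in phrases:
            if phrase and tuple(words[i:i + len(phrase)]) == phrase:
                # Emit the whole expression as a single term, e.g. "new_york".
                tokens.append("_".join(phrase))
                i += len(phrase)
                break
        else:
            # No listed expression starts here; emit the plain word.
            tokens.append(words[i])
            i += 1
    return tokens
```

With "New York" in the list, a document containing it yields the single term `new_york` and no separate `new` term, so a query tokenized the same way for "new place" would not match it on `new`.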

Lucene shows strange unsubmitted documents

Hi, I submit a bunch of documents to a newly created index and commit/optimize & close the writer. When I open and read from the index while in the same VM, everything works as expected. As soon as I close the VM, restart and read the index in a new application instance, I get a multitude of documents. When I inspect the index via Luke ...