lucene

Reading from compressed lucene index

I created a Lucene index and compressed the index directory with bz2 or zip. I do not want to uncompress it. Is there any API call that can read the index from this zipped directory and thus allow searching and other functionality? That is, can Lucene's IndexReader read the index from a compressed file? I saw that Lucene IndexReader ...

lucene index missing files

I have the _0.cfs file of a Lucene index directory, but segments.gen and segments_2 are missing. Can I generate the segments.gen and segments_2 files without having to regenerate the _0.cfs file? Do these "segments" files contain any index-specific data, which would force me to regenerate the entire index? Or can I just generate t...

Is it possible for lucene to store the index only in one file

When we create a Lucene index, various files are created. If we do not optimize the IndexWriter, three files are created: one named _0.cfs, which contains all of the index data, and two other files containing metadata. Is it possible to force Lucene to create only one file instead of three? ...

Why are my Lucene Document results empty?

I'm running a simple test--trying to index something and then search for it. I index a simple document, but then when I search for a string in it, I get back what looks to be an empty document (it has no fields). Lucene seems to be doing something, because if I search for a word that's not in the document, it returns 0 results. Any reas...

Refining Solr searches, getting exact matches?

Afternoon chaps, Right, I'm constructing a fairly complex (to me anyway) search system for a website using Solr, although this question is quite simple I think... I have two search criteria, location and type. I want to return results that are exact matches to type (letter to letter, no exceptions), and like location. My current sea...

Multiple or single index in Lucene?

I have to index different kinds of data (text documents, forum messages, user profile data, etc) that should be searched together (ie, a single search would return results of the different kinds of data). What are the advantages and disadvantages of having multiple indexes, one for each type of data? And the advantages and disadvantage...

How-to index arrays (tags) in CouchDB using couchdb-lucene

The setup: I have a project that is using CouchDB. The documents will have a field called "tags". This "tags" field is an array of strings (e.g., "tags":["tag1","tag2","etc"]). I am using couchdb-lucene as my search provider. The question: What function can be used to get couchdb-lucene to index the elements of "tags"? If you have...

What is the VInt in Lucene ?

I want to know what the VInt in Lucene is. I read this article, but I don't understand what it is or where Lucene uses it. Why doesn't Lucene use a simple integer or a big integer? Thanks. ...
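
For readers unfamiliar with the term, here is a minimal sketch of the variable-length integer scheme that Lucene's file-format documentation describes: each byte carries seven data bits, the low-order group comes first, and the high bit is set when more bytes follow. Small values therefore take one byte instead of four. The class and method names are hypothetical and are not Lucene's API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of a VInt-style encoding; not Lucene's own classes.
final class VIntSketch {

    // Encode an int using 7 data bits per byte, low-order bits first.
    // The high bit of each byte is set when another byte follows.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 data bits + continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    // Decode bytes produced by encode() back into an int.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation flag clear: last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

With this scheme, values up to 127 occupy a single byte and values up to 16,383 occupy two, which suits the many small numbers (such as deltas between document IDs) that an index stores.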

"no inclosing instance error " while getting top term frequencies for document from Lucene index

Hello! I am trying to get the most frequently occurring terms for each document in a Lucene index. I am trying to set a threshold on the number of top terms that I care about, maybe 20. However, I am getting the "no enclosing instance of type DisplayTermVectors is accessible" error when calling Comparator... So to this function I p...
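
The excerpt is cut off, so the cause here is a guess: this compiler error typically appears when a non-static inner class is instantiated from a static method. Assuming that is the case, declaring the comparator as a static nested class avoids it. The sketch below uses hypothetical names (TopTermsSketch, ByFrequencyDesc, topTerms) and a plain Map in place of Lucene term vectors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the asker's class; no Lucene types are used.
final class TopTermsSketch {

    // 'static' matters: a static nested class needs no enclosing instance,
    // so it can be created from a static method such as topTerms().
    static final class ByFrequencyDesc implements Comparator<Map.Entry<String, Integer>> {
        @Override
        public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
            return Integer.compare(b.getValue(), a.getValue()); // highest frequency first
        }
    }

    // Return up to 'limit' terms, ordered by descending frequency.
    static List<String> topTerms(Map<String, Integer> frequencies, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequencies.entrySet());
        entries.sort(new ByFrequencyDesc());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```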

java AbstractMethodError

How to handle this error in lucene: java.lang.AbstractMethodError: org.apache.lucene.store.Directory.listAll()[Ljava/lang/String; at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:568) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at org.apache.lucene.index....

opening lucene index stored in hdfs

How do I read a Lucene index directory stored on HDFS, i.e. how do I get an IndexReader for an index stored on HDFS? The IndexReader is to be opened in a map task. Something like: IndexReader reader = IndexReader.open("hdfs/path/to/index/directory"); Thanks, Akhil ...

Need help in filtering records based on radius value in solr

Hi, I am using Solr with Lucene spatial 2.9.1 as per http://www.ibm.com/developerworks/java/library/j-spatial/ I want to write a query that will retrieve records within a given radius using the hsin function, and using cartesian tiers as filters. So I wrote a query like this: http://localhost:8983/solr/select/?q=body:engineering colleges^...

How to structure an index for type ahead for extremely large dataset using Lucene or similar?

I have a dataset of 200million+ records and am looking to build a dedicated backend to power a type ahead solution. Lucene is of interest given its popularity and license type, but I'm open to other open source suggestions as well. I am looking for advice, tales from the trenches, or even better direct instruction on what I will need a...

Lucene document Boosting

Hello, I am having a problem with Lucene boosting. I am trying to boost a particular document which matches the (firstname) field specified. I have posted part of the code: private static Document createDoc(String lucDescription,String primaryk,String specialString){ Document doc = new Document(); doc.add(new Field...

Best way to reuse a Runnable

I have a class that implements Runnable and am currently using an Executor as my thread pool to run tasks (indexing documents into Lucene). executor.execute(new LuceneDocIndexer(doc, writer)); My issue is that my Runnable class creates many Lucene Field objects and I would rather reuse them than create new ones on every call. What's t...
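
One common approach to this kind of reuse, sketched here under assumptions rather than taken from the question, is to keep the reusable objects in a ThreadLocal so that each pool thread builds them once and every task running on that thread picks them up. The Scratch class below is a hypothetical stand-in for the Lucene Field objects, and all names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: per-thread reuse of expensive objects across Runnable tasks.
final class ReusableStateSketch {

    // Counts how many Scratch objects were ever created.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Stand-in for the objects the task wants to reuse (e.g. Field instances).
    static final class Scratch {
        final StringBuilder buffer = new StringBuilder();
        Scratch() {
            CREATED.incrementAndGet();
        }
    }

    // Each pool thread lazily gets its own Scratch, so no locking is needed.
    static final ThreadLocal<Scratch> SCRATCH = ThreadLocal.withInitial(Scratch::new);

    static final class IndexTask implements Runnable {
        private final String text;

        IndexTask(String text) {
            this.text = text;
        }

        @Override
        public void run() {
            Scratch scratch = SCRATCH.get();   // reused across tasks on this thread
            scratch.buffer.setLength(0);       // reset state left by the previous task
            scratch.buffer.append(text);
            // A real task would hand the populated objects to the index writer here.
        }
    }

    // Run 'tasks' tasks on a pool of 'threads' threads; return the Scratch count so far.
    static int runTasks(int threads, int tasks) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            executor.execute(new IndexTask("doc " + i));
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return CREATED.get();
    }
}
```

The number of Scratch objects created is bounded by the number of pool threads rather than the number of tasks. Each task must reset whatever state the previous task left behind before using the shared objects.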

Lucene HTMLFormatter skipping last character

I have this simple Lucene search code (Modified from http://www.lucenetutorial.com/lucene-in-5-minutes.html) class Program { static void Main(string[] args) { StandardAnalyzer analyzer = new StandardAnalyzer(); Directory index = new RAMDirectory(); IndexWriter w = new IndexWriter...

No optimization causes wrong search result

I just took over our solr/lucene stuff from my ex-colleague. But there is a weird bug. If there is no optimization after dataimport, actually if there are multiple segment files, the search result then will be wrong. We are using a customized solr searchComponent. As far as I know about lucene, optimization should not affect search resu...

Scalable Full Text Search With Per User Result Ordering

What options exist for creating a scalable, full text search with results that need to be sorted on a per user basis? This is for PHP/MySQL (Symfony/Doctrine as well, if relevant). In our case, we have a database of workouts that have been performed by users. The workouts that the user has done before should appear at the top of the res...

Building a case for solr

Our product consists of multiple applications, all using Lucene. Two of the applications I am involved with have Lucene indexes of about 3 GB and 12 GB. Another team is building an application for which they estimate the Lucene index size to be close to 1 terabyte. New documents are added to the indexes approximately every 15 days. We do not have...

How to search a PDF in Acrobat Reader AND jump to a certain page via parameter?

Hi, we are using lucene within a web application to search in a great number of PDF documents. The workflow is like this: A user enters a search term A list of search results is presented to the user. Each search result represents one PDF document and shows the user on which page the search term was found. Each of these pages is rep...