Questions about Lucene

How to index word with hyphen in Lucene?

Hello, I have a working StandardAnalyzer that retrieves words and frequencies from a single document using a TermVectorMapper, which populates a HashMap. But if I use the following text as a field in my document, i.e. addDoc(w, "lucene Lawton-Browne Lucene"); the word frequencies returned in the HashMap are: browne 1 lucene 2 l...

java

lucene
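The behaviour described in the excerpt above can be imitated without Lucene. The following is only a minimal plain-Java sketch, not Lucene's API: `HyphenTokenDemo` and `termFrequencies` are hypothetical names, and the two regular expressions merely stand in for a tokenizer that breaks on punctuation versus one that breaks only on whitespace. It shows why "Lawton-Browne" ends up as two separate terms in the first case.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class HyphenTokenDemo {

    // Hypothetical helper (not part of Lucene): lowercases the text, splits it
    // with the given regex, and counts how often each non-empty token occurs.
    static Map<String, Long> termFrequencies(String text, String splitRegex) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split(splitRegex))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String field = "lucene Lawton-Browne Lucene";

        // Splitting on every non-alphanumeric character breaks the hyphenated name apart.
        System.out.println(termFrequencies(field, "[^a-z0-9]+"));
        // prints {browne=1, lawton=1, lucene=2}

        // Splitting on whitespace only keeps the hyphenated name as a single term.
        System.out.println(termFrequencies(field, "\\s+"));
        // prints {lawton-browne=1, lucene=2}
    }
}
```

In Lucene itself, the analogous choice would presumably be an analyzer that tokenizes on whitespace only (for example WhitespaceAnalyzer) for the field in question. Whether that fits depends on the other analysis steps the field needs, such as lowercasing.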

Indexing previous records with Doctrine (and Symfony!) with Zend Lucene

I have a Symfony application that uses Doctrine as its ORM. Based on Symfony's "Practical symfony" book, I have Zend Lucene added to my web app. However, the problem is that there are around 1.1 million rows existing in the database that I want to index for Lucene as well. The only things being indexed are edited rows and the rows have...

Problem with indexing using StreamingUpdateSolrServer in SolrJ

I just had a miserable failure with SolrJ. Somehow StreamingUpdateSolrServer failed on some of the items being indexed, while others succeeded. It simply throws an Exception with a "Bad Request" message, without any further explanation or stack trace. I suspect that this is due to malformed data, but after double checking, I'm a...

How to delete documents using a term in Lucene

I am trying to delete a document from a Lucene index by using a term, but the code below isn't working. Are there any suggestions for how I can delete documents from a Lucene index? public class DocumentDelete { public static void main(String[] args) { File indexDir = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/...

java

lucene

Best way to keep index real time?

Hi all, I have a Solr/Lucene index file of, say, 700 GB. The documents I need to index arrive in real time: say, 1000 docs are submitted every half hour and need to be indexed. In my scenario an executable runs every 30 minutes and indexes the documents that are not yet indexed, because it is a requirement that the new documents ...

lucene

solr

AnnotationSessionFactoryBean requires lucene classes. wtf?

I am trying to add transaction support to an existing webapp via Spring transactions. I recently changed my session factory class from LocalSessionFactoryBean to AnnotationSessionFactoryBean. Now I get the following error when the webapp starts: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'tx...

Adding documents to an existing index in Lucene

Hello all, I would like to ask how to add new documents to an existing Lucene index. In the source code below, I just changed the parameter of IndexWriter to false. IndexWriter indexWriter = new IndexWriter( FSDirectory.open(indexDir), new SimpleAnalyzer(), false, IndexWriter.MaxField...

java

lucene

Lucene query parser: only read a certain field's query; behavior changed in 2.9.3

I need my query parser to only read fields that are "text". For example, let's say my query is: text:"this fox" OR title:"brown dog". For highlighting purposes, I need the parser/searcher to search using only the text:"this fox" part. In 2.4 this worked fine, but since upgrading to 2.9.3, something has changed. Example code: IndexSearche...

java

lucene

IKVM.NET and Lucene

Hi, I am using Lucene.Net but there are some interesting Java components for Lucene (especially analyzers) that haven't been ported to Lucene.NET yet so maybe IKVM is a better choice. Some research has shown that IKVM seems to work pretty well, but I haven't seen anything regarding Lucene. Does anybody have experience running Lucene wi...

Searching Techniques Recommendations

Hi everyone. This is more of a theoretical question than a practical one. I'm working on a project which is quite a simple catalog of links. The whole model is similar to the Dmoz or Yahoo catalog, except that each entry has certain additional attributes. I have a hierarchical taxonomy working on all entries with a many-to-many relationship, all...

Best practice for ensuring Solr/Lucene index is "up to date" after long rebuild

Hi all, we have a general question about best practice/programming during a long index rebuild. This question is not Solr-specific; it could just as well apply to raw Lucene or any other similar indexing tool/library/black box. The question: What is the best practice for ensuring Solr/Lucene index is "absolutely up to date" after long i...

indexing

lucene

solr

Create new core directories in Solr on the fly!

I am using Solr 1.4.1 to build a distributed search engine, but I don't want to use only one index file; I want to create new core "index" directories on the fly in my Java code. I found the following REST API to create new cores using an EXISTING core directory (http://wiki.apache.org/solr/CoreAdmin). http://localhost:8983/solr/admin/...