lucene

How do i keep my DB and lucene in sync?

So i can have a transaction in sql. But i am sure its not a good idea to wait in the middle of a transaction for lucene to finish also i am unsure if lucene is permanently saved in the DB until i do something there. Whats the best way to keep my DB and lucene in sync? I am thinking of adding a lucene_queue in my sql db and everytime i m...

Luke like tool in C#

Is there are Luke like tool for viewing lucene indexes in C# using the Lucene.NET api? ...

Have boost effect on lucene/compass field search.

Hi there, In our compass mapping, we're boosting "better" documents to push them up in the list of search results. Something like this: <boost name="boostFactor" default="1.0"/> <property name="name"><meta-data>name</meta-data></property> While this works fine for fulltext search, it does not when doing a field search, e.g. the boost...

Lucene query: bla~* (match words that start with something fuzzy), how?

In the Lucene query syntax I'd like to combine * and ~ in a valid query similar to: bla~* //invalid query Meaning: Please match words that begin with "bla" or something similar to "bla". ...

Information Retrieval database formats?

I'm looking for some documentation on how Information Retrieval systems (e.g., Lucene) store their indexes for speedy "relevancy" lookups. My Google-fu is failing me: I've found a page which describes Lucene's file format, but it's more focused on how many bits each number is than on how the database is used in producing speedy queries....

How do i delete/update a doc with lucene?

I am creating a tagging system for my site I got the basics of adding a document to lucene but i can seem to figure out how to delete a document or update one when the user changes the tags of something. I found pages that say use the document index and i need to optimize before effect but how do i get the document index? Also i seen an...

How to convert pdf, ppt, xl, doc files to txt/html files... any opensource tools/codes in php/python/perl available?

My end objective is to index documents using lucene. As lucene doesnt support indexing other formats. I want to convert these files to txt/html (lucene indexable file types). I have a set of documents almost 1000 files of ppt, pdf, doc, xl etc Please help me ...

How to sort search results in lucene?

For example, if I have enumeration with "good", "better", "the best" values, I want to sort my search results by field that holds one of this values in string representation. I have few purposes: 1) Create CustomAnalyzer that produces numeric value from enum: good -> 1, better -> 2, the best -> 3 2) Implement FieldComparator (I don't ...

Disabling scoring in Lucene(.NET)

Hi, When searching, is there a way to disable scoring for any query? The scenario is that the user refines his query by trying different combinations of words, phrases etc., and needs realtime (well, reasonably fast at least) responses on the number of hits. Search time slows down a lot when there are millions of hits due to scoring...

How to get a Token from a Lucene TokenStream?

I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream. The worst part is that I'm looking at the comments in the JavaDocs that address my question. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29 Somehow, an Attr...

Indexing large DB's with Lucene/PHP

Afternoon chaps, Trying to index a 1.7million row table with the Zend port of Lucene. On small tests of a few thousand rows its worked perfectly, but as soon as I try and up the rows to a few tens of thousands, it times out. Obviously, I could increase the time php allows the script to run, but seeing as 360 seconds gets me ~10,000 row...

lucene and plurals

If you look at the comment here you'll see Lucene is very much the tool to do this. If you want apple and apples (plural) to match, you just need to be careful about using the correct language stemmer when indexing and querying the index. I'm new to lucene and barley understand how adding and saving document work. How do...

Why wasn't fast-vector-highlighter (lucene-contrib) made an official part of Lucene 3.0 core

I've read some Jira entries and they mentioned moving fast-vector-highlighter to core about a year ago but it never made it. Looking at the svn for contrib it seems incomplete. There are no tests for FastVectorHighlighter Documentation is lacking No samples anywhere on apache.org Anyone have any ideas what its status is? ...

Reverse search in Hibernate Search

I'm using Hibernate Search (which uses Lucene) for searching some Data I have indexed in a directory. It works fine but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database I need to check which one of these queries match with a Data object each time Data Object is created. I need i...

Solrnet /ASP.NET sample without MVC

I am trying to get a handle on Solrnet and interacting an ASP.NET site with a Solr server. However, the sample app (on the code repository) is MVC based ,does anyone know of a version in plain vanilla ASP.NET? Thanks ...

Lucene: How to have more than 100 results?

Hi all, my question is simple but i cant fin de answer. Is there a way to set in Lucene to retrieve an amount of results higher than 100 in a query? Im using lucene 2.4.0 now. Thanks all. ...

Using Solr and Zends Lucene port together...

Afternoon chaps, After my adventures with Zend-Lucene-Search, and discovering it isn't all its cracked up to be when indexing large datasets, I've turned to Solr (thanks to Bill Karwin for that :) ) I've got Solr indexing the db far far quicker now, taking just over 8 minutes to index a table of just over 1.7million rows - which I'm v...

Can zend's lucene implementation be configured to use a mysql database instead of the file system?

Is there an option for Zend's lucene implementation (or a third-party plugin) that would allow me to put the lucene dictionary into a [MySQL] database? The reason I need to ask is that the database is the only common resource for our two otherwise independent web servers. ...

I'm trying to do a solr search such that results are shown if a certain field has ANY value

I'm running a search with a type field. I'd like to show results of a certain type ONLY if two other field have values for them. So in my filter query I thought it would be(type:sometype AND field1:* AND field2:*) but wildcard queries can't start with the *. ...

Using FieldSelector when searching with Lucene

I'm searching articles in PubMed via Lucene. Each of the 20,000,000 articles has an abstract with ~250 words and an ID. At the moment I store my searches, with each take multiple seconds, in a TopDocs object. Searchs can find thousands of articles. I'm just interested in the ID of the article. Does Lucene load the abstracts internally i...