lucene

Lucene gotchas with punctuation

Whilst building some unit tests for my Lucene queries I noticed some strange behavior related to punctuation, in particular around parentheses. What are some of the best ways to deal with search fields that contain significant amounts of punctuation? ...

How to re-read files used by an ASP.NET MVC web application without app pool recycling?

I'm testing an ASP.NET MVC web application that uses Lucene index files. For every test I need to rebuild the Lucene index and then force my web application to re-read these index files. The only way I've found is to recycle the application pool, but it's rather slow. Does anyone know a way to re-read files from disk without recycling applicati...

Index strategy for tagged documents where tags can change often

Hi, in addition to text content, my documents have tags which can be searched too. The problem is that the tags change quite often, and every time a tag is added or removed I have to call UpdateDocument, which is quite slow when done for hundreds of documents. Are there any well-performing strategies for storing tags that change oft...

Display ellipsis before and after fragment in SOLR

I have SOLR configured to return fragments with a fragsize of 500. Sometimes, the whole field is 500 characters or less, so the fragment is identical to the field. For fields that are longer than that, SOLR just returns the fragment without any indication (or so it seems) that the fragment only represents part of the content of a field...

Indexing and searching MySQL with Solr

(I have put ' in the XML below to make it display) Hi all, I want to index my MySQL database table with Solr. I have installed the necessary Java components/adaptors etc. My database is called 'test_db' and the table in it is called 'table_tb'. The table contains 2 columns (fields): Field 1 is called 'ID' and is an autoincremented primary ke...

Indexing and searching MySQL with Solr

I have set up Solr and am trying to index a simple 2-column, 2-row table (MySQL 'test_tb' table within database 'test_db') with a unique id (first column, MySQL type int) and some text (second column). I keep getting the error: WARNING: Error creating document : SolrInputDocument[{ID_F=ID_F(1.0)={1}}] org.apache.solr.common.Solr...

Do I need to rebuild my Lucene index for this change?

Do I need to rebuild a Lucene index when I only add a random field to a schema? Or could I run some code to update that field without rebuilding the index? This is the field I need to add: http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html In this case, Lucene is running on Solr. ...

Get a date out of Lucene

Hi, I have indexed a date in Lucene, using DateTools.dateToString to store the date in a particular field. Is there any way to know whether this was a date field and, more importantly, how to get the date out again? It's a Fieldable with a long integer value. Thanks ...
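For illustration, here is a minimal sketch of reading such a value back outside Java. It assumes the stored string follows the DateTools layout (a GMT timestamp written as yyyyMMddHHmmssSSS and truncated to the chosen resolution); the function name `parse_datetools` is hypothetical. On the Java side, `DateTools.stringToDate` is the counterpart of `dateToString`.

```python
from datetime import datetime, timezone

# strptime patterns for each DateTools resolution, keyed by string length.
_FORMATS = {
    4: "%Y",
    6: "%Y%m",
    8: "%Y%m%d",
    10: "%Y%m%d%H",
    12: "%Y%m%d%H%M",
    14: "%Y%m%d%H%M%S",
}


def parse_datetools(value):
    """Parse a DateTools-style digit string into a UTC datetime (hypothetical helper)."""
    if not value.isdigit():
        raise ValueError("not a DateTools-style value: %r" % value)
    if len(value) == 17:
        # Millisecond resolution: parse the seconds part, then add the millis.
        parsed = datetime.strptime(value[:14], _FORMATS[14])
        parsed = parsed.replace(microsecond=int(value[14:]) * 1000)
    elif len(value) in _FORMATS:
        parsed = datetime.strptime(value, _FORMATS[len(value)])
    else:
        raise ValueError("unexpected length for a DateTools value: %r" % value)
    # DateTools writes its values in GMT.
    return parsed.replace(tzinfo=timezone.utc)
```

A value that parses this way is only probably a date: the index itself does not record that the field was written with DateTools, so a plain numeric field of a matching length would parse too.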

Lucene index deleted when opening with Luke/IndexReader

I was creating a Lucene index when my indexing program crashed. The indexer had processed about 3M documents before crashing, producing a 14 GB file. When I opened the index in Luke (with force unlock), the whole index was gone! Poof. The opened index had 0 documents and its size was reduced to 1 KB. Did anyone experience this, or can of...

Lucene keyword notification

What's the best way to get a notification (say an event) when a keyword is found in a document in Lucene? The brute force way is to keep searching for the keyword in short intervals but that seems very inefficient as well as not as "real-time" ...

Apache Solr - are the documents themselves stored internally apart from the index?

Hello, I have been trying to research how Solr works when documents such as DOC or PDF files are submitted to it. If I submit PDFs to Solr, does it end up storing the PDF file as well, along with the index generated after parsing the PDF file? Thanks, -Keshav ...

Search for (Very) Approximate Substrings in a Large Database

I am trying to search for long, approximate substrings in a large database. For example, a query could be a 1000 character substring that could differ from the match by a Levenshtein distance of several hundred edits. I have heard that indexed q-grams could do this, but I don't know the implementation details. I have also heard that L...

NullPointerException in solr multicore

Hi! I'm configuring Solr for two cores and have got most of it working, but I'm getting this cryptic error. First off, here's my solr.xml: <?xml version='1.0' encoding='UTF-8'?> <solr persistent="true"> <cores adminPath="/admin/cores"> <core name="cars" dataDir="/var/lib/solr/data/cars" config="/etc/solr/home_cars/conf/solrconfi...

Searchable plugin ignores object's id

I am using version 0.5.5.1 of the Grails Searchable plugin. Search works on most of my objects and fields. However, I have a class with a String id that consists of a number, a dash, and a number, like 1-1, 1-2, and so on. I cannot search this object by id. My guess is that it's due to the dash: might it be ignored by the Searchable analyzer? Not sure. Any ideas, sugges...

Calculating similarity between and centroid of Lucene documents

In order to perform a simple clustering algorithm on results that I get from Lucene, I have to calculate the cosine similarity between 2 documents in Lucene. I also need to be able to make a centroid document to represent the centroid of each cluster. All I can think of doing is building my own vector space model with tf-idf weighting, usi...
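As a rough illustration of the vector space approach the question describes, the sketch below computes tf-idf vectors, cosine similarity, and a centroid in plain Python. It does not use Lucene at all: documents are assumed to be already tokenized lists of terms, the idf formula `log(1 + N/df)` is just one common smoothed variant, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf vectors (dict: term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(1 + n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of sparse vectors, usable as a cluster's 'centroid document'."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}
```

The centroid is itself a sparse vector, so it can be compared against any document with the same `cosine` function when assigning documents to clusters.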

Creating a demo UI on top of Solr

I'm looking for an example UI on top of Solr that shows off the available functionality in a demo, e.g. drill-down faceted search. I found Blacklight, which looks very interesting. Is there any other software worth researching, or is Blacklight definitely the way to go? Thanks. ...

PHP find relevance

Hi, say I have a collection of 100,000 articles across 10 different topics. I don't know which articles belong to which topic, but I have the full text of each article (so I can analyze it for keywords). I would like to group these articles according to their topics. Any idea how I would do that? Any engine (Sphinx, Lucene) is OK. ...

Lucene QueryParser interprets 'AND OR' as a command?

I am calling Lucene using the following code (PyLucene, to be precise): analyzer = StandardAnalyzer(Version.LUCENE_30) queryparser = QueryParser(Version.LUCENE_30, "text", analyzer) query = queryparser.parse(queryparser.escape(querytext)) But consider if this is the content of querytext: querytext = "THE FOOD WAS HONESTLY NOT WORTH T...
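One possible explanation, offered as an assumption since the excerpt is cut off: `QueryParser.escape` escapes special characters, but it leaves the uppercase words AND, OR and NOT alone, so the parser still reads them as operators. A minimal workaround sketch is to lowercase those words before parsing. The helper name `neutralize_operators` is hypothetical, and the approach only preserves matches when the analyzer lowercases terms anyway, as StandardAnalyzer does.

```python
import re

# Whole-word, uppercase-only match: the query parser treats AND, OR and NOT
# as operators only when they are written in capitals.
_OPERATOR_WORDS = re.compile(r"\b(AND|OR|NOT)\b")


def neutralize_operators(text):
    """Lowercase the boolean operator words so they parse as ordinary terms."""
    return _OPERATOR_WORDS.sub(lambda match: match.group(1).lower(), text)
```

In the PyLucene snippet above this would be applied to the text first, for example `queryparser.parse(queryparser.escape(neutralize_operators(querytext)))`.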

Zend Lucene MoreLikeThis

I'm using Zend_Search_Lucene for my search engine. Sadly, it is missing an implementation of the MoreLikeThis methods, which can find similar documents in the index. Has anybody come across a decent Zend port of this function? I found a Drupal module but have no idea if this can be used with Zend without some serious hacking. ...

How to calculate distance between latitude and longitude with a radius in Zend Lucene?

Hi, I don't know if it's possible, but I would like to search on latitude, longitude, and a radius. I mean I'd like to compute the distance between my reference latitude and longitude and each stored latitude and longitude, and keep a result if that distance is less than the radius. Is it possible to do this with Lucene (only if it's...
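The distance comparison described here can be sketched independently of any search library. The example below uses the haversine formula as a post-filter over candidate points; it is not a Zend_Search_Lucene feature, it assumes coordinates are stored as decimal degrees, and the function names and the mean Earth radius of 6371 km are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Keep the (lat, lon) points that lie within radius_km of center."""
    return [
        point for point in points
        if haversine_km(center[0], center[1], point[0], point[1]) <= radius_km
    ]
```

Computing this for every stored document is linear in the index size, so in practice it is usually combined with a cheap range restriction on latitude and longitude that narrows the candidates first.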