lucene

How to apply default sorting in lucene on equal scores?

Good day. Suppose I have documents with the following fields (Person_name - Birthday): Jordan - 2009-06-15; Marc - 2009-01-01; Marcos - 2009-01-01; Marcissh_something_something - 2009-06-15; Marcos - 2009-12-31. Upon searching for Person_name:Marc* I got the following scores (scores here are hypothetical) Person_name - ...

Using lucene in Tomcat

Background I am assuming the following code is completely thread safe: // Called from a servlet when a user action results in the index needing to be updated public static void rebuildIndex() { FSDirectory dir = new NIOFSDirectory(new File(Configuration.getAttachmentFolder()), null); IndexWriter w = new IndexWriter(dir, analyzer, Index...

Katta usage examples

Does anybody use Katta with Java? Are any samples available? ...

Different lucene search results using different search space size.

I have an application that uses Lucene for searching. The search space is in the thousands of entries. Searching against these thousands, I get only a few results, around 20 (which is OK and expected). However, when I reduce my search space to just those 20 entries (i.e. I indexed only those 20 entries and disregarded everything else...so that deve...

Is Solr available for .Net?

I want to learn Solr. Can anyone point me to some good tutorials or links? Is Solr available for .NET too? ...

configuring nutch regex-normalize.xml

I am using the Java-based Nutch web-search software. In order to prevent duplicate (url) results from being returned in my search query results, I am trying to remove (a.k.a. normalize) the expressions of 'jsessionid' from the urls being indexed when running the Nutch crawler to index my intranet. However my modifications to $NUTCH_HOME/...

How to configure SOLR to use Levenshtein approximate string matching?

Does Apache's Solr search engine provide approximate string matching, e.g. via the Levenshtein algorithm? I'm looking for a way to find customers by last name, but I cannot guarantee that the names are spelled correctly. How can I configure SOLR so that it would find the person "Levenshtein" even if I search for "Levenstein"? ...
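For reference, the Levenshtein distance mentioned in the question can be sketched in plain Java. This is only an illustration of the edit-distance measure itself, not Solr configuration or Solr's internal implementation; the class and method names (`EditDistance`, `levenshtein`) are made up for this example.

```java
public class EditDistance {

    // Classic dynamic-programming Levenshtein distance, keeping only two rows
    // of the table. The comparison is case-sensitive.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // The two spellings from the question differ by one deleted letter.
        System.out.println(levenshtein("Levenshtein", "Levenstein")); // prints 1
    }
}
```

A distance of 1 between the two spellings is what a fuzzy match with a small edit-distance threshold would have to tolerate.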

Can Lucene return several search results from a single indexed file?

I am using Lucene to index and search a small number of large documents. Using the demo from the Lucene site I have indexed the documents and am able to search them. However, the search result only points to the file of the document, which is not particularly useful with very large documents. I am wondering if ...

Zend Search Lucene numerical range searches

Hi all, I am having difficulty determining my misunderstanding of how Zend Search Lucene indexes and searches integers in ranges. In the following example, I would expect the output to be 1, however it is always 2 (both results). Any hints would be much appreciated. <?php require_once 'Zend/Loader/Autoloader.php'; $loader = Zend_Load...

What is the best way to search multiple sources simultaneously?

I'm writing a phonebook search that will query multiple remote sources, but I'm wondering how best to approach this task. The easiest way to do this is to take the query, start a thread per remote source query (limiting max results to say 10), wait for the results from all threads and aggregate the list into a total of 10 entr...
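The thread-per-source approach described in the question could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a recommended design: `Source` is a hypothetical stand-in for one remote phonebook, results are plain strings, results are merged in source order, and a source that fails or times out is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FanOutSearch {

    // Hypothetical interface for one remote source (not a real library API).
    public interface Source {
        List<String> search(String query, int maxResults) throws Exception;
    }

    // Queries every source in parallel, waits up to timeoutMillis for all of
    // them, and returns at most `limit` results in total.
    public static List<String> searchAll(List<Source> sources, String query,
                                         int limit, long timeoutMillis) {
        List<String> merged = new ArrayList<>();
        if (sources.isEmpty()) {
            return merged;
        }
        // One thread per source, as described in the question.
        ExecutorService pool = Executors.newFixedThreadPool(sources.size());
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Source s : sources) {
                tasks.add(() -> s.search(query, limit));
            }
            // invokeAll blocks until all tasks finish or the timeout expires;
            // tasks still running at the timeout are cancelled.
            List<Future<List<String>>> futures =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<List<String>> f : futures) {
                if (f.isCancelled()) {
                    continue; // source did not answer in time
                }
                try {
                    for (String result : f.get()) {
                        if (merged.size() == limit) {
                            return merged;
                        }
                        merged.add(result);
                    }
                } catch (ExecutionException e) {
                    // Assumption: a failing source is ignored rather than
                    // failing the whole search.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        } finally {
            pool.shutdownNow();
        }
        return merged;
    }
}
```

Merging in source order favours whichever source is listed first; interleaving or ranking the per-source results would be a separate design decision.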

What's the best way to search a social network by prioritizing a user's relationships first?

I have a social network set up, and I want to search the entries via an API. The database of the social network is MySQL. I want the search to return results in the following format: Results that match the query AND are friends of the user performing the search should be prioritized over results that simply match the query. So can this...

Lucene seems to be caching search results - why?

In my project we use Lucene 2.4.1 for fulltext search. This is a J2EE project, IndexSearcher is created once. In the background, the index is refreshed every couple of minutes (when the content changes). Users can search the index through a search mechanism on the page. The problem is, the results returned by Lucene seem to be cached so...

Search subset of objects using Compass/Lucene

Hi, I'm using the Searchable plugin for Grails (which provides an API for Compass, which is itself an API over Lucene). I have an Order class that I would like to search, but I don't want to search all the instances of Order, just a subset of them. Something like this: // This is a Hibernate/GORM call List<Order> searchableOrders = Cus...

Sort by date in Solr/Lucene performance problems

Hi all, We have set up a Solr index containing 36 million documents (~1K-2K each) and we try to query a maximum of 100 documents matching a single simple keyword. This works pretty fast, as we had hoped. However, if we now add "&sort=createDate+desc" to the query (thus asking for the top 100 'new' documents matching the query) it run...

Grails searchable plugin

Hi, In my Grails app, I'm using the Searchable plugin for searching/indexing. I want to write a Compass/Lucene query that involves multiple domain classes. Within that query when I want to refer to the id of a class, I can't simply use 'id' because all classes have an 'id' property. Currently, I work around this problem by adding the fo...

Correct way to write a Tokenizer in Lucene

Hi, I'm trying to analyze the content of a Drupal database for collective intelligence purposes. So far I've been able to work out a simple example that tokenizes the various contents (mainly forum posts) and counts tokens after removing stop words. The StandardTokenizer supplied with Lucene should be able to tokenize hostnames and emails b...

Lucene using Snowball and SpellChecker brings back strange values

I am trying to get SpellChecker set up using Lucene.NET. It all works fine other than in situations similar to the following: I have text containing satellite in the index, and I analyze it using Snowball. I then create a SpellChecker index and get suggestions from it. The suggestion I get returned when passing in "Satalite" is "satellit". I...

Lucene.NET in medium trust

Does anyone know how to make Lucene.NET 2.3.2 run in a medium trust environment? GoDaddy doesn't like it the way it is. ...

What analyzer should I use for a URL in lucene.net?

I'm having problems getting a simple URL to tokenize properly so that you can search it as expected. I'm indexing "http://news.bbc.co.uk/sport1/hi/football/internationals/8196322.stm" with the StandardAnalyzer and it is tokenizing the string as the following (debug output): (http,0,4,type=<ALPHANUM>) (news.bbc.co.uk,7,21,type=<HOST>) (...

get cosine similarity between two documents in lucene

Hi, I have built an index in Lucene. Without specifying a query, I just want to get a score (cosine similarity or another distance?) between two documents in the index. For example, I am getting the documents with ids 2 and 4 from a previously opened IndexReader ir. Document d1 = ir.document(2); Document d2 = ir.document(4); How can I ge...
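As a point of reference for the cosine similarity asked about here, the measure itself can be sketched in plain Java over term-frequency maps. This is not a Lucene API: it assumes the per-document term frequencies have already been extracted somehow (for example from term vectors, if the field was indexed with them), and the class and method names are made up for this example.

```java
import java.util.Map;

public class CosineSimilarity {

    // Cosine similarity between two documents represented as
    // term -> raw term frequency maps. Returns 0.0 if either map is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;

        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += (double) fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += (double) fa * fb; // only shared terms contribute
            }
        }
        for (int fb : b.values()) {
            normB += (double) fb * fb;
        }

        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Raw term frequencies are used here for simplicity; weighting the vectors (for instance with tf-idf) changes the numbers but not the structure of the computation.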