questions about lucene | ansaurus

lucene

Alternatives to Lucene Default Fuzzy Matching Implementation

Lucene's fuzzy matching is based on a basic edit-distance algorithm. Are there other fuzzy matching implementations for Lucene that use different similarity metrics? They should also identify homophones. Please also compare the various fuzzy matching approaches for Lucene. ...

string-matching
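
For readers unfamiliar with the metric named in the question above, here is a minimal sketch of plain Levenshtein edit distance. It is an illustration only, not Lucene's actual implementation; the function name `edit_distance` is hypothetical. It also hints at why edit distance alone does not capture homophones: spelling closeness and sound closeness are different things.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b.

    Illustrative sketch only; not taken from Lucene's source.
    """
    # previous[j] is the distance between the prefix of a processed so far
    # (initially empty) and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Homophones can be as far apart in spelling as unrelated words,
    # so a phonetic encoding would be a separate technique.
    for pair in [("their", "there"), ("night", "knight"), ("lucene", "lucent")]:
        print(pair, edit_distance(*pair))
```

A fuzzy matcher built on this metric typically accepts terms whose distance falls under some threshold relative to term length; handling homophones would need a different similarity measure layered on top.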

Situations to prefer Apache Lucene over Solr?

There are several advantages to using Solr 1.4 (out-of-the-box faceted search, grouping, replication, HTTP administration vs. Luke, ...). Even if I embed search functionality in my Java application, I could use SolrJ to avoid the HTTP trade-off when using Solr. Is SolrJ recommended at all? So, when would you recommend to use "pure-Luc...

Spelling correction in Sphinx?

Hello. I was about to integrate Sphinx-based search into the website, but I've found that there's no built-in support for spelling correction. Folks on the web suggest using pspell or other third-party libraries to get things done, but the problem is that the data I'm going to search contains mostly "technical" terms like brand names, t...

full-text-search

Splitting a Lucene index into two halves

What is the best way to split an existing Lucene index into two halves, i.e. each split should contain half of the total number of documents in the original index? ...

Remove results below a certain score threshold in Solr/Lucene?

Hi guys, is there any built-in functionality in Solr/Lucene to filter out results that fall below a certain score threshold? Say I provide a score threshold of 0.2; then all documents with a score below 0.2 would be removed from my results. My intuition is that this is possible by updating/customizing Solr or Lucene. Could y...

Highlighter in Lucene doesn't return any fragment

I'm trying to implement highlighting in my Lucene application and I can't get any fragment. getBestFragment always returns null. My code: QueryParser parser = new QueryParser(Version.LUCENE_30, "text", myAnalyzer); Query realQuery = parser.parse(query); Highlighter highlighter = new Highlighter(new QueryScorer(realQuery, "text")); for...

Indexing and Searching Over Word Level Annotation Layers in Lucene

I have a data set with multiple layers of annotation over the underlying text, such as part-of-speech tags, chunks from a shallow parser, named entities, and others from various natural language processing (NLP) tools. For a sentence like "The man went to the store", the annotations might look like: Word POS Chunk NER ==== === ===== ...

MySQL Full-Text Search Across Multiple Tables - Quick/Long Solution?

Hello all, I have been doing a bit of research on full-text searches, as we realized that a series of LIKE statements is terrible. My first find was MySQL full-text search. I tried to implement it and it worked on one table but failed when I tried to join multiple tables, so I consulted Stack Overflow's articles (look at the end for...

full-text-search

C# Lucene get all the index

Hello, I am working on a Windows application using Lucene. I want to get all the indexed keywords and use them as the source for an auto-suggest on the search field. How can I retrieve all the indexed keywords in Lucene? I am fairly new to C#, so code is appreciated. Thanks. ...

Performance comparison between Zend Lucene and Java Lucene

Zend Lucene and Java Lucene are built in PHP and Java respectively, and PHP is a higher-level language than Java. Just wondering how big the performance difference between the two is with regard to index building and searching? Is it much more effective to let Java create and rebuild the index, and let PHP use the index? ...

Hibernate/Lucene/HibernateSearch: find all words that start with specific prefix.

I want to get a list of all words in a database table that start with a specific prefix. I've been looking for a way to query the terms in a Lucene index (I need the terms, I don't care about the documents they are from) but without success. Any ideas? ...

How to configure NGram filter for Compass

I'm trying to add the NGram token filter (from Lucene contrib) to my Compass configuration. Following the Compass documentation, I tried: <searchEngine> <analyzer name="ngram" type="Simple" filters="ngram-filter"/> <analyzerFilter name="ngram-filter" type="org.apache.lucene.analysis.ngram.NGramTokenFilter"> </analyzerFilte...

Wildcard for terms in phrase - Lucene

Google's query syntax allows searching for phrases like "as * as a skyscraper", where the asterisk can match any term (or terms). Is there a way to achieve the same thing in Lucene? The proximity operator ~ could be of use, but it is not exactly what I want. ...

How to use the BaseTokenStreamTestCase class with Maven

I want to unit test my Lucene filter (which extends TokenFilter). I use Maven. The "BaseTokenStreamTestCase" class looks perfect, but I have no idea which Maven artifactId contains it. Any ideas? ...

How well does Solr scale over a large number of facet values?

I'm using Solr and I want to facet over a field "group". Since "group" is created by users, potentially there can be a huge number of values for "group". Would Solr be able to handle a use case like this? Or is Solr not really appropriate for facet fields with a large number of values? I understand that I can set facet.limit to rest...

full-text-search

Integrate Lucene or any other search product with SQL Server 2005

Hi, I need to use full-text search with SQL Server 2005. I have explored its built-in approach (SQL Server full-text indexing), but it seems less powerful. I have also looked at the features of Lucene. Now my questions: Is it possible to integrate Lucene and SQL Server in any way? Can my T-SQL queries use a Lucene index for returning ...

sql-server-2005

full-text-search

full-text-indexing

Solr return whether member is in multivalued field

Is there any way to return in the fields list whether a value exists as one of the values of a multivalued field? E.g., if your schema is <schema> ... <field name="user_name" type="text" indexed="true" stored="true" required="true" /> <field name="follower" type="integer" indexed="true" stored="true" multiValued="true" /> ... </schema...

Can I integrate Solr with SharePoint without using the Lucene Connector Framework?

Can I integrate Solr with SharePoint without using the Lucene Connector Framework? If so, should I make Solr index SharePoint's underlying database? Will this produce successful search results? ...

CF9's Apache Lucene vs SQL Server's full text search?

ColdFusion 9's full-text search is now based on Apache Lucene Solr (or Verity, but it has too many limitations). We also use SQL Server. Which one's better? Which one's easier? UPDATE: going to use it for... searching against the name & description fields of the Products table. Thanks! ...

How to deal with constantly changing data and Solr indexes?

Afternoon guys, I'm using a Solr index for searching through items on my site. The search results contain the average rating of each item and the number of comments it has. The results can be sorted by both rating and number of comments. But obviously with the Solr index, these numbers aren't updated until the db (~2 million rows) is r...
