Lucene implements fuzzy matching with a basic edit-distance algorithm.
Are there other fuzzy-matching implementations for Lucene that use different similarity metrics? They should also identify homophones. Please also compare the various fuzzy-matching approaches for Lucene.
...
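For readers unfamiliar with the metric this question refers to, here is a minimal sketch of a textbook Levenshtein edit distance in plain Java. It is an illustration only, not Lucene's own implementation; the class name `EditDistanceSketch` and the `similarity` normalization are assumptions made for the example.

```java
// Illustrative sketch only: a textbook Levenshtein distance, not Lucene's code.
// The class and method names are hypothetical.
final class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty string into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when every
    // character position must change.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }
}
```

A purely character-based metric like this has no notion of pronunciation, which is why the question asks about other similarity metrics for homophones.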
There are several advantages to using Solr 1.4 (out-of-the-box faceted search, grouping, replication, HTTP administration vs. Luke, ...).
Even if I embed search functionality in my Java application, I could use SolrJ to avoid the HTTP trade-off when using Solr. Is SolrJ recommended at all?
So, when would you recommend to use "pure-Luc...
Hello. I was about to integrate Sphinx-based search into the website, but I've found that there's no built-in support for spelling correction.
Folks on the web suggest using pspell or other third-party libraries to get things done, but the problem is that the data I'm going to search contains mostly "technical" terms like brand names, t...
What is the best way to split an existing Lucene index into two halves, i.e. each split should contain half of the total number of documents in the original index?
...
Hi Guys,
Is there any built-in functionality in Solr/Lucene to filter out results that fall below a certain score threshold? Say I provide a score threshold of 0.2; then all documents with a score below 0.2 would be removed from my results. My intuition is that this is possible by customizing Solr or Lucene.
Could y...
I'm trying to implement highlighting in my Lucene application and I can't get any fragments: getBestFragment always returns null.
My code:
QueryParser parser = new QueryParser(Version.LUCENE_30, "text", myAnalyzer);
Query realQuery = parser.parse(query);
Highlighter highlighter = new Highlighter(new QueryScorer(realQuery, "text"));
for...
I have a data set with multiple layers of annotation over the underlying text, such as part-of-speech tags, chunks from a shallow parser, named entities, and others from various natural language processing (NLP) tools. For a sentence like "The man went to the store", the annotations might look like:
Word POS Chunk NER
==== === ===== ...
Hello all,
I have been doing a bit of research on full-text search since we realized that a series of LIKE statements is terrible. My first find was MySQL full-text search. I tried to implement this and it worked on one table but failed when I tried to join multiple tables, so I consulted Stack Overflow's articles (look at the end for...
Hello, I am working on a Windows application using Lucene. I want to get all the indexed keywords and use them as the source for auto-suggest on the search field. How can I retrieve all the indexed keywords in Lucene? I am fairly new to C#. Code would be appreciated. Thanks.
...
Zend Lucene and Java Lucene are built in PHP and Java respectively, and PHP is a higher-level language than Java.
Just wondering how big the performance difference between the two is with regard to index building and searching?
Is it much more efficient to let Java create and rebuild the index and let PHP use it?
...
I want to get a list of all words in a database table that start with a specific prefix. I've been looking for a way to query the terms in a Lucene index (I need the terms, I don't care about the documents they are from) but without success.
Any ideas?
...
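As a rough illustration of why prefix lookups over terms can be cheap: when terms are kept in sorted order, all terms sharing a prefix form one contiguous run. The sketch below uses a plain `java.util.TreeSet` as a stand-in for a sorted term dictionary; it does not use the Lucene API, and `PrefixTermsSketch` and `termsWithPrefix` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: a TreeSet stands in for a sorted term dictionary.
// This is not the Lucene API; all names here are hypothetical.
final class PrefixTermsSketch {

    // Returns every term that starts with the given prefix, in sorted order.
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward until the
        // prefix no longer matches; matching terms are contiguous.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```

With the terms `apple`, `lucene`, `lucid`, `luke`, and `solr`, calling `termsWithPrefix(terms, "luc")` visits only `lucene`, `lucid`, and `luke` before stopping.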
I'm trying to add the NGram token filter (from Lucene contrib) to my Compass configuration. Following the Compass documentation, I tried:
<searchEngine>
<analyzer name="ngram" type="Simple" filters="ngram-filter"/>
<analyzerFilter name="ngram-filter" type="org.apache.lucene.analysis.ngram.NGramTokenFilter">
</analyzerFilte...
Google's query syntax allows searching for phrases like "as * as a skyscraper", where the asterisk can match any term (or terms). Is there a way to achieve the same thing in Lucene? The proximity operator ~ could be of use, but it is not exactly what I want.
...
I want to unit test my Lucene filter (which extends TokenFilter).
I use Maven.
The "BaseTokenStreamTestCase" class looks perfect, but I have no idea which Maven artifact contains it.
Any ideas?
...
I'm using Solr and I want to facet over a field "group".
Since "group" is created by users, potentially there can be a huge number of values for "group".
Would Solr be able to handle a use case like this? Or is Solr not really appropriate for facet fields with a large number of values?
I understand that I can set facet.limit to rest...
Hi,
I need to use full-text search with SQL Server 2005. I have explored its built-in search approach (SQL Server full-text indexing), but it seems less powerful.
I have also looked at the features of Lucene.
Now my questions: Is it possible to integrate Lucene and SQL Server in any way?
Can my T-SQL queries use a Lucene index for returning ...
Is there any way to return in the fields list whether a value exists as one of the values of a multivalued field?
E.g., if your schema is
<schema>
...
<field name="user_name" type="text" indexed="true" stored="true" required="true" />
<field name="follower" type="integer" indexed="true" stored="true" multiValued="true" />
...
</schema...
Can I integrate Solr with SharePoint without using the Lucene Connector Framework?
If so, should I make Solr index SharePoint's underlying database? Will this produce successful search results?
...
ColdFusion 9's full-text search is now based on Apache Lucene Solr (or Verity, but it has too many limitations). We also use SQL Server.
Which one's better? Which one's easier?
UPDATE: going to use for... searching against the name & description fields of the Products table.
Thanks!
...
Afternoon guys,
I'm using a Solr index for searching through items on my site. The search results contain the average rating of the item and the number of comments the item has. The results can be sorted by both rating and number of comments.
But obviously with the Solr index, these numbers aren't updated until the db (~2 million rows) is r...