Indexing token bigrams in Lucene

views: 582

+2 Q:

Hi, I need to index bigrams of words (tokens) in Lucene. I can produce the n-grams myself and then index them, but I am wondering whether Lucene has something that will do this for me. As far as I can tell, Lucene only provides n-grams of characters. Any ideas?
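The do-it-yourself route the question mentions (produce word bigrams first, then index them) can be sketched as below. This is a minimal illustration, assuming plain whitespace tokenization and lower-casing; a real Lucene analyzer would tokenize differently, and the class and method names are hypothetical. Lucene's contrib analyzers also include a `ShingleFilter` (package `org.apache.lucene.analysis.shingle`) that emits word n-grams ("shingles") inside the analysis chain, so it may be worth checking whether your Lucene version ships it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: builds word (token) bigrams from a piece of text. */
public class WordBigrams {

    /** Splits on whitespace, lower-cases, and joins each adjacent token pair with a space. */
    public static List<String> bigrams(String text) {
        String trimmed = text.trim();
        List<String> result = new ArrayList<>();
        if (trimmed.isEmpty()) {
            return result; // no tokens, so no bigrams
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            result.add(tokens[i] + " " + tokens[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Each bigram could then be added as a single term to a field that is not re-tokenized.
        System.out.println(bigrams("The quick brown fox")); // [the quick, quick brown, brown fox]
    }
}
```

Indexing each bigram as one untokenized term keeps the space inside it, so a query for the bigram "quick brown" matches only documents where those two words were adjacent.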

Depending on why you need to index bi-grams, SpanQuery and/or SnowballAnalyzer may be helpful.

Hank Gay 2009-03-17 13:14:59

+1 A:

Use the NGramTokenizer:

http://lucene.apache.org/java/2_3_2/api/contrib-analyzers/org/apache/lucene/analysis/ngram/NGramTokenizer.html
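Note that `NGramTokenizer` works on characters: it cuts its input into character n-grams between a minimum and a maximum gram size, which is the limitation the question already mentions. The pure-Java sketch below shows what character bigrams look like, for contrast with word bigrams. It only illustrates the idea and is not Lucene's implementation; it does not reproduce the tokenizer's output order or whitespace handling, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of character n-grams (not Lucene's NGramTokenizer). */
public class CharNGrams {

    /** Returns every substring of length n, sliding one character at a time. */
    public static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            result.add(text.substring(i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        // Character bigrams of one word are word fragments, not pairs of words.
        System.out.println(ngrams("fox", 2)); // [fo, ox]
    }
}
```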

bajafresh4life 2009-03-17 13:24:39

related questions

Lucene.Net Search result to highlight search keywords

Does a pom.xml.template tell me everything I need to know to use the project as a dependency

Can someone compare a Fuzzy Query to a LuceneDictionary solution?

Has anyone used lucene.net with Linq-to-Entities?

Can someone give me a high overview of how lucene.net works?

Using Lucene to count results in categories

Which search technology to use with ASP.NET?

How to do query auto-completion/suggestions in Lucene?

Should an index be optimised after incremental indexes in Lucene?

What is the best search approach using Lucene?

How to best search against a DB with Lucene?

Is there a fast, accurate Highlighter for Lucene?

How to sort by Lucene.Net field and ignore common stop words such as 'a' and 'the'?

How do I estimate the size of a Lucene index?

Analyzer for Russian language in Lucene and Lucene.Net

In Lucene how do terms get used in calculating scores, can I override it with a CustomScoreQuery?

Troubleshoot Java Lucene ignoring Field

Best full text search alternative to ms sql, c++ solution

Strategies for keeping a Lucene Index up to date with domain model changes

How to get facet ranges in solr results?

Using Lucene to search for email addresses

WildcardQuery error in Solr

With Lucene: Why do I get a Too Many Clauses error if I do a prefix search?

Lucene exact ordering

Lucene Score results
