Indexing multi-lingual content with Lucene.net | ansaurus

Q:

I use Lucene.net to index content and documents on websites. The index is very simple and has this format:

LuceneId - unique id for Lucene (TypeId + ItemId)
TypeId   - the type of text (e.g. page content, product, public doc, etc.)
ItemId   - the web page id, document id, etc.
Text     - the text to index
Title    - web page title, document name, etc., displayed with the search results
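As a rough sketch of this schema, here is a small Python model of one index entry (not Lucene.net code). The `IndexRecord` and `upsert` names, and the `_` separator used to build LuceneId from TypeId and ItemId, are hypothetical. What it shows is why a unique LuceneId helps: re-indexing an item replaces its old entry instead of duplicating it. In Lucene this roughly corresponds to updating a document by a term on the id field (`IndexWriter.UpdateDocument`).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IndexRecord:
    """One index entry, mirroring the fields listed above (hypothetical model)."""
    type_id: str  # kind of text, e.g. "page", "product", "doc"
    item_id: int  # id of the web page, document, etc.
    text: str     # the text that gets indexed
    title: str    # shown next to each search result

    @property
    def lucene_id(self) -> str:
        # Hypothetical encoding: TypeId and ItemId joined by "_".
        # Any separator works as long as it cannot occur inside TypeId.
        return f"{self.type_id}_{self.item_id}"


def upsert(index: Dict[str, IndexRecord], record: IndexRecord) -> None:
    """Add a record, replacing any earlier record with the same LuceneId."""
    index[record.lucene_id] = record


index: Dict[str, IndexRecord] = {}
upsert(index, IndexRecord("page", 42, "Welcome to our shop", "Home"))
upsert(index, IndexRecord("product", 42, "Red shoes", "Shoes"))
upsert(index, IndexRecord("page", 42, "Welcome back", "Home"))  # re-index page 42

print(sorted(index))          # ['page_42', 'product_42']
print(index["page_42"].text)  # Welcome back
```

Because TypeId is part of the key, page 42 and product 42 coexist, while indexing page 42 a second time overwrites the first version.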

I can see two options for adapting it to multi-lingual content:

1. Create a separate index for each language, e.g. Lucene-enGB, Lucene-frFR, etc.
2. Keep the one index and add a 'language' field to it to filter the results.
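To make the trade-off concrete, here is a toy in-memory model of both options in Python. It is not Lucene.net code: `ToyIndex`, its whitespace tokenizer, the sample content and the `_`-joined ids are all hypothetical. Option 1 routes each document to a per-language index; option 2 stores language as a field and filters on it at query time (in Lucene that filter would typically be a required term clause on the language field). The sketch also assumes the language is appended to the unique id in option 2, since otherwise two translations of the same item would share one id.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class ToyIndex:
    """A tiny inverted index (term -> doc ids); a stand-in for a Lucene index."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.docs: Dict[str, Dict[str, str]] = {}

    def add(self, doc_id: str, fields: Dict[str, str]) -> None:
        self.docs[doc_id] = fields
        # Naive lowercase whitespace tokenizer; real analyzers do much more.
        for term in fields["text"].lower().split():
            self.postings[term].add(doc_id)

    def search(self, term: str, language: Optional[str] = None) -> List[str]:
        hits = self.postings.get(term.lower(), set())
        if language is not None:
            # Keep only documents whose 'language' field matches.
            hits = {d for d in hits if self.docs[d].get("language") == language}
        return sorted(hits)


# (doc id, language, text): page_7 exists in two languages.
content = [
    ("page_7", "en-GB", "Free wifi in every room"),
    ("page_7", "fr-FR", "Wifi gratuit dans chaque chambre"),
    ("page_8", "en-GB", "Wifi password at reception"),
]

# Option 1: one index per language; the language picks the index.
per_language: Dict[str, ToyIndex] = defaultdict(ToyIndex)
for doc_id, lang, text in content:
    per_language[lang].add(doc_id, {"text": text})

# Option 2: one index; language is a field and part of the unique id.
single = ToyIndex()
for doc_id, lang, text in content:
    single.add(f"{doc_id}_{lang}", {"text": text, "language": lang})

print(per_language["fr-FR"].search("wifi"))      # ['page_7']
print(single.search("wifi"))                     # ['page_7_en-GB', 'page_7_fr-FR', 'page_8_en-GB']
print(single.search("wifi", language="fr-FR"))   # ['page_7_fr-FR']
```

Both options give the same language-restricted results here; the difference is whether the restriction comes from choosing an index or from an extra condition on every query.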

Which is the better option, or is there another? I haven't used multiple indexes before, so I'm leaning toward the second.

A:

I use option 2 (a single index with a language field), but one problem is that I cannot use a different analyzer for each language. I've combined the stopwords of all the languages I need, but I lose the more advanced features a language-specific analyzer offers, such as stemming.

cherouvim 2009-03-03 17:02:23
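A rough illustration of the workaround described in the answer above, written as a hypothetical Python `analyze` function rather than an actual Lucene.net analyzer: one stopword set built from several languages is applied to all text, and because there is no stemmer, a query for "room" cannot match the indexed token "rooms". The stopword lists are trimmed, made-up samples.

```python
import re
from typing import List

# Hypothetical, heavily trimmed stopword lists (real lists are much longer).
STOPWORDS_EN = {"the", "a", "an", "and", "of", "in"}
STOPWORDS_FR = {"le", "la", "les", "et", "de", "des", "un", "une"}

# A single analysis chain for every document, whatever its language:
# the union of all stopword lists.
COMBINED_STOPWORDS = STOPWORDS_EN | STOPWORDS_FR


def analyze(text: str) -> List[str]:
    """Lowercase, split into word tokens, drop stopwords. No stemming."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in COMBINED_STOPWORDS]


print(analyze("The rooms of the hotel"))      # ['rooms', 'hotel']
print(analyze("Les chambres et les suites"))  # ['chambres', 'suites']

# Without a stemmer, the singular query term never meets the plural token.
print("room" in analyze("The rooms of the hotel"))  # False
```

A language-aware stemmer would reduce both forms to a common stem; that is the capability lost when one analyzer has to serve every language.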

related questions

Best way to search data stored as XML in Sql Server?

What are the alternatives to using the iThenticate service for content comparison?

Search by hash?

Free text search integrated with code coverage

How-to: Ranking Search Results

Find item in WPF ComboBox

Find in Files: Search all code in Team Foundation Server

Searching for phone numbers in mysql

How do I implement Search Functionality in a website?

Can you perform an AND search of keywords using FREETEXT() on SQL Server 2005?

How do I search content, within audio files/streams?

Search Plugin for Safari

Search strategies in ORMs

Using Lucene to search for email addresses

SQL Server Full Text Searching

How do you do a case-insensitive search using a pattern modifier using less?

WildcardQuery error in Solr

PowerShell FINDSTR equivalent?

Parsing search queries in Java

Need Pattern for dynamic search of multiple sql tables

grep a file, but show several surrounding lines?

Eclipse : Class file name must end with .class exception in Java Search

MOSS SSP problem - Failed database logons from deleted SSP

Incomplete results with Turkish characters in Indexing Service

Lucene Score results