The title says it all.

This question has been asked before:

http://stackoverflow.com/questions/2495997/postgresql-full-text-search-in-postgresql-japanese-chinese-arabic

but there are no answers for Chinese as far as I can see. I took a look at the OpenOffice wiki, and it doesn't have a dictionary for Chinese.

Edit: As we are already successfully using PG's internal FTS engine for English documents, we don't want to move to an external indexing engine. Basically, what I'm looking for is a Chinese FTS configuration, including parser and dictionaries for Simplified Chinese (Mandarin).

A: 

Index your data with Solr, an open source enterprise search server built on top of Lucene.

You can find more info on Solr here:

http://lucene.apache.org/solr/

A good how-to book (available as an immediate PDF download) is here:

https://www.packtpub.com/solr-1-4-enterprise-search-server/book

And be sure to use a Chinese tokenizer, such as solr.ChineseTokenizerFactory, because Chinese is not whitespace-delimited.
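
To illustrate why whitespace splitting is not enough, here is a minimal Python sketch of overlapping character bigrams, a generic technique often used for CJK text. It is not the logic of solr.ChineseTokenizerFactory; the function name `cjk_bigrams`, the sample sentence, and the restriction to the basic CJK Unified Ideographs block are all assumptions made for illustration.

```python
import re

# Runs of characters from the basic CJK Unified Ideographs block.
# Assumption: extension blocks and punctuation are ignored for brevity.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def cjk_bigrams(text):
    """Return overlapping two-character tokens for each run of CJK characters."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            # A lone character has no neighbour, so keep it as its own token.
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


sentence = "我们需要全文搜索"  # hypothetical sample text

# Whitespace splitting sees the whole sentence as a single token.
print(sentence.split())

# Bigrams give the indexer smaller units that a query can match against.
print(cjk_bigrams(sentence))
```

Bigrams over-generate (some pairs are not real words), but they need no dictionary, which is why they are a common fallback when no Chinese dictionary is available.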

Chris Adragna
We need to use the FTS engine built into Postgres. We have already successfully implemented English FTS, and want to continue to use the same system for Chinese documents.
Mikey Cee
Oh, I see. Well, then my answer isn't helpful to you. I see you clarified the question with an edit after your original post. I'm not sure what your timeline will accommodate, but the Solr solutions are open source. You *may* be able to borrow from the ChineseTokenizerFactory: its logic overcomes the inherent problem, as I understand it, that the language is not whitespace-delimited. Best of luck to you.
Chris Adragna