Purely anecdotal evidence, but we use a version of StandardAnalyzer (customised, but not in any relevant way) for our system. Not only may our documents be in different languages from one another, but a single document may contain chunks in different languages (for example, imagine an article written in Japanese with comments in English), so language-sniffing is difficult.
The majority of our documents are in English, but significant numbers are in Chinese and Japanese, with a smaller number in French, Spanish, Portuguese and Korean.
End result? We use StandardAnalyzer, and have very few complaints from people using the system in non-Roman languages about the way our searching works. Our system is somewhat 'enforced' on its users, by the way, so it's not a case of people quietly moving elsewhere instead of complaining; if they're unhappy, we generally know.
So, given that I'm not swamped with user complaints (there are very occasional ones, mainly about Chinese, but nothing serious and they're easily explained), StandardAnalyzer seems to be 'good enough' for many cases.