Some time in the near future I will need to implement a cross-language word count or, if that is not possible, a cross-language character count.
By word count I mean an accurate count of the words contained in the given text, taking the language of the text into account. The language is set by the user and can be assumed to be correct.
By character count I mean a count of the characters in the given text that could possibly be part of a word, using the same language information described above.
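For the character count, one possible sketch (assuming UTF-8 input; the function name `countWordCharacters` is hypothetical) counts the code points that fall in the Unicode letter (`\p{L}`), combining mark (`\p{M}`) and number (`\p{N}`) categories, using a PCRE pattern with the `/u` modifier:

```php
<?php
// Hypothetical helper: counts code points that could plausibly belong to a
// word, i.e. letters (\p{L}), combining marks (\p{M}) and numbers (\p{N}).
// Assumes $text is valid UTF-8; the /u modifier makes PCRE work on code points.
function countWordCharacters(string $text): int
{
    $n = preg_match_all('/[\p{L}\p{M}\p{N}]/u', $text);
    if ($n === false) {
        // preg_match_all() fails on invalid UTF-8 when /u is used.
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    return $n;
}

echo countWordCharacters('Hello, world!'), "\n";
echo countWordCharacters('你好，世界'), "\n";
```

This counts code points, not user-perceived characters: the Devanagari syllable "ते" counts as two (a consonant plus a vowel sign). Counting grapheme clusters instead would mean matching `\X` or using the intl extension's `grapheme_*` functions.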
I would much prefer the former, if at all possible. I am aware of the difficulties involved, and that the latter is much easier.
I'd love it if I only had to handle English, but I need to consider every language: Chinese, Korean, English, Arabic, Hindi, and so on.
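For the word count itself, one possible starting point is ICU's word segmentation, which PHP exposes through the intl extension's `IntlBreakIterator`. The sketch below assumes intl is installed; the function name `countWords` is hypothetical. It treats every segment whose rule status is at least `IntlBreakIterator::WORD_NONE_LIMIT` (numbers, letters, kana, ideographs) as a word, and skips spaces and punctuation:

```php
<?php
// Hypothetical helper built on ICU word boundaries (requires ext-intl).
function countWords(string $text, string $locale): int
{
    $it = IntlBreakIterator::createWordInstance($locale);
    if ($it === null) {
        throw new RuntimeException("Could not create a word iterator for $locale");
    }
    $it->setText($text);

    $count = 0;
    $it->first();
    // Each call to next() moves to the boundary that ends the next segment.
    while ($it->next() !== IntlBreakIterator::DONE) {
        // Status below WORD_NONE_LIMIT means the segment was whitespace or
        // punctuation; anything at or above it is a number or word-like run.
        if ($it->getRuleStatus() >= IntlBreakIterator::WORD_NONE_LIMIT) {
            $count++;
        }
    }
    return $count;
}

echo countWords('Hello, world!', 'en_US'), "\n";
```

For languages written without spaces, such as Chinese or Japanese, ICU segments the text using a dictionary, so the result is a reasonable approximation rather than a guaranteed linguistic word count.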
Does Stack Overflow have any leads on where to start looking for an existing product or method to do this in PHP? I am a good lazy programmer.*
A simple test showing that str_word_count combined with setlocale doesn't work, and a function from php.net's str_word_count page.
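A minimal illustration of the problem, assuming PHP 8 or later (where the default locale is "C"): `str_word_count` classifies text one byte at a time, so the multibyte UTF-8 letters used by scripts such as Chinese or Devanagari are never recognised as word characters.

```php
<?php
// Assumes PHP 8+, where the default locale is "C".
// str_word_count() checks single bytes, so multibyte UTF-8 letters
// are never counted as part of a word.
$samples = [
    'en' => 'Hello world',
    'zh' => '你好世界',
    'hi' => 'नमस्ते दुनिया',
];

foreach ($samples as $lang => $text) {
    printf("%s: %d\n", $lang, str_word_count($text));
}
```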
*http://blogoscoped.com/archive/2005-08-24-n14.html