Searching a file written in Hindi (Devanagari script, UTF-16 encoded) gave rise to the following problem.
The file contains:
त्रास ततत जुग नींद ना हा बु
Note that the first character 'त्र' is made up of multiple code points: त + ् + र. When I search for 'त', I get 4 matches, including the त inside that first character. I am using Java.
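To reproduce the problem, here is a minimal sketch that prints the code points of 'त्र' and counts matches with a plain `String.indexOf` loop, which compares code units and knows nothing about how characters combine. The class and method names (`CodePointDemo`, `codePoints`, `naiveCount`) are hypothetical, and the strings use `\u` escapes so the file compiles regardless of the platform's source encoding.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDemo {
    // Returns the code points of s, each formatted as "U+XXXX".
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    // Naive search: counts every occurrence of needle, exactly as indexOf sees it.
    static int naiveCount(String text, String needle) {
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String conjunct = "\u0924\u094D\u0930";                               // त्र
        String sample = "\u0924\u094D\u0930\u093E\u0938 \u0924\u0924\u0924";  // त्रास ततत
        System.out.println(codePoints(conjunct));         // [U+0924, U+094D, U+0930]
        System.out.println(naiveCount(sample, "\u0924")); // 4
    }
}
```

The count of 4 comes from the त at index 0 (the start of the conjunct) plus the three in "ततत".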
How can I search for occurrences of 'त' that are not part of a multi-code-point character?
Any help will be appreciated. :)
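One possible approach, sketched here under the assumption that "not part of a multi-code-point character" means "the user-perceived character (grapheme cluster) is exactly त": split the text with `java.text.BreakIterator.getCharacterInstance` and keep only the clusters equal to the search string. The class `StandaloneSearch` and method `findStandalone` are hypothetical names. Exactly how the JDK segments a conjunct such as त्र can vary between Java versions (for example, as a single "त्र" cluster or as "त्" followed by "र"). In either case, the leading त is grouped with at least the virama, so it no longer matches on its own.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StandaloneSearch {
    // Returns the start index of every character cluster that is exactly `target`.
    static List<Integer> findStandalone(String text, String target) {
        List<Integer> hits = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.forLanguageTag("hi-IN"));
        it.setText(text);
        int start = it.first();
        // Walk cluster boundaries pairwise: [start, end) is one user-perceived character.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (text.substring(start, end).equals(target)) {
                hits.add(start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "\u0924\u094D\u0930\u093E\u0938 \u0924\u0924\u0924"; // त्रास ततत
        System.out.println(findStandalone(sample, "\u0924")); // [6, 7, 8]
    }
}
```

This also skips a त that carries a vowel sign or nukta (e.g. ता), because that is a cluster too. If only conjuncts should be excluded, a regex such as `(?<!\u094D)\u0924(?!\u094D)` (no virama directly before or after) is a narrower alternative. When reading the file itself, decode it as UTF-16, e.g. `Files.readString(path, StandardCharsets.UTF_16)` on Java 11+.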