Searching Unicode text using regex
Searching a file written in Hindi (Devanagari script, UTF-16 encoded) gave rise to the following problem. The file contains:

त्रास ततत जुग नींद ना हा बु

Note that the first character, 'त्र', is made up of multiple code points: त + ् + र. When I search for 'त' I get 4 matches, including the त that is part of the first character. I am using Java. How can I go abo...
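One possible approach, sketched here only as an illustration, is to add a negative lookahead to the pattern so that the letter is skipped whenever a combining mark follows it. The virama (्, U+094D) has Unicode general category Mn, so `\p{M}` covers it. The class name `DevanagariSearch` and the helper `countStandalone` are made up for this example, and the sample string is just the first two words of the text above, written with `\u` escapes so the source file's encoding does not matter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DevanagariSearch {

    // Hypothetical helper: counts occurrences of `letter` that are NOT
    // immediately followed by a combining mark (Unicode category M).
    static int countStandalone(String text, String letter) {
        // Pattern.quote treats the letter literally; (?!\p{M}) rejects a
        // match when the next code point is a mark such as the virama.
        Pattern p = Pattern.compile(Pattern.quote(letter) + "(?!\\p{M})");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // TA + VIRAMA + RA + AA + SA, space, TA TA TA
        // (the first two words of the sample text)
        String text = "\u0924\u094D\u0930\u093E\u0938 \u0924\u0924\u0924";
        String ta = "\u0924"; // DEVANAGARI LETTER TA

        // A plain search for TA would find 4 matches; this one skips the
        // TA that is followed by the virama.
        System.out.println(countStandalone(text, ta)); // prints 3
    }
}
```

Note that `\p{M}` also covers vowel signs (matras), so this pattern would skip the त in ता as well. If only conjuncts should be excluded, the lookahead can be narrowed to the virama alone, as in `(?!\u094D)`. Which of the two is right depends on what should count as a match.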