I am writing regular expressions for Unicode text in Java. However, for the particular script I am using, Devanagari (U+0900 to U+097F), there is a problem with word boundaries. \b matches at positions next to dependent vowel signs (such as U+093E to U+094C), because they are treated as non-word characters, just like spaces.
Example: suppose I have the string "कमल कमाल कम्हल कम्हाल". Note that 'मा' in the second word is formed by combining म and ा (which is treated as a non-word character, like a space). The same happens in the last word. As a result, the regular expression \b\w\b matches the 'ल' in 'कमाल', which is not correct according to the language.
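To see why this happens, it helps to look at how Java classifies each code point. The sketch below is illustrative only (the class name `DevanagariCategories` and the helper `category` are made up for this example). It prints the Unicode general category and `Character.isLetterOrDigit` for the characters of two of the words. ा and ् are combining marks (categories Mc and Mn), not letters. A boundary test that counts only letters, digits and underscore as word characters would therefore see a break at those marks.

```java
public class DevanagariCategories {
    // Maps the few general categories that occur in this text to their
    // Unicode abbreviations; anything else is reported as "other".
    static String category(int cp) {
        switch (Character.getType(cp)) {
            case Character.OTHER_LETTER:           return "Lo";
            case Character.NON_SPACING_MARK:       return "Mn";
            case Character.COMBINING_SPACING_MARK: return "Mc";
            case Character.SPACE_SEPARATOR:        return "Zs";
            default:                               return "other";
        }
    }

    public static void main(String[] args) {
        // Print code point, general category, and whether Java considers
        // it a letter or digit (the marks ा and ् are not).
        "कमाल कम्हल".codePoints().forEach(cp ->
            System.out.printf("U+%04X %s isLetterOrDigit=%b%n",
                cp, category(cp), Character.isLetterOrDigit(cp)));
    }
}
```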
I hope the example helps.
Can I write a regular expression that behaves like \b, except that it does not treat certain characters (such as the dependent vowel signs) as word breaks? Any feedback would be appreciated.
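One possible approach, sketched under assumptions: rather than using \b, build a boundary out of lookarounds over a custom "word character" class. Here `WORD` is a hypothetical definition: letters, combining marks (`\p{M}`, which covers the dependent vowel signs and the virama), decimal digits and underscore. `BOUNDARY` matches wherever a `WORD` character sits on exactly one side, which mirrors what \b does. The names `DevanagariBoundary` and `findAll` are illustrative only. As an aside, on Java 7 and later it may also be worth trying `Pattern.UNICODE_CHARACTER_CLASS`, because its documented `\w` includes combining marks. The version below does not rely on that flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DevanagariBoundary {
    // Hypothetical word-character class: letters, combining marks
    // (dependent vowel signs, virama), decimal digits and underscore.
    static final String WORD = "[\\p{L}\\p{M}\\p{Nd}_]";

    // A \b substitute: a WORD character behind and none ahead,
    // or none behind and a WORD character ahead.
    static final String BOUNDARY =
        "(?:(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + "))";

    // Collects every substring of text matched by regex.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "कमल कमाल कम्हल कम्हाल";
        // Whole words: combining marks no longer split a word in two.
        System.out.println(findAll(BOUNDARY + WORD + "+" + BOUNDARY, text));
        // prints [कमल, कमाल, कम्हल, कम्हाल]

        // Single-letter "words": the ल in कमाल no longer qualifies.
        System.out.println(findAll(BOUNDARY + "\\p{L}" + BOUNDARY, text));
        // prints []
    }
}
```

The trade-off is that the word definition is now explicit. If other characters need to count as part of a word (for example the zero-width joiner U+200D), they can be added to `WORD` without touching the boundary logic.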