Ligatures are characters that are represented by more than one Unicode code point. For example, in Devanagari, त्र is a ligature consisting of the code points त + ् + र.
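For reference, this is what the string looks like at the code-point level in Java. It is a minimal sketch: the class name `CodePointDump` is made up for illustration, and the string is written with `\u` escapes so the source file encoding does not matter. `Character.getName` (Java 7+) returns the official Unicode character name.

```java
// Lists the code points that make up a string, e.g. the Devanagari conjunct त्र.
class CodePointDump {

    // Returns the Unicode code points of s (not its UTF-16 chars).
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String tra = "\u0924\u094D\u0930"; // त + ् + र
        for (int cp : codePoints(tra)) {
            // Print each code point in U+XXXX form with its Unicode name.
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

This prints U+0924 DEVANAGARI LETTER TA, U+094D DEVANAGARI SIGN VIRAMA and U+0930 DEVANAGARI LETTER RA, one per line: three separate code points, with nothing in the string itself marking them as one ligature.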
When viewed in simple text editors like Notepad, त्र is shown as त् + र, and it is stored as three Unicode code points. However, when the same file is opened in Firefox, it is shown as a proper ligature.
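The file itself carries no ligature information: the three code points are stored one after another, and the joined glyph is produced only at display time by the font and the text-shaping engine. A minimal sketch of the storage side, assuming the file is UTF-8 (the class name `Utf8Storage` is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Shows how त्र is stored in a UTF-8 file: three code points, nine bytes.
class Utf8Storage {

    // Encodes a string exactly as it would be written to a UTF-8 file.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tra = "\u0924\u094D\u0930"; // त्र
        byte[] bytes = encode(tra);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        // Each Devanagari code point (U+0900..U+097F) takes 3 bytes in UTF-8.
        System.out.println("bytes:       " + bytes.length);
        System.out.println("code points: " + decoded.codePointCount(0, decoded.length()));
    }
}
```

Reading a real file with `Files.readString(path, StandardCharsets.UTF_8)` (Java 11+) yields the same three-code-point string.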
So my question is: how can I detect such ligatures programmatically while reading the file in my code? Since Firefox does it, there must be a way to do it programmatically. Are there any Unicode properties that contain this information, or do I need a map of all such ligatures?
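As a starting point for the properties question: Unicode classifies U+094D as `Virama` in its Indic_Syllabic_Category data, but `java.lang.Character` does not expose that property. What it does expose is the general category (the virama is a non-spacing mark) and the script. The sketch below prints those and adds a hypothetical `isVirama` helper that matches on the character name; that name check is an assumption made for illustration, not a real Unicode property lookup, and it will not match every script's rendering behaviour.

```java
// Queries the properties java.lang.Character exposes for each code point of त्र.
class ViramaProperties {

    // Hypothetical helper: treats any code point whose Unicode name ends in
    // " SIGN VIRAMA" as a virama. Name-based heuristic, not a property lookup.
    static boolean isVirama(int cp) {
        String name = Character.getName(cp);
        return name != null && name.endsWith(" SIGN VIRAMA");
    }

    public static void main(String[] args) {
        String tra = "\u0924\u094D\u0930"; // त्र
        tra.codePoints().forEach(cp -> System.out.printf(
                "U+%04X script=%s nonSpacingMark=%b virama=%b%n",
                cp,
                Character.UnicodeScript.of(cp),                      // e.g. DEVANAGARI
                Character.getType(cp) == Character.NON_SPACING_MARK, // general category Mn
                isVirama(cp)));
    }
}
```

So no per-ligature map is strictly needed for this kind of conjunct: "consonant, virama, consonant" is the pattern that a renderer may draw as one glyph, depending on the font.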
The SVG/CSS property `text-rendering`, when set to `optimizeLegibility`, does the same thing (combines the code points into a proper ligature).
PS: I am using Java.
EDIT
The purpose of my code is to count the characters in Unicode text, treating each ligature as a single character. So I need a way to collapse multiple code points into a single ligature.
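For counting "user-perceived characters", the JDK's standard tool is `BreakIterator.getCharacterInstance()`, which finds grapheme cluster boundaries. Under the older UAX #29 rules, a cluster ends after त्, so त्र counts as two; Unicode 15.1 added rule GB9c, which keeps such conjuncts together, but whether a given JDK applies it depends on its version. The sketch below therefore shows both the `BreakIterator` count and a hypothetical heuristic (`countWithConjuncts`, not a JDK API) that additionally glues "consonant + virama + consonant" into one unit. The heuristic ignores ZWJ/ZWNJ and font-specific rendering.

```java
import java.text.BreakIterator;

// Two ways to count user-perceived characters in Java.
class PerceivedCharCount {

    // 1) Grapheme clusters via the JDK's character BreakIterator.
    //    Whether त्र counts as 1 or 2 depends on the JDK's Unicode rules.
    static int countGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // 2) Hypothetical heuristic: combining marks extend the current unit, and a
    //    letter right after a virama joins it (conjunct), so त्र counts as 1.
    static int countWithConjuncts(String s) {
        int count = 0;
        boolean afterVirama = false;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int type = Character.getType(cp);
            boolean isMark = type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK
                    || type == Character.ENCLOSING_MARK;
            boolean joinsConjunct = afterVirama && Character.isLetter(cp);
            // Start a new unit unless this code point extends the previous one;
            // a leading orphan mark still counts as one unit.
            if (count == 0 || (!isMark && !joinsConjunct)) {
                count++;
            }
            afterVirama = isVirama(cp);
            i += Character.charCount(cp);
        }
        return count;
    }

    // Hypothetical name-based virama check (not a Unicode property lookup).
    static boolean isVirama(int cp) {
        String name = Character.getName(cp);
        return name != null && name.endsWith(" SIGN VIRAMA");
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u0924\u094D\u0930",                   // त्र
            "\u0928\u092E\u0938\u094D\u0924\u0947", // नमस्ते
            "e\u0301x"                              // e + combining acute, x
        };
        for (String s : samples) {
            System.out.printf("%s: codePoints=%d graphemes=%d withConjuncts=%d%n",
                    s, s.codePointCount(0, s.length()),
                    countGraphemes(s), countWithConjuncts(s));
        }
    }
}
```

With the heuristic, नमस्ते counts as three units (न, म, स्ते). On Java 9 and later, the regex construct `\X` also matches one extended grapheme cluster, so it follows the same JDK-dependent rules as `BreakIterator`.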