I am working on a language segmentation project. I implemented segmentation for English by using a regular expression that breaks the string at the full stop ("."). Now I want to add support for the following languages: Chinese, Arabic, Japanese, Russian, Korean, Dutch, Hindi, Greek, and Urdu. I want to break strings in these languages at the full stop as well.
For example, the Chinese full stop is 。 (Unicode U+3002).
Input string:
以有效應對各種事態」。他還表示,希望以符合21世紀的方式切實深化美日同盟關係。
Expected result:
Segment 1: 以有效應對各種事態」。
Segment 2: 他還表示,希望以符合21世紀的方式切實深化美日同盟關係。
I need to apply the same logic to the other languages (Arabic, Japanese, Russian, Korean, Dutch, Hindi, Greek, Urdu).
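To make the requirement concrete, here is a minimal sketch of the behaviour I am after, written in Python with the standard `re` module. The set of full-stop characters is an assumption on my part: U+002E for Russian, Korean, Dutch, Greek, and typically Arabic; U+3002 for Chinese and Japanese; U+0964 (danda) for Hindi; U+06D4 for Urdu. The names `FULL_STOPS` and `split_on_full_stop` are hypothetical.

```python
import re

# Assumed sentence-ending full stops; extend or trim this per language.
FULL_STOPS = (
    "."        # U+002E: English, Russian, Korean, Dutch, Greek, typically Arabic
    "\u3002"   # ideographic full stop: Chinese, Japanese
    "\u0964"   # danda: Hindi
    "\u06D4"   # Arabic full stop: Urdu
)

_STOP_CLASS = re.escape(FULL_STOPS)

# One segment = a run of non-stop characters followed by any trailing stops.
_SEGMENT = re.compile("[^%s]+[%s]*" % (_STOP_CLASS, _STOP_CLASS))


def split_on_full_stop(text):
    """Return the segments of text, each keeping its trailing full stop."""
    segments = (m.strip() for m in _SEGMENT.findall(text))
    return [s for s in segments if s]


if __name__ == "__main__":
    sample = "以有效應對各種事態」。他還表示,希望以符合21世紀的方式切實深化美日同盟關係。"
    for i, segment in enumerate(split_on_full_stop(sample), start=1):
        print("Segment %d: %s" % (i, segment))
```

This sketch splits on every listed character, so it will also break decimals such as 3.14 and abbreviations such as "e.g." in the languages that use U+002E. Handling those cases would need extra rules on top of it.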
Thanks in advance.