Hi,

We need to write test cases for the segmentation logic for Latin-based languages. I have found many sites and documents describing segmentation rules, including "http://www.lisa.org/fileadmin/standards/srx20.pdf", but we don't want to use that one. We are now looking for more relevant segmentation rules that support all Latin-based languages and cover all the possibilities (punctuation marks, full stops, commas, and other symbols used in these languages), so that we can use them to check the output of segmentation. Where can we find such rules?

Thanks in anticipation, Manjushree

A: 

The best source for generalized segmentation rules for Latin-based languages is Unicode Standard Annex #29, "Unicode Text Segmentation": http://www.unicode.org/reports/tr29/

See especially the sentence boundary rules in this annex. They are language-independent defaults, so individual languages may still need tailoring (for example, for abbreviations).
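
To make this more concrete for test writing, here is a rough sketch in Python of how such test cases could be laid out as a table of input text and expected segments. The `split_sentences` function is only a hypothetical stand-in for your own segmenter: it breaks after `.`, `?` or `!` followed by whitespace unless the next character is a lowercase letter, which is loosely inspired by the annex's sentence rules but is not a conforming implementation. The names `TEST_CASES` and `run_cases` are likewise made up for illustration.

```python
import re

# Hypothetical stand-in for the segmenter under test. It is NOT a UAX #29
# implementation: it only looks for a run of sentence terminators, optional
# closing quotes or brackets, and trailing whitespace.
_TERMINATOR = re.compile(r"""[.?!]+["')\]]*\s+""")


def split_sentences(text):
    """Split text into segments; trailing whitespace stays with its segment."""
    segments = []
    start = 0
    for match in _TERMINATOR.finditer(text):
        next_char = text[match.end():match.end() + 1]
        # Assumption: a lowercase letter after the terminator means the
        # sentence continues (e.g. after an abbreviation such as "etc.").
        if next_char.islower():
            continue
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


# Each case pairs an input with the segments we expect back.
TEST_CASES = [
    # Plain full stop and question mark.
    ("Hello world. How are you?", ["Hello world. ", "How are you?"]),
    # A decimal point must not end a sentence.
    ("He paid 3.5 euros. Then he left.", ["He paid 3.5 euros. ", "Then he left."]),
    # Abbreviation followed by a lowercase word.
    ("We saw the etc. and moved on.", ["We saw the etc. and moved on."]),
    # Several terminators in a row.
    ("Really?! Yes.", ["Really?! ", "Yes."]),
    # Closing quote after the terminator.
    ('She said "Stop." He stopped.', ['She said "Stop." ', "He stopped."]),
    # Commas, semicolons and colons never break.
    ("One, two; three: four.", ["One, two; three: four."]),
    # Accented lowercase letter after an abbreviation (French).
    ("C'est p. ex. évident. Voilà.", ["C'est p. ex. évident. ", "Voilà."]),
]


def run_cases():
    """Return a list of (text, expected, actual) for every failing case."""
    failures = []
    for text, expected in TEST_CASES:
        actual = split_sentences(text)
        # Segmentation should never lose or add characters.
        if actual != expected or "".join(actual) != text:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    for text, expected, actual in run_cases():
        print("FAIL:", repr(text), "expected", expected, "got", actual)
```

To test a real segmenter, replace `split_sentences` with a call into your own code and keep the table. The round-trip check (joining the segments gives back the input) is a useful invariant whatever rule set you choose.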

Roy Sharon