Hi folks,
I am working on a feature that applies grammatical segmentation rules to Latin-based languages (currently only English).
Right now I am at the stage of splitting user input into sentences.
For example:
"I am working in language translation". "I have used Google MT API for this"
In the example above, I split the text at the full stop (.). That is the common case, but many other characters can also end a sentence, such as ! and ?.
I have the following SRX rules for segmentation.
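For anyone less familiar with SRX, here is the basic idea. Each rule pairs a "before break" regex with an "after break" regex and is marked either as a break or as a no-break exception. Roughly speaking, the first rule that matches at a position decides whether the text is split there. The Python sketch below illustrates that idea only; it is not an SRX parser. The `Rule` class, the `segment` function and the sample English rules are hypothetical, and Python's `re` dialect may differ from the one your SRX engine uses.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    is_break: bool  # True ~ break="yes", False ~ break="no" (an exception)
    before: str     # regex that must match the text ending at the candidate position
    after: str      # regex that must match the text starting at the candidate position


# Hypothetical English rules for illustration only.
# Exceptions are listed first so they take precedence over the generic break rule.
RULES = [
    Rule(False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),  # don't split after abbreviations
    Rule(True, r"[.!?]+[\"')\]]*", r"\s"),                # split after . ! ? (plus closing quotes/brackets)
]


def segment(text, rules=RULES):
    # 'before' is anchored to the end of the text seen so far; 'after' is matched at the position.
    compiled = [(r.is_break, re.compile(f"(?:{r.before})\\Z"), re.compile(r.after))
                for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        head = text[:pos]
        for is_break, before, after in compiled:
            if before.search(head) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides, whether it is a break or an exception
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

For example, `segment("Dr. Smith arrived. Did he stay? Yes!")` keeps "Dr." inside the first sentence because the exception rule is checked before the generic break rule. This is why the order of rules in an SRX file matters.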
My questions are:
1) Is there a reference I can use to work out my language segmentation rules?
2) Alternatively, is there a forum on language segmentation where I can discuss this in more depth?
Please let me know if anybody knows of one.
Thanks a lot.