Update: in the end (after a few days of debugging) I found the problem, and it isn't in the regex at all :/ . It seems that I was trimming extra whitespace with
input = Regex.Replace(input, "\\s+", " ");
so all newlines were replaced with " " before the text ever reached the tokenizer. Stupid! Moderator, please remove this if unnecessary!
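For anyone who hits the same thing, here is a minimal sketch of a whitespace cleanup that leaves line breaks alone. It assumes you only want to collapse runs of spaces and tabs; the class and method names (WhitespaceNormalizer, CollapseHorizontal, CollapseAll) are made up for illustration and are not part of my original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, names chosen only for this example.
public static class WhitespaceNormalizer
{
    // Collapses runs of spaces and tabs into a single space.
    // \r and \n are not in the character class, so line breaks survive.
    public static string CollapseHorizontal(string input)
    {
        return Regex.Replace(input, "[ \\t]+", " ");
    }

    // The call from the post, kept for comparison:
    // \s also matches \n and \r, so every line break becomes a space.
    public static string CollapseAll(string input)
    {
        return Regex.Replace(input, "\\s+", " ");
    }
}
```

With this, CollapseHorizontal("a  \t b\n\nc") keeps the two newlines, while CollapseAll("a\n\nb") turns them into a single space.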
I have a regex for tokenizing some text, and it looks like this:
"(?<html>Ç)|
(?<number>\\d+(?:[.]\\d+)?(?=[][ \f\n\r\t\v!?.,():;\"'„Ç]|$))|
(?<other>(?:[^][Ç \f\n\r\t\v!?.,():;\"'„A-Za-zčćšđžČĆŠĐŽäöÖü][^ Ç\f\n\r\t\vA-Za-zčćšđžČĆŠĐŽäöÖü]*)?[^][ Ç\f\n\r\t\v!?.,():;\"'„A-Za-zčćšđžČĆŠĐŽäöÖü](?=[][!?.,():;\"'„]*(?:$|[ Ç\f\n\r\t\v])))|
(?<word>(?:[^][ Ç\f\n\r\t\v!?.,():;\"'„][^ Ç\f\n\r\t\v]*)?[^][ Ç\f\n\r\t\v!?.,():;\"'„])|
(?<punctuation>[][ \f\n\r\t\v!?.,():;\"'„])"
The problem is in this part: (?<punctuation>[][ \f\n\r\t\v!?.,():;\"'„])
When I parse text with the input "\n\n"
the punctuation group matches " ", " "
in other words, space and space, and I don't know why.
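A quick way to check what the tokenizer really receives is to print the code points of the string right before matching, since a space and a newline are hard to tell apart in console output. This is only a sketch of a debugging aid; the CodePointDump class and its Describe method are hypothetical names, not part of the tokenizer.

```csharp
using System;
using System.Linq;

// Hypothetical debugging helper, not part of the original tokenizer.
public static class CodePointDump
{
    // Renders every char as U+XXXX so a space (U+0020)
    // can be told apart from a newline (U+000A).
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Calling Describe on the string that is passed to the regex, and on each match value, shows whether the newlines are still there at that point or were already replaced earlier.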