I have a text file that contains what are more or less paragraphs. The text is not actually words, it's comma-delimited data, but that's not really important. The file is divided into sections and subsections: sections are separated by more than one newline, and subsections by a single newline.
So sample data:
This is the, start of a, section
908690,246246246,246246
246246,246,246246

This is, the next, section,
sfhklj,sfhjk,4626246
4yw2,fdhds5juj,53ujj
So the above data contains two sections, each with three subsections. Sometimes, however, there is more than one empty line between sections. When this occurs, I want to convert the multiple newline characters, say \n\n\n\n, to just \n\n; I think regex is probably the way to do this. I may also need to handle different newline standards, Unix \n and Windows \r\n. I think the files probably contain a mix of line-ending encodings.
Here is the regex that I've come up with; it's nothing special:
Regex.Replace(input, @"([\r\n|\n]{2,})", Environment.NewLine + Environment.NewLine)
Firstly, is this a good regex solution? I'm not that good with regex.
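One thing worth checking with this pattern: inside square brackets, | is a literal character rather than alternation, so [\r\n|\n] matches any single \r, \n, or |. A lone Windows line ending (\r\n) therefore already satisfies {2,}. The sketch below compares the pattern above with (?:\r?\n){2,}, which I am assuming is closer to the intent. The class and method names are made up for illustration, and the replacement is hard-coded to "\n\n" instead of Environment.NewLine so the result is the same on every platform.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineCheck
{
    // The pattern from the question: a character class, so it matches any run
    // of two or more characters drawn from '\r', '\n' and '|'.
    public static string CollapseWithCharClass(string input) =>
        Regex.Replace(input, @"([\r\n|\n]{2,})", "\n\n");

    // Assumed intent: two or more whole line endings, each either \r\n or \n.
    public static string CollapseWholeNewlines(string input) =>
        Regex.Replace(input, @"(?:\r?\n){2,}", "\n\n");
}
```

With the input "a\r\nb" (one Windows line ending), the first method returns "a\n\nb", turning a subsection break into a section break, while the second leaves the string unchanged.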
Secondly, I then want to split each section into an element in a string array:
Regex.Split(input, Environment.NewLine + Environment.NewLine)
Is there a way to combine these steps?
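For reference, here is a minimal sketch of how the two steps might be combined, assuming the only line endings in the files are \n and \r\n (a lone \r is not handled). Regex.Split can split directly on a run of two or more line endings, which would make the separate replace step unnecessary. The group is non-capturing because Regex.Split includes the text of capturing groups in its result. SectionSplitter and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionSplitter
{
    // Splits on any run of two or more line endings (\n or \r\n), so the
    // separate "collapse to exactly two newlines" step is not needed.
    // Trim() drops leading/trailing blank lines to avoid empty sections.
    public static string[] SplitSections(string input) =>
        Regex.Split(input.Trim(), @"(?:\r?\n){2,}");

    // Splits one section into its lines (subsections) on a single line ending.
    public static string[] SplitSubsections(string section) =>
        Regex.Split(section, @"\r?\n");
}
```

Calling SplitSections on the whole file contents would give one array element per section, and SplitSubsections on each element would give its individual lines.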