Given the following input...

;
; comment
; another comment
;

data
data

I am looking for a regular expression that can be used to strip the comment lines and the blank lines, returning only the two lines containing "data" (with their line breaks left intact).

Thanks.

A: 

You can replace ^\s*($|;.*) with an empty string to do that.
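
A minimal sketch of that replacement, written with Python's `re` module as a stand-in (the regex flavor is an assumption; `SAMPLE` and `blank_out` are illustrative names, not part of the question). Note that the pattern does not consume the line terminator, so lines it empties remain in the output as blank lines.

```python
import re

# Sample input modeled on the question (assumed to use "\n" line endings).
SAMPLE = ";\n; comment\n; another comment\n;\n\ndata\ndata\n"

# Same pattern as above; re.MULTILINE lets ^ and $ match at every line.
PATTERN = re.compile(r"^\s*($|;.*)", re.MULTILINE)


def blank_out(text: str) -> str:
    """Replace comment text and empty matches with an empty string."""
    return PATTERN.sub("", text)


result = blank_out(SAMPLE)
# The comment text is gone, but the line breaks that followed it are kept.
print(repr(result))
```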

Max Shawabkeh
Thanks, but this does not appear to catch the lines where the semi-colon appears alone (tested using http://regexlib.com/RETester.aspx).
Martin Robins
Make sure you check "multiline" in that tester.
Max Shawabkeh
Multiline is enabled but thanks for checking.
Martin Robins
Note that the tester does not show matched empty lines (due to not using <pre> in the result HTML).
Max Shawabkeh
Just tried your option in code: new Regex(@"^\s*($|;.*)", RegexOptions.Multiline), using Replace(input, string.Empty). I still get empty lines where they were in the original, and the comments are also being replaced by empty lines. I do not think that the end-of-line character is being included in the match.
Martin Robins
A: 
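"(^;.*$)|(^[\s\t\r\n]*$)"

This should match lines starting with a semicolon, as well as empty lines. There must be no spaces around the |, since they would otherwise be treated as literal characters.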

Rune FS
Thanks but this does not seem to be catching the empty lines (tested using regexlib.com/RETester.aspx).
Martin Robins
@martin Corrected. As far as I can tell, the linked tester now finds the empty line as well. It's kind of odd to test for \r\n and the end of a line, but the tester does not seem to mind :p
Rune FS
+3  A: 

Edit

Wait, I think I understand what you mean: you only want to preserve the line breaks after your "data" lines. If so, try:

(?m)^([ \t]*|;.*)(\r?\n|$)

A small explanation:

(?m)          # enable multi-line option
^             # match the beginning of a line
(             # start capture group 1
  [ \t]*      #   match any character from the set {' ', '\t'} and repeat it zero or more times
  |           #   OR
  ;           #   match the character ';'
  .*          #   match any character except line breaks and repeat it zero or more times
)             # end capture group 1
(             # start capture group 2
  \r?         #   match the character '\r' and match it once or none at all
  \n          #   match the character '\n'
  |           #   OR
  $           #   match the end of a line
)             # end capture group 2
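
As a rough illustration of the pattern above in use, here is a sketch in Python's `re` module (an assumption; the thread's own code is .NET, where the same idea would go through `Regex.Replace`). The names `STRIP` and `strip_comments_and_blanks` are hypothetical.

```python
import re

# The pattern from this answer: a blank or comment line plus its line break.
STRIP = re.compile(r"(?m)^([ \t]*|;.*)(\r?\n|$)")


def strip_comments_and_blanks(text: str) -> str:
    """Remove comment lines and blank lines, including their line breaks."""
    return STRIP.sub("", text)


sample = ";\n; comment\n; another comment\n;\n\ndata\ndata\n"
cleaned = strip_comments_and_blanks(sample)
# Only the two "data" lines, with their own line breaks, should remain.
print(cleaned, end="")
```

Because the second group consumes the line terminator, removed lines leave no blank line behind, while the "data" lines never match and keep their line breaks.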
Bart Kiers
Thanks but like some of the other answers, this does not seem to be catching the empty lines (tested using regexlib.com/RETester.aspx). +1 for the explanation though!
Martin Robins
Can you use both the RegEx to remove comments and Trim() to trim the empty lines?
J Angwenyi
Your revised option does exactly what I want. Thanks very much.
Martin Robins
Yeah, I first thought that you wanted to preserve *all* line breaks, but then realised only the ones after the 'data' lines needed to be preserved. You're welcome, of course.
Bart Kiers