Ok, I have a multi-line string I'm trying to do some clean-up on.
Each line may or may not be part of a big block of quoted text. Example:
This line is not quoted.
This part of the line is not quoted “but this is.”
This one is not quoted either.
“This entire line is quoted”
Not quoted.
“This line is quoted
and so is this one
and so is this one.”
This is not quoted “but this is
and so is this.”
I need a RegEx replacement that will un-wrap the hard-wrapped quoted lines, i.e., replace "\r\n" with a space, but only between the curly quotes.
Here's how it should look after replacement:
This line is not quoted.
This part of the line is not quoted “but this is.”
This one is not quoted either.
“This entire line is quoted”
Not quoted.
“This line is quoted and so is this one and so is this one.”
This is not quoted “but this is and so is this.”
(Note how the last two lines were multiple lines in the input text.)
Constraints
- Ideally, this should be a single Regex.Replace call
- Using the .NET Regex library
- The quotes are always start/end curly quotes, not plain ol' double-ticks ("), which should make this a little easier.
Important Constraint
This is not direct .NET code; I'm populating a table of "searchfor/replacewith" strings that are then passed to Regex.Replace. I don't have the ability to add custom code such as match evaluators or loops over captured groups.
My current attempt is something along the lines of:
r.Replace("(?<=“)\r\n(?=”)", " ")
Obviously, I'm not even close yet.
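To make the failure concrete, here is a minimal sketch in Python (chosen only because it is easy to run; the pattern sticks to a lookahead and a negated character class, both of which the .NET engine also supports). The names `SAMPLE`, `ATTEMPT`, `INSIDE_QUOTE_BREAK`, and `unwrap` are hypothetical. The attempt above matches nothing because its lookarounds require the opening quote directly before the line break and the closing quote directly after it. The second pattern is one possible direction, not a verified .NET solution: it assumes the curly quotes are always balanced and never nested, and treats a line break as "inside a quote" when the next curly quote after it is a closing one.

```python
import re

# Sample input from the question, joined with Windows line endings.
SAMPLE = "\r\n".join([
    "This line is not quoted.",
    "This part of the line is not quoted “but this is.”",
    "This one is not quoted either.",
    "“This entire line is quoted”",
    "Not quoted.",
    "“This line is quoted",
    "and so is this one",
    "and so is this one.”",
    "This is not quoted “but this is",
    "and so is this.”",
])

# The attempt from the question: it only matches a line break that sits
# immediately between an opening and a closing curly quote.
ATTEMPT = "(?<=“)\r\n(?=”)"

# Hypothetical alternative (assumes balanced, non-nested quotes): match a
# line break only if the next curly quote of either kind is a closing one.
INSIDE_QUOTE_BREAK = "\r\n(?=[^“”]*”)"


def unwrap(text: str, pattern: str) -> str:
    """Replace every match of `pattern` in `text` with a single space."""
    return re.sub(pattern, " ", text)


if __name__ == "__main__":
    # Count how many line breaks the original attempt matches.
    print(len(re.findall(ATTEMPT, SAMPLE)))
    # Show the text after unwrapping with the lookahead-only pattern.
    print(unwrap(SAMPLE, INSIDE_QUOTE_BREAK))
```

In a searchfor/replacewith table, the equivalent entry would presumably be the pattern text `\r\n(?=[^“”]*”)` with a single space as the replacement. An unbalanced closing quote anywhere later in the text would cause false matches, so this only holds under the stated assumption.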
The same logic could be applied to, say, color-coding block comments in source code: anything inside a block comment is treated differently from the text outside it. (Code is a little trickier, since block-comment start/end delimiters can also legitimately appear inside a string literal, an issue I don't have to deal with here.)