Hi, it’s been a while since I last had to use regex, and I’m in a hurry to get something done, so I hope I can get a quick answer to this quick question.

Say I have the following text:

Start
A
B
C
End    
Start
A
B
C
End Start
A
B
C
End
Foo
A
B
C
Bar

I would like to replace the line breaks with pipes but only between the "Start" and "End" words so that my end result is:

Start|A|B|C|End    
Start|A|B|C|End Start|A|B|C|End
Foo
A
B
C
Bar

Thank you very much.

+5  A: 

When you start parsing expressions like that, you're not in regex territory anymore. As with XML, a language in which the same character must be treated differently depending on its context belongs to a class higher than regular expressions.

A more traditional approach of just poking through the string directly would work better in this situation.

Assuming the original string is split up by whitespace as your example showed, you can just split the string on any whitespace, and set a flag when you are between a Start and End token to put pipes between tokens instead of newlines.
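
A minimal sketch of this flag-based scan, written in C# as an assumption (the question does not name a language). The names TokenScanner and JoinBetweenMarkers are hypothetical. It treats "Start" and "End" as whole whitespace-delimited tokens and collapses each run of whitespace inside a block into a single pipe, so a multi-word line inside a block would also be joined with pipes.

```csharp
using System.Text;
using System.Text.RegularExpressions;

static class TokenScanner
{
    // Hypothetical helper: joins the tokens between "Start" and "End" with pipes
    // and leaves all whitespace outside such a block untouched.
    public static string JoinBetweenMarkers(string text)
    {
        var result = new StringBuilder();
        bool inside = false; // true while we are between a Start and an End token

        // Splitting on a captured group keeps the whitespace runs in the output array,
        // so tokens and separators alternate.
        foreach (string piece in Regex.Split(text, @"(\s+)"))
        {
            if (piece.Length == 0) continue; // empty piece when the text starts or ends with whitespace

            if (char.IsWhiteSpace(piece[0]))
            {
                // Separator: a pipe inside a block, the original whitespace elsewhere.
                result.Append(inside ? "|" : piece);
            }
            else
            {
                if (piece == "Start") inside = true;
                else if (piece == "End") inside = false;
                result.Append(piece);
            }
        }
        return result.ToString();
    }
}
```

Calling TokenScanner.JoinBetweenMarkers("Start\nA\nB\nC\nEnd Start\nA\nEnd\nFoo\nA") returns "Start|A|B|C|End Start|A|End\nFoo\nA".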

Welbog
This does answer the original question, but I guess I should have been more specific in my example. The reality is that I need to apply an ad hoc regex only between 2 words. There may not be 2 newlines between “End” and “Start”; there could be anything between “End” and “Start”, even just a space, as in “Start End…”.
Rene
Hmm, I'm not sure what I should do. Should I mark this answer as correct (since it was) and open a new question with the right information? Or should I edit the original post and treat this as not the right answer (although it really was the right answer at one point)?
Rene
I think you should mark it as the right answer and ask another question.
SolutionYogi
+1  A: 

regex:

(Start)[\n]*(A)[\n]*(B)[\n]*(C)[\n]*(End)

replace with:

$1|$2|$3|$4|$5

You can put in your own values, or even a regex, for Start, End, A, B, and C. The "replace with" part may differ slightly depending on your language or regex engine; if you tell me what you are using, I can be more specific.
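
As an illustration only, here is how the pattern above might be applied in C#, assuming .NET's Regex.Replace, where $1 to $5 refer to the numbered groups. The class name FixedItemsReplace is hypothetical, and [\n]* is widened to [\r\n]* so that Windows line endings are consumed as well; that widening is an adjustment, not part of the pattern as posted.

```csharp
using System.Text.RegularExpressions;

static class FixedItemsReplace
{
    // Hypothetical helper: rewrites each Start/A/B/C/End sequence onto one pipe-separated line.
    public static string Apply(string text)
    {
        // Each group captures one fixed item; the line breaks between them are matched but not captured.
        const string pattern = @"(Start)[\r\n]*(A)[\r\n]*(B)[\r\n]*(C)[\r\n]*(End)";

        // $1..$5 insert the captured items, so only the separators change.
        return Regex.Replace(text, pattern, "$1|$2|$3|$4|$5");
    }
}
```

The A, B, C lines between Foo and Bar are left alone because the pattern only matches when it begins at Start.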

David Stalnaker
Yeah, this will work as long as there is a predictable and consistent number of items between Start and End.
Welbog
Right, I'll readily admit that this isn't really a good application for regex. There's probably a way to do it for an arbitrary number of items, but that's reaching the limits of regex for sure.
David Stalnaker
+1  A: 

This works for the case you've provided. No guarantees it will work for anything more complex.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string s = "Start" + Environment.NewLine +
                    "A" + Environment.NewLine +
                    "B" + Environment.NewLine +
                    "C" + Environment.NewLine +
                    "End" + Environment.NewLine +
                    "Start" + Environment.NewLine +
                    "A" + Environment.NewLine +
                    "B" + Environment.NewLine +
                    "C" + Environment.NewLine +
                    "End Start" + Environment.NewLine +
                    "A" + Environment.NewLine +
                    "B" + Environment.NewLine +
                    "C" + Environment.NewLine +
                    "End" + Environment.NewLine +
                    "Foo" + Environment.NewLine +
                    "A" + Environment.NewLine +
                    "B" + Environment.NewLine +
                    "C" + Environment.NewLine +
                    "Bar";

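        // Match "Start", then any number of lines that do not begin with "End",
        // then the final line break and "End". \r?\n accepts both Windows and Unix
        // line endings, since Environment.NewLine differs between platforms.
        // (The earlier [^\r\n(End)] was a character class excluding the letters
        // E, n, d and the parentheses, not the word "End".)
        Regex regex = new Regex(@"Start(?:\r?\n(?!End)[^\r\n]*)*\r?\nEnd");
        string replaced = regex.Replace(s, AddPipes);
        Console.WriteLine(replaced);
        Console.ReadLine();
    }

    // Replaces every line break inside a matched Start...End block with a pipe.
    static string AddPipes(Match m)
    {
        return Regex.Replace(m.Value, @"\r?\n", "|");
    }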
}
Austin Salonen
A: 

I agree with Welbog; I think you might be asking too much of regex in this situation. I would recommend a two-pass approach. According to RegexBuddy, the following will match your target paragraphs of arbitrary length:

(Start)(\r\n)((.*)\2)+?(End)

I would use the above regex to pull out the matching paragraphs in your text and then use a simple regex or string replace function to exchange the CR LF characters for pipes.
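
A rough sketch of the two passes in C#, assuming the .NET regex engine and input that uses \r\n line endings, as the pattern above requires. The class name TwoPass is hypothetical. Pass 1 locates each Start...End paragraph; pass 2 swaps the line breaks inside each match for pipes, while text outside the matches is copied through unchanged.

```csharp
using System.Text;
using System.Text.RegularExpressions;

static class TwoPass
{
    // Hypothetical helper implementing the two-pass idea.
    public static string Apply(string text)
    {
        // Pass 1: find every Start...End paragraph using the pattern from the answer.
        MatchCollection blocks = Regex.Matches(text, @"(Start)(\r\n)((.*)\2)+?(End)");

        var result = new StringBuilder();
        int position = 0; // index of the first character not yet copied

        foreach (Match block in blocks)
        {
            // Copy the text between the previous block and this one as-is.
            result.Append(text, position, block.Index - position);

            // Pass 2: plain string replace inside the matched paragraph only.
            result.Append(block.Value.Replace("\r\n", "|"));

            position = block.Index + block.Length;
        }

        // Copy whatever follows the last block.
        result.Append(text, position, text.Length - position);
        return result.ToString();
    }
}
```

With \n-only input the pattern finds no paragraphs and the text comes back unchanged, so the line endings may need normalizing first.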

Barry Carr