Hello. I have a text from which I need to extract content that follows a defined pattern: the content between MARK1 and MARK2, and the content after MARK2. However, those marks can repeat, and I need to capture all of their occurrences. For the example text below, I expect the following array:

text: "textA textB _MARK1_ textC _MARK2_ textD _MARK1_ textE textF _MARK2_ textG textH textI"

array(0): _MARK1_ textC _MARK2_ textD 
array(1): textC
array(2): textD
array(3): _MARK1_ textE textF _MARK2_ textG textH textI 
array(4): textE textF
array(5): textG textH textI
A: 

I don't think you'll be able to achieve this with a single expression. Likely you'll need to break it down into an initial expression and then a loop to perform a 2nd expression match against each iteration of the first match.
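
A minimal sketch of that two-step idea, written in Python purely as an assumption (the question names no language). The function name `two_pass` and the trimming of surrounding spaces are illustrative choices, not something the question specifies.

```python
import re

def two_pass(text):
    """Return [whole, between, after, ...] for every _MARK1_ ... _MARK2_ section."""
    results = []
    # Pass 1: each chunk starts at _MARK1_ and runs up to the next _MARK1_ or the end.
    chunks = re.findall(r"_MARK1_.*?(?=_MARK1_|$)", text, flags=re.DOTALL)
    for chunk in chunks:
        # Pass 2: split one chunk into the part between the marks and the part after.
        m = re.match(r"_MARK1_(.*?)_MARK2_(.*)", chunk, flags=re.DOTALL)
        if m:  # chunks with no _MARK2_ are skipped in this sketch
            results.extend([chunk.strip(), m.group(1).strip(), m.group(2).strip()])
    return results

text = ("textA textB _MARK1_ textC _MARK2_ textD "
        "_MARK1_ textE textF _MARK2_ textG textH textI")
for i, item in enumerate(two_pass(text)):
    print(f"array({i}): {item}")
```

The flat list mirrors the array layout in the question; a list of tuples would probably be easier to work with in practice.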

Isaac Dealey
A: 

Am I missing something or is this what you are looking for?

/(_MARK1_ (.*?) _MARK2_ (.*?))*/

I made some arbitrary assumptions about how you want to handle spaces, which I realize were probably only consistent to make your example case more readable.

Sparr
+1  A: 

That would be:

/(_MARK1_(.*?)_MARK2_((?:(?!_MARK1_).)*))/g

At least, it works on RegEx Coach on your test case.
Of course, you need to iterate on each match.
Note that it might not work in all regex flavors: it relies on a negative lookahead assertion, which some engines (POSIX regular expressions, for example) do not support.
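
Iterating over each match could look like the sketch below. Python is an assumption here (its `re` module supports lookahead assertions); the helper name `extract_sections` and the whitespace trimming are illustrative only.

```python
import re

# Same pattern as above: group 1 is the whole section, group 2 the text between
# the marks, group 3 the text after _MARK2_ up to the next _MARK1_ (or the end).
PATTERN = re.compile(r"(_MARK1_(.*?)_MARK2_((?:(?!_MARK1_).)*))", re.DOTALL)

def extract_sections(text):
    """Return one (whole, between, after) tuple per match, with spaces trimmed."""
    return [
        tuple(group.strip() for group in match.groups())
        for match in PATTERN.finditer(text)
    ]

text = ("textA textB _MARK1_ textC _MARK2_ textD "
        "_MARK1_ textE textF _MARK2_ textG textH textI")
for whole, between, after in extract_sections(text):
    print(repr(whole), repr(between), repr(after))
```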

PhiLho
Perfect. That's it.
Davi Kenji
good catch, excluding _MARK2__MARK1_, I didn't cover that case in my solution
Sparr
A: 

I'm not sure whether you actually need the separating marks in your array. That part seems superfluous unless you have a specific spec for it. This solution assumes you don't really need that. Since you didn't specify a language, how about Perl?

use strict;
use warnings;
use Data::Dumper;
my $text = 'textA textB _MARK1_ textC _MARK2_ textD _MARK1_ textE textF _MARK2_ textG textH textI';
# Capture each chunk that follows a mark, up to the next mark or the end of the string.
my @results = $text =~ m/(?<=_MARK1_|_MARK2_)(.*?)(?=_MARK1_|_MARK2_|$)/g;
print Data::Dumper::Dumper @results;

However, there's no reason to try the general case with regular expressions. Use a parser instead.
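
As a rough illustration of the non-regex route, here is a small hand-written scanner in Python (an assumed language; the function name `parse_marked` and its parameters are hypothetical). It is a sketch of the idea rather than a full parser: it walks the string with `str.find` and ignores a section whose closing mark is missing.

```python
def parse_marked(text, open_mark="_MARK1_", close_mark="_MARK2_"):
    """Return (between, after) pairs for each open_mark ... close_mark section."""
    pairs = []
    pos = text.find(open_mark)
    while pos != -1:
        start = pos + len(open_mark)
        close = text.find(close_mark, start)
        if close == -1:
            break  # unterminated section: ignored in this sketch
        after_start = close + len(close_mark)
        # The "after" part ends at the next opening mark, or at the end of the text.
        next_open = text.find(open_mark, after_start)
        after_end = len(text) if next_open == -1 else next_open
        pairs.append((text[start:close].strip(), text[after_start:after_end].strip()))
        pos = next_open
    return pairs

text = ("textA textB _MARK1_ textC _MARK2_ textD "
        "_MARK1_ textE textF _MARK2_ textG textH textI")
print(parse_marked(text))
```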