I need help with a regex. I have a long text with a lot of whitespace and many newlines, and I need to find and select everything between two strings. Example:

iojge test rgej <foo>
ferfe 098n34hjlrej
fefe <end

I want to find everything between test and end:

 rgej <foo>
ferfe 098n34hjlrej
fefe <

How can I do this?

+1  A: 

You can use two lookarounds and the /s (single line) modifier, which makes the dot match newlines, to look for everything between your two words:

/(?<=test).*(?=end)/s

To explain:

(?<=    # open a positive lookbehind
  test  # match 'test'
)       # close the lookbehind
.*      # match as many characters as possible (including newlines because of the /s modifier)
(?=     # open a positive lookahead
 end    # match 'end'
)       # close the lookahead

The lookarounds let you assert that the pattern must be anchored by your two words, but since lookarounds are zero-width assertions that consume no characters, only the text between the words will be returned by preg_match. A lookbehind looks behind the current position to see if the assertion passes; a lookahead looks ahead of the current position.

Since quantifiers are greedy by default, the .* will match as much as it can (so if the ending word appears multiple times, it will match up to the last one). If you want to match only up to the first end, you can make the .* lazy (in other words, it will match as little as possible while still satisfying the pattern) by changing it to .*? (i.e. /(?<=test).*?(?=end)/s).
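
A minimal sketch comparing the greedy and lazy patterns. The sample string is the one from the question with a second end appended; that extra text is an assumption added here only so that the two patterns give different results.

```php
<?php
// The question's text, plus an extra "end" (added for illustration only).
$text = "iojge test rgej <foo>\nferfe 098n34hjlrej\nfefe <end and a second end";

// Greedy: .* runs as far as it can, so the match stops before the LAST "end".
preg_match('/(?<=test).*(?=end)/s', $text, $greedy);

// Lazy: .*? expands one character at a time, so the match stops before the FIRST "end".
preg_match('/(?<=test).*?(?=end)/s', $text, $lazy);

var_dump($greedy[0]); // includes "<end and a second "
var_dump($lazy[0]);   // ends at "fefe <"
```

Because the delimiters sit inside lookarounds, index 0 of each result already excludes test and end.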

Daniel Vandersluis
To be on the safe side, I'd make it a reluctant DOT-STAR.
Bart Kiers
@Bart it depends on what the OP wants to capture. I've updated my answer to discuss that though.
Daniel Vandersluis
+4  A: 

You can try

preg_match("/test(.*?)end/s", $yourString, $matches);
print_r($matches);
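
A short sketch of how the result could be read, using the question's sample as input. The variable names $between, $withS and $withM are made up for illustration. It also shows that the pattern only crosses line breaks with the s modifier; the m modifier alone does not help.

```php
<?php
$yourString = "iojge test rgej <foo>\nferfe 098n34hjlrej\nfefe <end";

// With /s the dot also matches newlines, so the pattern can span lines.
$withS = preg_match('/test(.*?)end/s', $yourString, $matches);

// $matches[0] is the whole match, including "test" and "end".
// $matches[1] is the first capture group: only the text in between.
$between = $withS === 1 ? $matches[1] : null;

// With /m only ^ and $ change meaning; the dot still stops at "\n",
// so no match is found in this multi-line input.
$withM = preg_match('/test(.*?)end/m', $yourString);

var_dump($between);
var_dump($withM); // int(0)
```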
Colin Hebert
The `m` flag will cause `$` to match the end of the line and `^` match the start of a line: it will *not* let the DOT meta character match line breaks. This is done with the `s` flag.
Bart Kiers
@Bart K. oops, you're right.
Colin Hebert
A classic mistake. :)
Bart Kiers
This will capture `test` and `end`, which doesn't comply with the OP's sample.
Daniel Vandersluis
@Daniel Vandersluis, Check $matches[1]
Colin Hebert
@Colin yeah I know that but the OP might not ;)
Daniel Vandersluis
Yeah, and it is pretty clear that the desired match resides at index 1 after looking at the output `print_r($matches);` produces. I like this one better than the look-around suggestion. The readability of this answer is much better.
Bart Kiers
Wow 7k already... I thought I was fast .
NullUserException
A: 

If you have fixed delimiters, you don’t need regular expressions:

$str = 'iojge test rgej <foo>
ferfe 098n34hjlrej
fefe <end';
$start = 'test';
$end = 'end';
if (($startPos = strpos($str, $start)) !== false && ($endPos = strpos($str, $end, $startPos+=strlen($start))) !== false) {
    // match found
    $match = substr($str, $startPos, $endPos-$startPos);
}
Gumbo
+1  A: 

Alternatively you can also do:

$arr1 = explode("test",$input);
$arr2 = explode("end",$arr1[1]);
$result = $arr2[0];
codaddict
What if there is no `test` in `$input`?
Gumbo
@Gumbo: In that case `$result` will be an empty string, but I think there will be warnings about an invalid index. So you are right, there needs to be some error checking.
codaddict
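
One possible way to add the error checking mentioned in the comments above, as a sketch only. The function name between_explode is hypothetical, and returning null when a delimiter is missing is a design choice made here, not part of the original answer (which would return the rest of the string if end were absent).

```php
<?php
// Hypothetical helper: returns the text between $start and the first $end
// after it, or null if either delimiter is missing.
// Assumes both delimiters are non-empty (explode() rejects an empty separator).
function between_explode(string $input, string $start, string $end): ?string
{
    // Limit 2: split only at the first occurrence of $start.
    $afterStart = explode($start, $input, 2);
    if (count($afterStart) < 2) {
        return null; // $start does not occur in $input
    }

    // Split the remainder at the first occurrence of $end.
    $beforeEnd = explode($end, $afterStart[1], 2);
    if (count($beforeEnd) < 2) {
        return null; // $end does not occur after $start
    }

    return $beforeEnd[0];
}

$input = "iojge test rgej <foo>\nferfe 098n34hjlrej\nfefe <end";
var_dump(between_explode($input, 'test', 'end'));
var_dump(between_explode('nothing here', 'test', 'end')); // NULL
```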