My string looks like this:

February 2009

bla bla

March 2009

doo daa bla lbla

Septemer 2009

So I wrote this regex to split it up by month (which I think is the first thing I want to do):

$regex = '/(.*)\s(\d){4}/i';

This matches them perfectly, except that it throws away the actual text it split on, and that is exactly the information I want to keep (February 2009, March 2009, etc.).

I've tried mucking around with the preg_split() flags, but could not get what I wanted.

Should I be using a different approach? Is there an easy way to split text via a regex but keep the text that was matched?

Come to think of it, I could probably use `preg_match_all()` here... I hope I didn't just answer my own question in the question. I'm going to post anyway to see what the community thinks.

Thanks

+2  A: 

Put the splitting string into its own capture group. So given your example,

$regex = '/(.*)\s(\d){4}/i';

with a few modifications (the `{4}` moves inside the group, because `(\d){4}` captures only the last digit) becomes:

$regex = '/(.+?)(\s)(\d{4})/i';

If your matches array is called `$matches`, then `$matches[0]` will contain the whole match, `$matches[1]` the month, `$matches[2]` the splitting string, and `$matches[3]` the year.

BipedalShark
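A minimal sketch of how those capture groups could be read back with `preg_match_all()`, which the question also mentions. The sample `$string` is reconstructed from the question, and the `$headings` array and its keys are names assumed here for illustration only.

```php
<?php
// Sample input reconstructed from the question (the misspelling is in the original data).
$string = "February 2009\n\nbla bla\n\nMarch 2009\n\ndoo daa bla lbla\n\nSeptemer 2009";

// Group 1: month, group 2: the whitespace separator, group 3: the four-digit year.
$regex = '/(.+?)(\s)(\d{4})/i';

// PREG_SET_ORDER yields one array per match, each holding groups 0..3.
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);

// Hypothetical result structure: one entry per heading found.
$headings = [];
foreach ($matches as $m) {
    $headings[] = ['full' => $m[0], 'month' => $m[1], 'year' => $m[3]];
}

print_r($headings);
```

Lines such as "bla bla" are skipped because `.` does not cross a newline and no whitespace followed by four digits appears on them. Body text containing something like "in 2009" would match as well, so the pattern is only as reliable as the data.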
A: 

It looks like it works even without the non-greedy modifier `?`:

preg_match('/(.*)\s(\d{4})/', "Month 2009", $a);

(I wonder why, since `(.*)` should match the whole string, shouldn't it?)

gpilotino
Initially, `(.*)` consumes the whole rest of the string, but that leaves nothing for the rest of the regex to match. So it starts backtracking, "giving back" the most recently matched characters one at a time, until it becomes possible for `\s(\d{4})` to match (or until it gets back to the starting point and gives up). If a regex with a greedy quantifier matches a given string, the same regex with a non-greedy quantifier is guaranteed to match it too, and vice versa; the two can still differ in which text each group captures.
Alan Moore
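To make the backtracking point concrete, here is a small sketch comparing the greedy and non-greedy forms. The input string is hypothetical, chosen to contain two years so that the two patterns both match but capture different text.

```php
<?php
// Hypothetical input with two years, so the two quantifiers disagree on what to capture.
$text = 'Feb 2009 to Mar 2010';

// Greedy: (.*) grabs everything, then backtracks to the LAST place \s(\d{4}) can match.
preg_match('/(.*)\s(\d{4})/', $text, $greedy);

// Non-greedy: (.*?) grows one character at a time and stops at the FIRST place it can match.
preg_match('/(.*?)\s(\d{4})/', $text, $lazy);

echo $greedy[1], ' | ', $greedy[2], "\n"; // Feb 2009 to Mar | 2010
echo $lazy[1], ' | ', $lazy[2], "\n";     // Feb | 2009
```

With a string like "Month 2009" there is only one place the year can match, which is why both forms give the same result there.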
+2  A: 

The fourth parameter of preg_split() is the flags:

http://www.php.net/preg-split

PREG_SPLIT_DELIM_CAPTURE If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.

$a = preg_split('/(.*\s\d{4})/i', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($a);

prints

Array
(
    [0] => 

    [1] => February 2009
    [2] => 
bla bla

    [3] => March 2009
    [4] => 
doo daa bla lbla

    [5] => Septemer 2009
    [6] => 

)

So that's pretty close.

Justin
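One possible way to get from that array to something usable is to pair each captured heading with the text that follows it. This is only a sketch: the sample `$string` is reconstructed from the question, and `$sections` is a name assumed here. It relies on the fact that, with a single capture group and `PREG_SPLIT_DELIM_CAPTURE`, the captured delimiters land at the odd indexes.

```php
<?php
// Sample input reconstructed from the question.
$string = "February 2009\n\nbla bla\n\nMarch 2009\n\ndoo daa bla lbla\n\nSeptemer 2009";

// Same pattern and flag as in the answer above.
$parts = preg_split('/(.*\s\d{4})/i', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// Odd indexes hold the captured headings; the element after each one is its body text.
$sections = [];
for ($i = 1; $i < count($parts); $i += 2) {
    // trim() drops the surrounding blank lines; a missing body becomes an empty string.
    $sections[$parts[$i]] = trim($parts[$i + 1] ?? '');
}

print_r($sections);
```

Using the heading as the array key means a repeated heading would overwrite the earlier one; a list of heading/body pairs avoids that if duplicates are possible.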