
I am trying to write a preg_match_all pattern to match horse race distances.

My source lists race distances in the form XmYfZy (miles, furlongs, yards). I want to capture the m value, the f value and the y value. However, a given race may have only m, f or y, any two of them, or all three.

// e.g. $raw = '5f213y';

preg_match_all('/(\d{1,})m|(\d{1,})f|(\d{1,})y/', $raw, $distance);

The above sort of works, but the matches appear in unpredictable positions in the returned array. I guess this is because the regex matches each alternative of the OR separately. How do I match all three (each of which may or may not be present) in a single run?
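A minimal sketch of what the question's pattern returns for the short sample, assuming PHP's default PREG_PATTERN_ORDER. In that mode $distance[N] lists group N's capture for every match, with '' wherever that group took no part. The first match ("5f") only fills group 2 and the second ("213y") only fills group 3, so each number lands at the index of the match it came from rather than at a fixed slot per unit.

```php
<?php
// The question's pattern, run on the short sample from the question.
$raw = '5f213y';
preg_match_all('/(\d{1,})m|(\d{1,})f|(\d{1,})y/', $raw, $distance);

// $distance[0] is ['5f', '213y'] (the whole matches).
// $distance[1] (m) is ['', ''], $distance[2] (f) is ['5', ''],
// $distance[3] (y) is ['', '213']: one entry per match, '' when unused.
var_export($distance);
```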

EDIT: A full sample string is:

Hardings Catering Services Handicap (Div I) Cl6 5f213y
A: 

You can use "?" as a conditional:

preg_match_all('/((\d{1,})m)?|((\d{1,})f)?|((\d{1,})y)?/', $raw, $distance);
krico
I edited the question to show a full sample of the string I am running the regex against. The result of your regex is the following: http://pastebin.com/fSMP5UFc
esryl
@krico: FYI, `?` (as you're using it) is a *quantifier*, not a conditional. All you've done is make all three components optional. As @esryl demonstrated, it's now matching the empty string at almost every position in the subject string.
Alan Moore
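A small sketch illustrating the point above: because every alternative is wrapped in (...)?, the pattern can succeed on the empty string, so preg_match_all reports many empty matches around the two real ones. It assumes PHP's usual handling of empty matches (retry at the same offset with a non-empty requirement, then advance by one character). The filtering step is only there to make the output readable.

```php
<?php
// The all-optional pattern from the answer above, on the full sample string.
$raw = 'Hardings Catering Services Handicap (Div I) Cl6 5f213y';
preg_match_all('/((\d{1,})m)?|((\d{1,})f)?|((\d{1,})y)?/', $raw, $distance);

// Separate the non-empty whole-pattern matches from the empty ones.
$nonEmpty   = array_values(array_filter($distance[0], 'strlen'));
$emptyCount = count($distance[0]) - count($nonEmpty);

printf("%d matches, %d of them empty; non-empty: %s\n",
    count($distance[0]), $emptyCount, implode(', ', $nonEmpty));
```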
A: 

If I understand what you're asking correctly, you would like to get each number from these values separately? This works for me:

$input = "Hardings Catering Services Handicap (Div I) Cl6 5f213y";

preg_match_all('/((\d+)(m|f|y))/', $input, $matches);

After preg_match_all() executes, $matches[2] holds an array of the numbers that matched (in this case, $matches[2][0] is 5 and $matches[2][1] is 213).

If all three values exist, m will be in $matches[2][0], f in $matches[2][1], and y in $matches[2][2]. If any value is missing, the later values each move up one index (for example, with no m, the f value is in $matches[2][0]). It may also come in handy that $matches[3] holds an array of the corresponding letters, so if you need to check whether a number was an m, f, or y, you can.
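One way to use $matches[3] as described: pair each letter with its number so every value can be looked up by unit regardless of which units are present. This is a sketch, not part of the original answer; defaulting missing units to 0 is an assumption, and if a letter repeats (e.g. 1m2m) array_combine keeps only the last value.

```php
<?php
$input = "Hardings Catering Services Handicap (Div I) Cl6 5f213y";
preg_match_all('/((\d+)(m|f|y))/', $input, $matches);

// $matches[3] holds the unit letters and $matches[2] the numbers, in the
// same order, so combining them gives an array keyed by unit letter.
$byUnit = array_combine($matches[3], $matches[2]);

// Units that did not appear default to 0 (assumption).
$miles    = (int) ($byUnit['m'] ?? 0);
$furlongs = (int) ($byUnit['f'] ?? 0);
$yards    = (int) ($byUnit['y'] ?? 0);
```

For the sample string, $byUnit is ['f' => '5', 'y' => '213'].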

If this isn't what you're after, please provide an example of the output you would like to see for this or another sample input.

JGB146
A:

If I understand you correctly, you're processing listings (like the one in your question) one at a time. If that's the case, you should be using preg_match, not preg_match_all, and the regex should match the whole "distance" code, not individual components of it. Try this:

preg_match('#\b(?:(?<M>\d+)m|(?<F>\d+)f|(?<Y>\d+)y){1,3}\b#',
           $raw, $distance);

The results are now stored in a one-dimensional array, but you don't need to worry about the group numbers anyway; you can access them by name instead (e.g., $distance['M'], $distance['F'], $distance['Y']).
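A possible wrapper around this pattern, sketched under a few assumptions. The helper name parse_distance is hypothetical, and returning 0 for an absent unit is a choice made here. One detail worth guarding against: preg_match reports an unmatched group in the middle as '', but leaves trailing unmatched groups out of the array entirely, so for a code like "1m" the 'F' and 'Y' keys do not exist. The ?? defaults cover both cases.

```php
<?php
/**
 * Hypothetical helper: extract miles/furlongs/yards from one race listing
 * using the named-group pattern above. Returns null if no distance code
 * is found.
 */
function parse_distance(string $raw): ?array
{
    if (!preg_match('#\b(?:(?<M>\d+)m|(?<F>\d+)f|(?<Y>\d+)y){1,3}\b#', $raw, $distance)) {
        return null;
    }

    // Missing keys (trailing unmatched groups) and '' both become 0.
    return [
        'm' => (int) ($distance['M'] ?? 0),
        'f' => (int) ($distance['F'] ?? 0),
        'y' => (int) ($distance['Y'] ?? 0),
    ];
}

// Usage: yields ['m' => 0, 'f' => 5, 'y' => 213]
$parsed = parse_distance('Hardings Catering Services Handicap (Div I) Cl6 5f213y');
```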

Note that, while this regex matches codes with one, two, or three components, it doesn't require the letters to be unique. There's nothing to stop it from matching something like 1m2m3m (a weakness shared by your own approach, by the way).
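If repeated or out-of-order units need to be rejected, one option is a stricter variant that makes each unit an optional group in the fixed order m, f, y. This is an alternative pattern, not the one from the answer, and it assumes codes always use that order (as in 5f213y). The anchors (?=\d) at the start and (?<=[mfy]) at the end are what stop it from matching an empty string: no position can be a word boundary that is both preceded by a unit letter and followed by a digit.

```php
<?php
// Stricter variant (alternative pattern): each unit at most once, order m, f, y.
//   \b(?=\d)      start at a word boundary followed by a digit
//   (?<=[mfy])\b  end just after a unit letter, at a word boundary
const STRICT_DISTANCE = '#\b(?=\d)(?:(?<M>\d+)m)?(?:(?<F>\d+)f)?(?:(?<Y>\d+)y)?(?<=[mfy])\b#';

preg_match(STRICT_DISTANCE, 'Hardings Catering Services Handicap (Div I) Cl6 5f213y', $d);
// $d['F'] is '5' and $d['Y'] is '213'; codes such as 1m2m3m or 5f1m do not match.
```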

Alan Moore
@Alan Moore: Absolute perfection. Love that. Brilliant.
esryl