I'm currently trying to learn regular expressions with some simple "real world" examples.

Consider the following string:

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko

I want to find the RV value (1.9.2a1pre). I need to apply the following rules:

  1. RV: can be in any case (RV, rv, rV, Rv...).
  2. RV: can be anywhere in the string.
  3. The RV: value ends with either a closing parenthesis, any whitespace (including linebreak), a semicolon or the end of string.

So far I have:

/rv:[.][\)]?/i

but it's not working (I must be far from the "true" solution)...

The expression must work with PHP preg_match.

A: 

I think the [.] means a dot, not "any character" ... use this instead:

/rv:.+[\)]?/i
Aziz
Just tried it and it does not work. Oli's one seems OK except for the end of the rv: value.
Activist
+1  A: 

Here is my revision to allow the RV sub-string to be anywhere

/rv:[\s]*([^); ]+)/i
  • () denotes the capture group (ie, what you want to get back from this process)
  • [^); ] means characters that are not ), *space* or ;
  • + means one or more times
  • * means as many as you like, 0-many.
  • [\s]* just before the parenthesis basically means we chop off any leading whitespace from the match, essential in this case because we're explicitly saying we break the main match on a space.

So this is looking to capture a string of chars, one or more chars in length and excluding ), ; and space, right after rv: and any whitespace that follows it.

Your version /rv:[.][\)]?/i looks for a single . then optionally a ).
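As a minimal sketch of how the revised pattern could be used with preg_match (assuming PHP 7.1+ for the nullable return type; the helper name extractRv and the second sample input are made up for illustration):

```php
<?php
// Hypothetical helper; the pattern is the revised one from this answer.
function extractRv(string $userAgent): ?string
{
    // [\s]* skips optional whitespace after "rv:";
    // the capture group then stops at ")", ";" or a space.
    if (preg_match('/rv:[\s]*([^); ]+)/i', $userAgent, $matches) === 1) {
        return $matches[1];
    }
    return null; // no rv: token found
}

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

echo extractRv($ua), PHP_EOL;           // 1.9.2a1pre
echo extractRv('RV: 1.9.2.5'), PHP_EOL; // 1.9.2.5

// The pattern from the question finds nothing here,
// because "rv:" is not followed by a literal dot.
$originalMatches = preg_match('/rv:[.][\)]?/i', $ua); // 0
```

preg_match returns 1 on a match and 0 otherwise, so the strict comparison keeps "no match" separate from a captured value.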

Oli
Seems to work in most cases, but does it take into consideration the end of the rv: value (closing parenthesis, semicolon, end of string or whitespace)? BTW I don't know why someone downvoted you :(
Activist
I've added catches for whitespace after `rv:` and end-of-phrase catches to get `;`, ` ` and EOS. No, I don't know why somebody voted me down either.
Oli
Your description of the original regex is not precisely correct; it looks for a single **dot** character. Your revised regex answer does not match the bulleted description.
salathe
It's now working with all my test cases (I have several hundred). Tim Pietzcker's version works too: /rv:([^;)\s]+)/i. Is either one "better"?
Activist
@Activist: Tim's more precisely follows your description.
salathe
Since I'm trying to learn, why is Tim's better?
Activist
Tim's doesn't allow for any space after `rv:`. Eg `rv: 1.9.2.5` wouldn't match. I don't know the likelihood of that ever coming up but there you go.
Oli
+2  A: 
/rv\s*:\s*([^;)\s]+)/i

will match rv, followed by a : (which may be surrounded with whitespace), then a run of characters other than ;, ) and whitespace (including newlines). The match result (after rv:) will be captured in backreference no. 1.
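A rough way to check the pattern against the three rules from the question. The sample inputs below are invented for illustration, one per way the value can end:

```php
<?php
// The pattern from this answer.
$pattern = '/rv\s*:\s*([^;)\s]+)/i';

// Made-up inputs, one per terminator described in the question.
$samples = [
    'rv:1.9.2a1pre) Gecko',   // value ends at ")"
    'Rv:1.9.2;',              // value ends at ";"
    "rV : 1.9.2\nnext line",  // whitespace around ":", value ends at a line break
    'RV:1.9.2',               // value ends at the end of the string
];

$found = [];
foreach ($samples as $sample) {
    // $m[1] holds the captured value when the pattern matches.
    $found[] = preg_match($pattern, $sample, $m) === 1 ? $m[1] : null;
}

print_r($found); // 1.9.2a1pre, then 1.9.2 three times
```

End of string needs no special handling: the negated class simply runs out of characters to consume.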

Tim Pietzcker
It's working with all my test cases (I have several hundred). Oli's version works too: /rv:([^;)\s]+)/i. Is either one "better"?
Activist
Well, this version also accepts tabs and newlines to end a match, as you specified. Other than that, they are pretty much identical.
Tim Pietzcker
Activist
Have edited my answer.
Tim Pietzcker
OK great I get it now :) Just one last question: why don't you wrap your \s with [] (like Oli)? I guess it's redundant, but why?
Activist
You need brackets if you want to group *several different* characters into one logical unit. `[abc]` means "one of a, b, or c" - `[a]` is the same as `a`. Sometimes a single-character-class makes sense for readability: `^[ ]*` looks nicer to some people than `^ *`.
Tim Pietzcker
A: 

try this...

$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
preg_match('/rv:([^\)]*)/i', $str , $matches);
echo $matches[1];
ToonMariner
Not working. It seems similar to Tim Pietzcker's but without the ; catch...
Activist
I just tried the very same code on my local dev and the output is: 1.9.2a1pre, so it should work fine. Perhaps a bit more of your code may help us help you?
ToonMariner
Yes, but the rv: value can also end with a ; and your regexp doesn't work in these cases (see point #3 in my question).
Activist
+1  A: 

Maybe:

/rv:([^); \n]+)/i

That means: one or more characters that are not ), ;, space or line feed, matched case-insensitively and captured.

M42
A: 

I think what you want is this:

(?<=rv:).*(?=\))

Everything within parentheses is a group. The ?<= is called a positive lookbehind: it asserts that a given string comes immediately before the text you want, without making it part of the match. The ?= is called a positive lookahead and does the same for the string after the text you want. Since the string you want is simply numbers, letters and a dot or two, the . operator works as a catch-all and matches any character except line breaks. * means zero or more of the previous character.
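A small sketch of how the lookaround version behaves under preg_match. The /i flag is added here to cover the case-insensitivity rule, and the second input and the bounded variant are assumptions for illustration, not part of the original suggestion:

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// With lookarounds the value is the whole match, so it lands in $m[0].
preg_match('/(?<=rv:).*(?=\))/i', $ua, $m);
$plain = $m[0]; // 1.9.2a1pre

// Made-up input where the value ends at ";" and a ")" appears later.
$tricky = 'rv:1.9.2; extra (note)';

// The greedy .* runs on to the last ")" on the line.
preg_match('/(?<=rv:).*(?=\))/i', $tricky, $m);
$overrun = $m[0]; // 1.9.2; extra (note

// Assumed variant: swap .* for the negated class used in the other answers.
preg_match('/(?<=rv:)[^;)\s]+/i', $tricky, $m);
$bounded = $m[0]; // 1.9.2

echo $plain, PHP_EOL, $overrun, PHP_EOL, $bounded, PHP_EOL;
```

The lookahead for ) only covers one of the terminators in the question, which is why the bounded variant drops it in favour of a character class.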

hope that helps

Erik
A: 
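$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// The quantifier sits inside the group so the whole value is captured, not just its last character.
preg_match('/rv:([a-z0-9.]*)/i', $str, $matches);
echo $matches[1]; // 1.9.2a1pre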
ToonMariner