ansaurus

Question

Answer 1

A:

I think the [.] means a dot, not "any character" ... use this instead:

/rv:.+[\)]?/i

Aziz 2010-06-30 18:51:38

Just tried it and it does not work. Oli's one seems OK except for the end of the rv: value.

Activist 2010-06-30 19:10:06

Answer 2

+1 A:

Here is my revision to allow the RV sub-string to be anywhere

/rv:[\s]*([^); ]+)/i

() denotes the capture group (ie, what you want to get back from this process)
[^); ] means characters that are not ), *space* or ;
+ means one or more times
* means as many as you like, 0-many.
[\s]* just before the parenthesis basically means we chop off any leading whitespace from the match, essential in this case because we're explicitly saying we break the main match on a space.

So this is looking to capture a string of chars, excluding ), ; and space, one or more chars in length, right after rv: and any whitespace that follows it.

Your version /rv:[.][\)]?/i looks for a single literal dot (inside [] the . is not a wildcard), then optionally a ).
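As a rough check of the revised pattern, here is a minimal PHP sketch. The helper name `extract_rv` and the sample strings are assumptions made up for illustration, not taken from the question.

```php
<?php
// Hypothetical helper: returns the rv: value, or null when nothing matches.
function extract_rv(string $ua): ?string
{
    // [\s]* skips optional whitespace after "rv:";
    // the capture group stops at ")", ";" or a space.
    if (preg_match('/rv:[\s]*([^); ]+)/i', $ua, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up sample inputs, one per way the value can end.
$samples = [
    'Mozilla/5.0 (Windows; U; rv:1.9.2a1pre) Gecko', // ends at ")"
    'Mozilla/5.0 (X11; rv:1.9.1; Linux) Gecko',      // ends at ";"
    'something rv: 1.9.2.5',                         // space after "rv:", ends at end of string
];
foreach ($samples as $ua) {
    var_dump(extract_rv($ua));
}
```

The helper returns null rather than an empty string so that "no rv: found" stays distinguishable from a match.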

Oli 2010-06-30 18:52:14

Seems to work in most cases, but does it take into consideration the end of the rv: value (closing parenthesis, semicolon, end of string or whitespace)? BTW I don't know why someone downvoted you :(

Activist 2010-06-30 19:09:15

I've added catches for whitespace after `rv:` and end-of-phrase catches to get `;`, ` ` and EOS. No, I don't know why somebody voted me down either.

Oli 2010-06-30 19:32:42

Your description of the original regex is not precisely correct; it looks for a single **dot** character. Your revised regex answer does not match the bulleted description.

salathe 2010-06-30 19:47:27

It's now working with all my test cases (I have several hundred). Tim Pietzcker's version works too: /rv:([^;)\s]+)/i. Is either one "better"?

Activist 2010-06-30 19:53:12

@Activist: Tim's more precisely follows your description.

salathe 2010-06-30 19:59:32

Since I'm trying to learn, why is Tim's better?

Activist 2010-06-30 20:08:46

Tim's doesn't allow for any space after `rv:`. Eg `rv: 1.9.2.5` wouldn't match. I don't know the likelihood of that ever coming up but there you go.
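A small sketch of that difference, using the two patterns quoted in this thread and the `rv: 1.9.2.5` example from this comment. The variable names are illustrative only.

```php
<?php
// The two patterns compared in this thread.
$withWhitespace    = '/rv:[\s]*([^); ]+)/i'; // allows whitespace after "rv:"
$withoutWhitespace = '/rv:([^;)\s]+)/i';     // value must start right after "rv:"

$ua = 'rv: 1.9.2.5';

$hitA = preg_match($withWhitespace, $ua, $a);
$hitB = preg_match($withoutWhitespace, $ua, $b);

var_dump($hitA);          // 1, the space is skipped
var_dump($a[1] ?? null);  // "1.9.2.5"
var_dump($hitB);          // 0, the space right after "rv:" cannot start the value
```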

Oli 2010-06-30 21:12:54

Answer 3

+2 A:

/rv\s*:\s*([^;)\s]+)/i

will match rv, followed by a : (which may be surrounded with whitespace), then a run of characters other than ;, ) and whitespace (including newlines). The match result (after rv:) will be captured in backreference no. 1.
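To illustrate the description above, here is a minimal PHP sketch that runs the pattern over a few inputs. The input strings are made up for illustration; only the pattern comes from this answer.

```php
<?php
$pattern = '/rv\s*:\s*([^;)\s]+)/i';

// Made-up inputs covering each terminator mentioned above.
$inputs = [
    "Gecko; rv:1.9.2a1pre) Firefox", // ends at ")"
    "rv : 1.9.1; en-US",             // whitespace around ":", ends at ";"
    "RV:2.0\nnext line",             // upper case, ends at a newline
    "rv:1.8.1",                      // ends at end of string
];

$values = [];
foreach ($inputs as $input) {
    // $m[1] is backreference no. 1, the text captured after "rv:".
    $values[] = preg_match($pattern, $input, $m) ? $m[1] : null;
}
print_r($values);
```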

Tim Pietzcker 2010-06-30 19:30:46

It's working with all my test cases (I have several hundred). Oli's version works too: /rv:([^;)\s]+)/i. Is either one "better"?

Activist 2010-06-30 19:52:44

Well, this version also accepts tabs and newlines to end a match, as you specified. Other than that, they are pretty much identical.

Tim Pietzcker 2010-06-30 20:30:18

Activist 2010-07-01 13:52:38

Have edited my answer.

Tim Pietzcker 2010-07-01 14:05:55

OK great I get it now :) Just one last question: why don't you wrap your \s with [] (like Oli)? I guess it's redundant, but why?

Activist 2010-07-01 14:25:02

You need brackets if you want to group *several different* characters into one logical unit. `[abc]` means "one of a, b, or c" - `[a]` is the same as `a`. Sometimes a single-character-class makes sense for readability: `^[ ]*` looks nicer to some people than `^ *`.
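A quick way to convince yourself of the equivalence, sketched in PHP with a made-up subject string:

```php
<?php
$subject = 'rv: 1.9.2.5';

// [\s] and \s describe the same set of characters,
// so both patterns should produce identical match arrays.
preg_match('/rv:[\s]*([^); ]+)/i', $subject, $bracketed);
preg_match('/rv:\s*([^); ]+)/i', $subject, $plain);

var_dump($bracketed === $plain); // bool(true)
```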

Tim Pietzcker 2010-07-01 14:47:14

Answer 4

A:

try this...

$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
preg_match('/rv:([^\)]*)/i', $str , $matches);
echo $matches[1];

ToonMariner 2010-06-30 19:34:36

Not working; it seems similar to Tim Pietzcker's but without the ; catch...

Activist 2010-06-30 19:50:42

Just tried the very same code on my local dev and the output is: 1.9.2a1pre, so it should work fine. Perhaps a bit more of your code may help us help you?

ToonMariner 2010-06-30 19:58:50

Yes, but the rv: value can also end with a ; and your regexp doesn't work in those cases (see point #3 in my question).
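A minimal sketch of the case I mean. The user-agent string below is made up for illustration; the first pattern is the one from this answer, the second adds ; and whitespace to the excluded characters.

```php
<?php
// Assumed example where the rv: value is followed by ";" instead of ")".
$ua = 'Mozilla/5.0 (X11; rv:1.9.1; Linux) Gecko';

// Stops only at ")", so it runs past the ";".
preg_match('/rv:([^\)]*)/i', $ua, $onlyParen);
// Also stops at ";" and whitespace.
preg_match('/rv:([^;)\s]+)/i', $ua, $withSemicolon);

echo $onlyParen[1], "\n";     // 1.9.1; Linux
echo $withSemicolon[1], "\n"; // 1.9.1
```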

Activist 2010-06-30 20:04:00

Answer 5

+1 A:

Maybe:

/rv:([^); \n]+)/i

That means: one or more characters that are not ), ;, a space or a line feed, matched case-insensitively and captured.

M42 2010-06-30 19:42:46

Answer 6

A:

I think what you want is this:

(?<=rv:).*(?=\))

Everything within parentheses is a group. This ?<= is called a positive lookbehind: it asserts that a given string comes immediately before the text you want, without making it part of the match. This ?= is called a positive lookahead and does the same for the string that comes after the text you want. Since the string you want is simply numbers, letters and a decimal point or two, the . operator works as a catch-all and matches any character except line breaks. * means zero or more of the previous character.
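Here is a minimal PHP sketch of the lookaround approach. The delimiters and the i flag are added for preg_match, and the second input string is an assumption made up to show one caveat: .* is greedy, so it runs to the last ) on the line, while .*? stops at the first.

```php
<?php
$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// With lookarounds the whole match ($m[0]) is the value; no capture group is needed.
preg_match('/(?<=rv:).*(?=\))/i', $str, $m);
echo $m[0], "\n"; // 1.9.2a1pre

// Assumed input with a second ")" later on the same line.
$tricky = 'rv:1.9.2) Gecko (like Firefox)';
preg_match('/(?<=rv:).*(?=\))/i', $tricky, $greedy); // greedy: up to the last ")"
preg_match('/(?<=rv:).*?(?=\))/i', $tricky, $lazy);  // lazy: up to the first ")"
echo $greedy[0], "\n"; // 1.9.2) Gecko (like Firefox
echo $lazy[0], "\n";   // 1.9.2
```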

hope that helps

Erik 2010-06-30 20:11:43

Answer 7

A:

$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// The * sits inside the group so the whole value is captured, not just its last character.
preg_match('/rv:([a-z0-9.]*)/i', $str, $matches);
echo $matches[1];
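One detail worth checking with a pattern of this shape is where the * sits relative to the capture group. In PCRE a repeated group keeps only its last iteration, so the two placements return different things. A small sketch on a shortened, made-up input:

```php
<?php
$str = 'rv:1.9.2a1pre) Gecko';

// Quantifier outside the group: the group is repeated and keeps its last iteration.
preg_match('/rv:([a-z0-9.])*/i', $str, $outside);
// Quantifier inside the group: the group spans the whole run of characters.
preg_match('/rv:([a-z0-9.]*)/i', $str, $inside);

echo $outside[1], "\n"; // e
echo $inside[1], "\n";  // 1.9.2a1pre
```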

ToonMariner 2010-07-01 01:21:04

Regular expression help in PHP