views:

68

answers:

3

I need help again with some regular expressions I'm trying to write (I'm still learning a lot).

Again I'm trying to learn by parsing user agents. Trying to do Firefox now...

Take these UAs into consideration:

  • Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.0.15) Gecko/2009101601 Firefox 2.1 (.NET CLR 3.5.30729)
  • Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.20) Gecko/20081217 Firefox(2.0.0.20)
  • Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.1b3;MEGAUPLOAD 1.0 (.NET CLR 3.5.30729)
  • Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a3pre) Gecko/20100306 Firefox3.6 (.NET CLR 3.5.30729)
  • Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20081202 Firefox (Debian-2.0.0.19-0etch1)
  • Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)
  • Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.13) Gecko/20080313 Firefox

I'm trying to write a regular expression that will extract the Firefox version from the UA.

Here are the rules I want:

  1. The version always comes after the string "firefox" ("Firefox" can be in any letter case).
  2. The version string can start right after a "/", a space, or a "(", or directly after "firefox" with nothing in between.
  3. The version string ends at whitespace, the end of the string, a closing parenthesis, or a semicolon.
  4. In some rare cases the version isn't provided (see the last UA). The regexp must still match but return an empty string as the version (if possible).

I think that's it. If anyone can help, that would be great!

+3  A: 

Something like this should work:

/Firefox[ \(\/]*([a-z0-9\.\-]+)/i
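As a quick check, here is a minimal Python sketch of that pattern applied to a few of the UAs from the question. The helper name `firefox_version` is made up for illustration, and `re.IGNORECASE` stands in for the `/i` flag.

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the /i flag.
FIREFOX_RE = re.compile(r"Firefox[ \(\/]*([a-z0-9\.\-]+)", re.IGNORECASE)


def firefox_version(ua):
    """Return the text captured after "Firefox", or None if nothing matches.

    firefox_version is a hypothetical helper name, not part of any library.
    """
    match = FIREFOX_RE.search(ua)
    return match.group(1) if match else None


# UAs copied from the question.
samples = [
    "Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.0.15) Gecko/2009101601 Firefox 2.1 (.NET CLR 3.5.30729)",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.20) Gecko/20081217 Firefox(2.0.0.20)",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.1b3;MEGAUPLOAD 1.0 (.NET CLR 3.5.30729)",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a3pre) Gecko/20100306 Firefox3.6 (.NET CLR 3.5.30729)",
]

for ua in samples:
    print(firefox_version(ua))
# Prints, one per line: 2.1, 2.0.0.20, 3.1b3, 3.6
```

Because the character class accepts letters and hyphens, a token such as `Debian-2.0.0.19-0etch1` is captured whole.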
casablanca
Seems to work, but I added rule #4 to my original question.
Activist
For #4, turn the + into a *, like this: `/Firefox[ \(\/]*([a-z0-9\.\-]*)/i`
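A small Python sketch of the difference between `+` and `*`, using the last UA from the question (the variable names are only illustrative):

```python
import re

# Last UA from the question: "Firefox" with no version after it.
ua = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.13) Gecko/20080313 Firefox"

# With "+", the capture group needs at least one character, so nothing matches.
with_plus = re.search(r"Firefox[ \(\/]*([a-z0-9\.\-]+)", ua, re.IGNORECASE)

# With "*", the group may be empty, so the match succeeds with "" as the version.
with_star = re.search(r"Firefox[ \(\/]*([a-z0-9\.\-]*)", ua, re.IGNORECASE)

print(with_plus)                 # None
print(repr(with_star.group(1)))  # ''
```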
casablanca
For UA #5 it matched `Debian-2.0.0.19-0etch1` for me. I don't know if that is what the OP wants; mine didn't match it at all :S
alex
Great! Seems to work with my 2000+ FF UAs :)
Activist
@alex: Yes, this is what I want (I posted the weirdest UAs because regular ones are really easy to parse).
Activist
Okay, good work ! +1
alex
Found another problem. How can I apply your regex to the last occurrence of "firefox" in the string?
Activist
@Activist: The easy way to grab the last occurrence is just to add `.*` to the beginning of the regex.
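A rough Python illustration of why the leading `.*` works. The UA below is made up for this example (none of the UAs in the question contains "Firefox" twice):

```python
import re

# Made-up UA with two "Firefox" tokens, for illustration only.
ua = "Mozilla/5.0 (X11; U; Linux i686; en-US) Gecko/20100101 Firefox/2.0 Firefox/3.6.6"

# Without the prefix, search() stops at the first "Firefox".
first = re.search(r"Firefox[ \(\/]*([a-z0-9\.\-]*)", ua, re.IGNORECASE)

# The greedy ".*" first consumes the whole string, then backtracks only as
# far as needed, so "Firefox" binds to the last occurrence.
last = re.search(r".*Firefox[ \(\/]*([a-z0-9\.\-]*)", ua, re.IGNORECASE)

print(first.group(1))  # 2.0
print(last.group(1))   # 3.6.6
```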
Anon.
A: 

The following would match all of your supplied ones except the fourth (`Firefox3.6`):

#^Mozilla/.*\bFirefox\b#

That means:

"Match any string beginning with 'Mozilla/', followed by any characters, followed by 'Firefox' as a single word."

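The \b word boundaries in this context enforce matching at a position between a word character and a non-word character. I don't know if it's necessary, but it prevents matching on Firefox appearing within another word, such as "MyFirefox" or "Firefoxy". Note that digits count as word characters, so the trailing \b also rejects "Firefox3.6"; drop it if you need to match UAs like that.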

*I like to use hashes instead of slashes around my regular expressions as I often need to match on a slash and it makes me less confused.
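To see what the word boundaries do in practice, here is a minimal Python sketch. Python has no pattern delimiters, so the `#...#` wrapper is dropped, and the short UA strings are made up for illustration. It also shows a side effect: a digit is a word character, so `Firefox3.6` is rejected by the trailing `\b`.

```python
import re

# The pattern from above, without the #...# delimiters.
pattern = re.compile(r"^Mozilla/.*\bFirefox\b")

# Shortened, made-up UA strings used only to exercise the boundaries.
checks = [
    "Mozilla/5.0 Gecko/20080313 Firefox",      # "Firefox" as a whole word
    "Mozilla/5.0 Gecko/20081217 Firefox(2.0)",  # "(" is a non-word character
    "Mozilla/5.0 Gecko/20080313 MyFirefox",     # no boundary before "F"
    "Mozilla/5.0 Gecko/20080313 Firefoxy",      # no boundary after "x"
    "Mozilla/5.0 Gecko/20100306 Firefox3.6",    # "3" is a word character
]

results = [bool(pattern.search(ua)) for ua in checks]
print(results)  # [True, True, False, False, False]
```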

thomasrutter
A: 

Depending on what you're trying to achieve, it may be smarter to match on Gecko than on Firefox. It would, however, mean that it would match non-Firefox browsers that are based on Mozilla's Gecko (but this is probably desired behaviour).

#Mozilla/.*\bGecko/\d+#
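A hedged Python sketch of how this behaves. The first UA is from the question; the other two are invented (there is no real "ExampleBrowser" or "ExampleEngine") to show a Gecko-based non-Firefox browser and a UA that only says "like Gecko".

```python
import re

# The pattern from above, without the #...# delimiters.
gecko = re.compile(r"Mozilla/.*\bGecko/\d+")

# Real Firefox UA taken from the question.
firefox_ua = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)"
# Made-up UA for a hypothetical Gecko-based browser that is not Firefox.
other_gecko_ua = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100625 ExampleBrowser/1.0"
# Made-up UA that mentions Gecko without a "Gecko/<digits>" token.
like_gecko_ua = "Mozilla/5.0 (X11; Linux) ExampleEngine/1.0 (like Gecko)"

print(bool(gecko.search(firefox_ua)))      # True
print(bool(gecko.search(other_gecko_ua)))  # True
print(bool(gecko.search(like_gecko_ua)))   # False
```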
thomasrutter