I am trying to detect the browser in my C# code by comparing the UserAgent string with some regular expressions.

In case you are wondering, I use this approach instead of ASP.NET's HttpBrowserCapabilities object because I have received a list of more than 200 regexes that correspond to 200 browsers and their operating systems, which lets me get more detailed information.

Here's a sample:

var sampleUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)";
var ieRegEx = "/^Mozilla\/4.0 \(compatible; MSIE ([0-9\.]+); Windows/si";
var success = Regex.Match(sampleUserAgent, ieRegEx, RegexOptions.IgnoreCase).Success;

In this example I expect the regex match to succeed, but for some reason Success is false. I am guessing that the regex is not quite right.

I didn't write the regex and I am not very familiar with regex syntax, so can anybody help me figure out what is wrong with it?

+2  A: 

I know this does not directly answer the question, but I think you are introducing more problems by using regex here. I would stick with HttpBrowserCapabilities unless you have a real need to detect more detailed information.

Oren
I agree that it seems to make things more complicated, but the reason I am really interested is that I found this web site: http://user-agent-string.info/ It has regex strings for all known browsers and also all known operating systems. It's very detailed.
As it goes, I would treat BrowserCaps with extreme caution myself, but then I wouldn't trust the user agent either.
annakata
@desautelsj but as you note, it has some errors nonetheless...
Oren
@oren the errors, I believe, are due to the fact that the code and regex provided on the web site I mentioned are for PHP and I am trying to convert to C#. I am wondering if the regex "syntax" is slightly different in PHP versus .NET?
Yes. They have different flavors of regex. Regular Expressions Cookbook has a nice intro on this - http://www.amazon.com/Regular-Expressions-Cookbook-Jan-Goyvaerts/dp/0596520689/ref=cm_cr_pr_product_top
Oren
A: 

I found this regex from here

^([^/[:space:]])(/([^[:space:]]))?([[:space:]][[a-zA-Z][a-zA-Z]])? [[:space:]](\((([^()]|(\([^()]\))))\))?[[:space:]]*

which is supposed to match all user agents. They also have a list for more specific user agents.

Joseph
+1  A: 

You should use something like this in C#:

string sampleUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)";
string ieRegEx = @"^Mozilla/4.0 \(compatible; MSIE ([0-9\.]+); Windows .*\)$";
bool success = Regex.Match(sampleUserAgent, ieRegEx, RegexOptions.IgnoreCase).Success;

Note that I prefer explicit type names rather than var for simple types.

Also, I personally would just stick to using Browser Caps as it would be much less hassle, especially if you aren't good with reg-ex.
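If the goal is the more detailed information mentioned in the question, the capture group in the pattern above can also be read back. The following is only a minimal sketch: the class name IeVersionDemo and the method GetIeVersion are made up for illustration, and the pattern is the one from this answer, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IeVersionDemo
{
    // Same pattern as in the answer; group 1 captures the MSIE version number.
    private static readonly Regex IePattern = new Regex(
        @"^Mozilla/4.0 \(compatible; MSIE ([0-9\.]+); Windows .*\)$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the captured version text,
    // or null when the user agent does not match the pattern.
    public static string GetIeVersion(string userAgent)
    {
        Match match = IePattern.Match(userAgent);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string sampleUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)";
        Console.WriteLine(GetIeVersion(sampleUserAgent));
    }
}
```

Building the Regex once and reusing it is a design choice that should matter more when 200 or so patterns are checked on every request.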

Dan Diplo
Indeed, the regex you suggested is successful. You have removed the "/" at the beginning, replaced "\/" with "/", and replaced "/si" at the end with ".*\)$" (what does that mean, by the way?). I will try applying these changes to the 200 or so regex strings and see if that solves the problem not only for the IE regex but also for all the others.
^ anchors the match to the start of the line, and / does not need to be escaped in .NET (the \ character is the escape character). The $ matches the end of the line. The .* matches any sequence of characters. See http://regexlib.com/CheatSheet.aspx
Dan Diplo
Thank you for the link to the cheat sheet. Very informative! One last question: do you have any idea what "/si" in the original regex means? I don't see any mention of it in the cheat sheet. I understand that you replaced it with ".*\)$" in order to match any number of characters until the end of the string, but I don't understand the intent of the author of the original regex.
They are known as pattern or mode modifiers. In PHP's /pattern/si syntax, the letters after the closing / are flags: s means "single line" (the dot also matches line breaks) and i means case-insensitive. See http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx and http://www.regular-expressions.info/modifiers.html. Normally in .NET it is easier to use RegexOptions for this, rather than "inline" modifiers.
Dan Diplo
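For the 200 or so PHP-style strings, the manual steps discussed above (drop the leading "/", drop the trailing "/si", pass the modifiers as RegexOptions) could be automated roughly as follows. This is a sketch under assumptions, not a tested converter: the names PhpRegexConverter and FromPhp are hypothetical, every input is assumed to use "/" as its delimiter, and only the i, s, m and x modifiers are mapped. Any other PCRE-specific syntax inside a pattern would still need manual review.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhpRegexConverter
{
    // Hypothetical helper: turns a PHP-style "/pattern/flags" string into a .NET Regex.
    public static Regex FromPhp(string phpRegex)
    {
        if (string.IsNullOrEmpty(phpRegex) || phpRegex[0] != '/')
            throw new ArgumentException("Expected a pattern delimited by '/'.", nameof(phpRegex));

        // The last '/' closes the pattern; whatever follows it is the modifier list.
        int lastSlash = phpRegex.LastIndexOf('/');
        if (lastSlash <= 0)
            throw new ArgumentException("Missing closing '/' delimiter.", nameof(phpRegex));

        // "\/" inside the pattern is left alone: .NET also reads it as a literal slash.
        string pattern = phpRegex.Substring(1, lastSlash - 1);
        string flags = phpRegex.Substring(lastSlash + 1);

        RegexOptions options = RegexOptions.None;
        foreach (char flag in flags)
        {
            switch (flag)
            {
                case 'i': options |= RegexOptions.IgnoreCase; break;
                case 's': options |= RegexOptions.Singleline; break;
                case 'm': options |= RegexOptions.Multiline; break;
                case 'x': options |= RegexOptions.IgnorePatternWhitespace; break;
                default: throw new NotSupportedException("Unhandled modifier: " + flag);
            }
        }
        return new Regex(pattern, options);
    }

    public static void Main()
    {
        string sampleUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)";
        // The verbatim string (@"...") keeps the backslashes exactly as they appear in the PHP source.
        Regex ie = FromPhp(@"/^Mozilla\/4.0 \(compatible; MSIE ([0-9\.]+); Windows/si");
        Console.WriteLine(ie.IsMatch(sampleUserAgent));
    }
}
```

Throwing on an unknown modifier, rather than ignoring it, is deliberate here: a silently dropped flag would be hard to spot among 200 patterns.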