views: 133
answers: 3

Hi there,

I am scraping a year value from the innerHTML of a span, and the value is in brackets, like this:

<span class="year_type">(2009)</span><br>

I want to get the year without the brackets, but I get compiler errors when I try to escape the "(" character.

My pattern:

const string yearPattern = "<span class=\"year_type\">\((?<year>.*?)\)</span>";

Complete Code:

const string yearPattern = "<span class=\"year_type\">\((?<year>.*?)\)</span>";
var regex = new Regex(yearPattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
Match match = regex.Match(data);
return match.Groups["year"].Value;

What is the best way to escape the ( and ) characters?

Thanks

+2  A: 

Use two backslashes:

const string yearPattern = "<span class=\"year_type\">\\((?<year>.*?)\\)</span>"; 

or use an @ verbatim string literal:

const string yearPattern = @"<span class=""year_type"">\((?<year>.*?)\)</span>";

Note: keep both open-parens in \((?<year>: the escaped one matches the literal bracket, and the second one opens the named group.
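A minimal sketch that checks the two spellings above produce the same pattern text and that the pattern pulls the year out of the sample span. ExtractYear is a hypothetical helper name that simply wraps the question's three lines; the verbatim version shown here has both open-parens in \((?<year>.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string: the C# compiler turns \\ into \ and \" into ".
string escaped = "<span class=\"year_type\">\\((?<year>.*?)\\)</span>";
// Verbatim string: backslashes are kept as-is, and "" stands for one quote.
string verbatim = @"<span class=""year_type"">\((?<year>.*?)\)</span>";

// Both literals produce the identical pattern text.
Console.WriteLine(escaped == verbatim); // prints True

// Hypothetical helper: the body mirrors the question's code.
static string ExtractYear(string data, string yearPattern)
{
    var regex = new Regex(yearPattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
    Match match = regex.Match(data);
    // Groups["year"].Value is an empty string when nothing matches.
    return match.Groups["year"].Value;
}

Console.WriteLine(ExtractYear("<span class=\"year_type\">(2009)</span><br>", verbatim)); // prints 2009
```

Which spelling to use is mostly readability: the verbatim form keeps the regex looking like it would in a regex tester, at the cost of doubling the embedded quotes.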

Cheeso
Your verbatim string version won't compile. When escaping quotes in a verbatim string, you need to use "", not \".
rh
Got it, fixed.
Cheeso
+1  A: 

Prepare to get rocked for parsing HTML with a Regex...

That being said, you just need the @ in front of your pattern definition, with "" for the embedded quotes (or double your backslash escapes, \\).

const string yearPattern = @"<span class=""year_type"">\((?<year>.*?)\)</span>";
Austin Salonen
That won't compile. When escaping quotes in a literal string, you need to use "", not \".
rh
If you use @-style string literals, you can't use \" for embedded quotes. Try @"<span class=""year_type"">\((?<year>.*?)\)</span>"
Ben Voigt
+1  A: 

I would consider using a character class for this, e.g. [(] and [)]. Then again, double backslashes, \\( and \\), are about equally heavy syntax (the C# compiler turns \\ into a single \, which then escapes the parenthesis for the regex engine). So it's a matter of taste.
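A sketch of the character-class variant, adapted from the question's pattern (it is an illustration, not code from the answer). Inside [...] a parenthesis has no special meaning, so the pattern needs no backslashes at all apart from the \" for the embedded quotes.

```csharp
using System;
using System.Text.RegularExpressions;

// [(] and [)] match literal parentheses; no regex escaping is needed.
string yearPattern = "<span class=\"year_type\">[(](?<year>.*?)[)]</span>";

var regex = new Regex(yearPattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
string year = regex.Match("<span class=\"year_type\">(2009)</span><br>").Groups["year"].Value;
Console.WriteLine(year); // prints 2009
```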

Romain