I'm using an HTML sanitizing whitelist code found here:
http://refactormycode.com/codes/333-sanitize-html
I needed to add the "font" tag as an additional tag to match, so I tried adding this condition after the <img
tag check
if (tagname.StartsWith("<font"))
{
// detailed <font> tag checking
// Non-escaped expression (for testing in a Regex editor app)
// ^<font(\s*size="\d{1}")?(\s*color="((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)")?(\s*face="(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)")?\s*?>$
if (!IsMatch(tagname, @"<font
(\s*size=""\d{1}"")?
(\s*color=""((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)"")?
(\s*face=""(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)"")?
\s*?>"))
{
html = html.Remove(tag.Index, tag.Length);
}
}
Aside from the condition above, my code is almost identical to the code in the page I linked to. When I try to test this in C#, it throws an exception saying "Not enough )'s
". I've counted the parenthesis several times and I've run the expression through a few online Javascript-based regex testers and none of them seem to tell me of any problems.
Am I missing something in my Regex that is causing a parenthesis to escape? What do I need to do to fix this?
UPDATE
After a lot of trial and error, I remembered that the #
sign is a comment in regexes. The key to fixing this is to escape the #
character. In case anyone else comes across the same problem, I've included my fix (just escaping the #
sign)
if (tagname.StartsWith("<font"))
{
// detailed <font> tag checking
// Non-escaped expression (for testing in a Regex editor app)
// ^<font(\s*size="\d{1}")?(\s*color="((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)")?(\s*face="(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)")?\s*?>$
if (!IsMatch(tagname, @"<font
(\s*size=""\d{1}"")?
(\s*color=""((\#[0-9a-f]{6})|(\#[0-9a-f]{3})|red|green|blue|black|white)"")?
(\s*face=""(Arial|Courier\sNew|Garamond|Georgia|Tahoma|Verdana)"")?
\s*?>"))
{
html = html.Remove(tag.Index, tag.Length);
}
}