Hi

I am using a rich html editor and I want to make a whitelist of the stuff that should be allowed in.

I heard that you should use a whitelist instead of a blacklist, since it is easier to get right than trying to build a blacklist.

I have even seen some examples where people could hide a script tag inside a CSS style.

So this is a sample of what the editor generates

<span _moz_dirty="" style="font-weight: bold;">
aaaaaaaaaaaa
<br _moz_dirty=""/>
ffffffffffff
<br _moz_dirty=""/>
<span _moz_dirty="" style="text-decoration: underline;">
fffffffff
<br _moz_dirty=""/>
</span>
<span _moz_dirty="" style="text-decoration: line-through;">
aaaaaaaaaa
<br _moz_dirty=""/>
<sub _moz_dirty="">
</sub>
<sup _moz_dirty="">ggg</sup>
<sub _moz_dirty="">
</sub>
</span>
</span>
<ol _moz_dirty="">
<li _moz_dirty="">1333</li>
<li _moz_dirty="">ff</li>
</ol>
<ul _moz_dirty="">
<li _moz_dirty="">ggg</li>
<li _moz_dirty="">ff</li>
</ul>
<div _moz_dirty="" style="margin-left: 40px;">
ffffff
<br _moz_dirty=""/>
</div>
fff
<br _moz_dirty=""/>
<br _moz_dirty=""/>
<a _moz_dirty="" href="http://">ffff</a>
<br _moz_dirty="" type="_moz"/>
<span _moz_dirty="" style="font-weight: bold;">
<span _moz_dirty="" style="text-decoration: underline;"/>
</span>

So I guess my whitelist would allow these tags, with the right style values:

<span>
style - font-weight: bold, text-decoration: underline, margin-left, margin-right
<br />
<a>
<ol>
<ul>
<li>

So I am trying to make a regex that I can pop into my C# code to check for only these tags.

So I tried to start with the style stuff

style="[^font\-style|weight]+\s*:\s*[bold|italic]+\s*;\s*"

but it does not work. I tried to change things around from the sample I gave you but nothing shows up.

+1  A: 

Similar to this question: http://stackoverflow.com/questions/307013/how-do-i-filter-all-html-tags-except-a-certain-whitelist

David Stratton
+1 looks like a duplicate
RageZ
A: 

escape your backslashes?

johnnycrash
Rather put an @ before the string. Regexes aren't easy to read already, and regexes with escaped backslashes are a pain to read.
Joey
Good point. You should use the @ sign. It's not so fun to read "\\s".
johnnycrash
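
To make the point about the @ sign concrete, here is a minimal sketch. The class and member names are hypothetical, and the pattern is just a fragment chosen for illustration. Both constants hold exactly the same pattern text; only the source spelling differs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // Regular string literal: every regex backslash has to be doubled.
    public const string Escaped = "font-weight\\s*:\\s*bold";

    // Verbatim string literal (@"..."): backslashes are taken literally.
    public const string Verbatim = @"font-weight\s*:\s*bold";

    // True when both literals produce the same pattern string.
    public static bool SamePattern()
    {
        return Escaped == Verbatim;
    }

    // Runs the verbatim pattern against some input.
    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Verbatim);
    }
}
```

One thing to watch for: inside a verbatim string a double quote is written as "", which matters if the pattern includes the quotes around the style attribute value.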
+1  A: 

You are using square brackets, which create a character class; you should instead use parentheses to indicate an alternative, i.e.

font-(style|weight)

The + is redundant (you don't want one or more, right?).
I think your regex should be something like

Regex regex = new Regex(@"font-(style|weight)\s*:\s*(bold|italic)\s*;\s*");

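Another thing: '^' indicates beginning of line/string outside of square brackets, and as the first character inside square brackets it negates the character class, so you should remove it.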
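
A minimal sketch of the difference between the two constructs, assuming you want to test a whole value. The class and method names are hypothetical, and the ^ and $ anchors are added here only so that each pattern has to cover the entire input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassVersusGroup
{
    // Character class: one or more of the characters b, o, l, d, |, i, t, a, c
    // in any order. The | is just another literal character here.
    private static readonly Regex CharClass = new Regex(@"^[bold|italic]+$");

    // Alternation: exactly the word "bold" or exactly the word "italic".
    private static readonly Regex Alternation = new Regex(@"^(bold|italic)$");

    public static bool ClassAccepts(string value)
    {
        return CharClass.IsMatch(value);
    }

    public static bool GroupAccepts(string value)
    {
        return Alternation.IsMatch(value);
    }
}
```

With these definitions, ClassAccepts("dial") is true because every letter of "dial" is in the set, while GroupAccepts("dial") is false and GroupAccepts("bold") is true. That is why the square-bracket version cannot work as a whitelist of values.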

Paolo Tedesco
Your regex is spot on. I'm just nitpicking here. Not that you are capturing, but it is good to keep in mind that () is a capturing group. Use (?:) for a non-capturing group: (?:style|weight). You can also use the RegexOptions.ExplicitCapture option.
johnnycrash
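
A small sketch of what that means in practice, counting the groups on a match. The class and method names are hypothetical, and the option used in the third method is, as far as I know, RegexOptions.ExplicitCapture. Groups.Count always includes group 0, the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private const string Input = "font-weight: bold;";

    // Plain parentheses: group 0 plus two capturing groups.
    public static int CapturingCount()
    {
        Regex regex = new Regex(@"font-(style|weight)\s*:\s*(bold|italic)\s*;");
        return regex.Match(Input).Groups.Count;
    }

    // (?:...) groups the alternatives without capturing them.
    public static int NonCapturingCount()
    {
        Regex regex = new Regex(@"font-(?:style|weight)\s*:\s*(?:bold|italic)\s*;");
        return regex.Match(Input).Groups.Count;
    }

    // ExplicitCapture: unnamed parentheses stop capturing, pattern unchanged.
    public static int ExplicitCaptureCount()
    {
        Regex regex = new Regex(
            @"font-(style|weight)\s*:\s*(bold|italic)\s*;",
            RegexOptions.ExplicitCapture);
        return regex.Match(Input).Groups.Count;
    }
}
```

The first method returns 3 and the other two return 1, so either approach avoids capturing when all you need is a yes/no match.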