Let's say allowed_bits = ['a', 'p']

re.compile(r'<(%s)[^>]*(/>|.*?</\1>)' % ('|'.join(allowed_bits)))

matches:

<a href="blah blah">blah</a>
<p />

and not:

<html>blah blah blah</html>
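A quick check of that behaviour, as a minimal sketch (the `samples` list and the names `allowed_re` and `matched` are only for illustration):

```python
import re

allowed_bits = ['a', 'p']

# Group 1 captures the tag name so that \1 can locate the matching closing tag.
allowed_re = re.compile(r'<(%s)[^>]*(/>|.*?</\1>)' % '|'.join(allowed_bits))

samples = [
    '<a href="blah blah">blah</a>',
    '<p />',
    '<html>blah blah blah</html>',
]

# Keep only the strings in which the pattern finds an allowed tag.
matched = [s for s in samples if allowed_re.search(s)]
print(matched)
```

Only the first two strings should end up in `matched`.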

What I want to do is turn it on its head, so that it matches

<html>blah blah</html>
<script type="text/javascript">blah blah</script>

and not:

<p>Hello</p>

My thinking was to do something like:

re.compile(r'<(^%s)[^>]*(/>|.*?</\1>)' % ('|'.join(allowed_bits)))

but this doesn't work.

Any ideas? I want to negatively match.

+2  A: 

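Use a negative lookahead assertion (?! … ). The lookahead does not capture anything, so the tag name needs its own group for \1 to refer to:

re.compile(r'<(?!(?:%s)\b)(\w+)[^>]*(/>|.*?</\1>)' % ('|'.join(allowed_bits)))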
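For reference, here is a minimal runnable sketch of the lookahead approach. It rests on a few assumptions that are not in the question: the tag name is captured by a separate `(\w+)` group so the backreference still works, a `\b` is added so that `<pre>` is not rejected just because it starts with `p`, and the names `disallowed_re`, `samples` and `rejected` are purely illustrative.

```python
import re

allowed_bits = ['a', 'p']

# (?!(?:a|p)\b) fails the match when the tag name is exactly one of the allowed ones.
# (\w+) then captures whatever tag name is there, so \1 can find its closing tag.
disallowed_re = re.compile(
    r'<(?!(?:%s)\b)(\w+)[^>]*(/>|.*?</\1>)' % '|'.join(allowed_bits)
)

samples = [
    '<html>blah blah</html>',
    '<script type="text/javascript">blah blah</script>',
    '<p>Hello</p>',
    '<a href="blah blah">blah</a>',
    '<pre>code</pre>',
]

# Collect the strings that contain a tag outside the allowed list.
rejected = [s for s in samples if disallowed_re.search(s)]
print(rejected)
```

With these samples the `<html>`, `<script>` and `<pre>` strings should be reported, while `<p>` and `<a>` are left alone. As with the original pattern, `.` does not cross newlines unless `re.DOTALL` is passed, and nested tags of the same name are not handled.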
Gumbo