Hello!
I'm working on a small Python script to clean up HTML documents. It works by accepting a list of tags to KEEP and then parsing through the HTML code, trashing tags that are not in the list. I've been using regular expressions to do it, and I've been able to match opening tags and self-closing tags, but not closing tags. The pattern I've been experimenting with to match closing tags is </(?!a)>. This seems logical to me, so why is it not working? The (?!a) should match on anything that is NOT an anchor tag (note that the "a" can be anything; it's just an example).
Edit: AGG! I guess the regex didn't show!
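
To make the behavior concrete, here is a minimal sketch using Python's re module. The sample HTML string and the names attempt, closing_not_a, and closing_tag_pattern are made up for illustration and are not part of the actual script. It shows that a lookahead like (?!a) only checks the next character without consuming it, so the original pattern can only ever match the literal text "</>". One possible variant adds \w+ after the lookahead to consume the tag name.

```python
import re

# Made-up sample input, for illustration only.
html = '<p>Some <a href="#">link</a> and <b>bold</b> text</p>'

# The lookahead (?!a) is zero-width: it inspects the next character but
# consumes nothing, so this pattern requires ">" immediately after "</".
attempt = re.compile(r'</(?!a)>')
print(attempt.findall(html))  # []

# Variant: assert "the tag name is not a", then actually consume the name.
# The \b stops tags that merely start with "a" (e.g. </abbr>) from being skipped.
closing_not_a = re.compile(r'</(?!a\b)\w+\s*>')
print(closing_not_a.findall(html))  # ['</b>', '</p>']


def closing_tag_pattern(keep):
    """Build a pattern for closing tags whose name is NOT in `keep`.

    Hypothetical helper; assumes `keep` is a non-empty list of simple
    tag names made of letters and digits.
    """
    names = '|'.join(re.escape(tag) for tag in keep)
    return re.compile(rf'</(?!(?:{names})\b)\w+\s*>', re.IGNORECASE)


print(closing_tag_pattern(['a', 'p']).findall(html))  # ['</b>']
```

This is only a sketch of the lookahead behavior. Regular expressions tend to break on real-world HTML (comments, attribute values containing ">", and so on), so a parser such as the standard library's html.parser may be a safer base for the cleanup itself.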