I want to use lxml's Cleaner to get rid of all HTML, and then use a regex to autolink something like this:
[ABC] -> <a href="bah bah bah">ABC</a>
What is the right way to handle this without opening myself up to XSS and similar attacks?
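One common way to keep this safe is to reverse the mindset: instead of stripping tags and hoping nothing slips through, escape *all* input first, then run the regex on the escaped text and build the `<a>` tag yourself from a whitelisted capture. The sketch below does that with the standard library's `html.escape` in place of lxml's Cleaner (a swapped-in technique, not what the question used). `LINK_PATTERN`, `BASE_URL` and `autolink` are hypothetical names, and the URL is a placeholder because the real target is elided in the question.

```python
import html
import re
from urllib.parse import quote

# Hypothetical syntax: [Label], where Label is limited to letters, digits,
# spaces, hyphens and underscores. The whitelist means a match can never
# contain <, >, &, ; or quotes.
LINK_PATTERN = re.compile(r"\[([A-Za-z0-9 _-]+)\]")

# Hypothetical link target; substitute your real URL scheme.
BASE_URL = "https://example.com/wiki/"


def autolink(text: str) -> str:
    # Step 1: escape everything, so any HTML in the input becomes inert text.
    escaped = html.escape(text, quote=True)

    # Step 2: turn [Label] into a link. The label is already safe to emit
    # as element text; the URL is percent-encoded and then attribute-escaped.
    def replace(match: re.Match) -> str:
        label = match.group(1)
        href = BASE_URL + quote(label, safe="")
        return '<a href="{}">{}</a>'.format(html.escape(href, quote=True), label)

    return LINK_PATTERN.sub(replace, escaped)


print(autolink("see [ABC] and <script>alert(1)</script>"))
```

Because escaping happens before linking, an input like `[" onclick="x]` becomes `[&quot; onclick=&quot;x]`, which no longer matches the whitelist, so no attribute injection is possible. The trade-off is that the link syntax is deliberately narrow; widening the character class is where bugs tend to creep in.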
Maybe using Markdown with inline HTML disabled would be suitable? The Python-Markdown module is quite mature.
Check out the "safe mode" section in its docs for more info on stripping out inline HTML (note that `safe_mode` was deprecated in Python-Markdown 2.5 and removed in 3.0).
Depending on what you want, something like py-wikimarkup may be more appropriate.
Using a custom regexp is probably not a great idea, because
[ABC]