I want to use lxml's Cleaner to strip out all HTML, and then apply a regex to autolink certain tokens:

[ABC] -> <a href="bah bah bah">ABC</a>

What is the right way to handle this without opening up XSS or similar holes?

+1  A: 

Maybe using Markdown with inline HTML disabled would be suitable? The Python Markdown module is quite mature.

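Check out the "safe mode" section in the docs for more info on stripping out inline HTML. (Note that safe mode was deprecated in Python-Markdown 2.5 and removed in 3.0; with newer versions you would sanitize the rendered output with a separate HTML sanitizer instead.)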

Depending on what you want, something like py-wikimarkup may be more appropriate.

Using a custom regexp is probably not a great idea, because

  • you'll have to explain the rules to people who might already be familiar with Markdown or WikiText
  • you'll have to provide a way to escape text, e.g. for people who really want to write [ABC]
  • you'll have to fix any bugs, including security issues
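
If you do end up going the custom regexp route anyway, here is a rough sketch of the order of operations that avoids the obvious XSS holes: escape everything first, then link, and only allow a conservative character set inside the brackets. It uses the standard library's html.escape in place of lxml's Cleaner (so markup is neutralised rather than removed), and LINK_BASE, TOKEN and autolink are made-up names for illustration, since the question only gives "bah bah bah" as the target. A leading backslash is assumed as the way to write a literal [ABC].

```python
import html
import re
from urllib.parse import quote

# Hypothetical URL prefix; the question does not say where links point.
LINK_BASE = "/wiki/"

# [ABC] becomes a link, \[ABC] stays literal. Only letters, digits,
# spaces, underscores and hyphens are accepted inside the brackets.
TOKEN = re.compile(r"(\\?)\[([A-Za-z0-9 _-]+)\]")


def autolink(text):
    # Escape first, so any markup in the input is rendered inert.
    escaped = html.escape(text, quote=True)

    def replace(match):
        backslash, name = match.groups()
        if backslash:
            # Escaped token: drop the backslash, keep the brackets.
            return "[" + name + "]"
        # Percent-encode the name for the URL, then escape the attribute.
        href = html.escape(LINK_BASE + quote(name), quote=True)
        return '<a href="{}">{}</a>'.format(href, name)

    return TOKEN.sub(replace, escaped)


if __name__ == "__main__":
    print(autolink("see [ABC], not \\[ABC], and <script>alert(1)</script>"))
```

Because the substitution runs on already-escaped text and the token pattern cannot match &, <, > or quotes, the only tags in the output are the ones the function itself writes. You would still own the other two problems in the list: documenting the syntax and fixing whatever edge cases turn up.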
intuited