I'm writing a blog app with Django. I want to allow commenters to use some tags (like <strong>, <a>, et cetera) but disable all others.
In addition, I want to let them put code in <code> tags, and have Pygments highlight it.
For example, someone might write this comment:
I like this article, but the third code example <em>could have been simpler</em>:
<code lang="c">
#include <stdbool.h>
#include <stdio.h>
int main()
{
    printf("Hello World\n");
}
</code>
The problem is that when I parse the comment with BeautifulSoup to strip disallowed HTML tags, it also parses the contents of the <code> blocks and treats <stdbool.h> and <stdio.h> as if they were HTML tags.
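To make the problem concrete, here is a minimal reproduction sketch. It uses only Python's standard-library html.parser.HTMLParser rather than BeautifulSoup itself; that is an assumption made to keep the example dependency-free (BeautifulSoup's built-in "html.parser" backend is built on this class, but other backends such as lxml may behave differently). The names COMMENT, StartTagCollector, and collect_start_tags are hypothetical helpers of mine, not part of any library.

```python
from html.parser import HTMLParser

# The example comment from above (raw string so "\n" stays literal).
COMMENT = r"""I like this article, but the third code example <em>could have been simpler</em>:
<code lang="c">
#include <stdbool.h>
#include <stdio.h>
int main()
{
    printf("Hello World\n");
}
</code>"""


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag reported."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # Called for anything the parser considers an opening tag,
        # including the C header names inside the <code> block.
        self.start_tags.append(tag)


def collect_start_tags(markup):
    """Feed the markup to the parser and return the start tags it saw."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print(collect_start_tags(COMMENT))
```

Besides em and code, the collected list should also contain stdbool.h and stdio.h, which is exactly the behaviour I want to avoid: the parser has no idea that the contents of <code> are meant to be plain text.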
How can I tell BeautifulSoup not to parse the <code> blocks? Or is there another HTML parser better suited to this job?