For example, I want to remove the "script" tag but keep the "a" tag.

Which library do you use for this?

I also use the jQuery CLEditor WYSIWYG HTML editor. Can it do this for me automatically?

Thanks.

+3  A: 

I have to do this automatically for a project of mine. The solution I have found is to use the Beautiful Soup module to extract the script tag (I also do this for style and form).

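from BeautifulSoup import BeautifulSoup   # Beautiful Soup 3; html_string holds the HTML to clean

soup = BeautifulSoup(html_string, convertEntities=BeautifulSoup.HTML_ENTITIES)

scripts = soup.findAll('script')   # find and return a list of all 'script' elements
for s in scripts:
    s.extract()   # detach the element (and its contents) from the parse tree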

Then you can have Beautiful Soup print or save the cleaned HTML, for example with str(soup) or soup.prettify().

orangeoctopus
+2  A: 

I suppose Beautiful Soup should do the trick here.

Actually, here's a question with answers about exactly that: Python HTML sanitizer / scrubber / filter

Pascal MARTIN
A: 

Another option, designed for sanitization, is html5lib.

Whatever you do, do not rely on an editor component to do it for you: it runs on the client, so it can easily be bypassed or manipulated to submit invalid or malicious HTML. Sanitize on the server.
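To make the server-side idea concrete, here is a minimal sketch of an allowlist filter that keeps "a" and drops "script". It is not html5lib's sanitizer: it uses only the standard library's html.parser. The names ALLOWED_TAGS, DROP_WITH_CONTENT, SAFE_URL_PREFIXES, AllowlistSanitizer and sanitize are made up for illustration, and the policy itself is an assumption. For production use, a maintained sanitizer is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Assumed policy (illustrative only): tags to keep, and the attributes allowed on each.
ALLOWED_TAGS = {"a": {"href"}, "p": set(), "b": set(), "i": set()}
# Elements that are dropped together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}
# Only links starting with one of these prefixes keep their href.
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # e.g. onclick is never allowed
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-escape plain text


def sanitize(html_string):
    parser = AllowlistSanitizer()
    parser.feed(html_string)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


dirty = '<p>Hi <a href="http://example.com" onclick="x()">link</a><script>alert(1)</script></p>'
print(sanitize(dirty))
# prints: <p>Hi <a href="http://example.com">link</a></p>
```

The point of an allowlist is that anything you did not explicitly permit is removed, so a new or unusual tag cannot slip through the way it could with a "remove script only" rule. Run it on every submission on the server, whatever the editor already did in the browser.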

Nick Johnson