For example, I want to strip the "script" tag but keep the 'a' tag.
What library do you use to do this?
I also use the jQuery CLEditor WYSIWYG HTML editor. Can it do this for me automatically?
Thanks
I have to do this automatically for a project of mine. The solution I found is to use the Beautiful Soup module to extract the script tags (I also do this for style and form tags).
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import path

soup = BeautifulSoup(html_string, convertEntities=BeautifulSoup.HTML_ENTITIES)
scripts = soup.findAll('script')  # find and return a list of 'script' elements
for s in scripts:
    s.extract()  # remove it from the parse tree completely
Then you can have BeautifulSoup print out or save the cleaned HTML.
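For anyone on a newer install, here is a minimal sketch of the same idea, assuming Beautiful Soup 4 (the bs4 package) and the built-in "html.parser" backend. The function name strip_tags and its default tag list are made up for illustration; they are not part of the library.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed


def strip_tags(html_string, unwanted=("script", "style", "form")):
    """Return html_string with every tag named in `unwanted` removed, contents included.

    `strip_tags` and its defaults are hypothetical names used for this sketch.
    """
    soup = BeautifulSoup(html_string, "html.parser")
    for tag in soup.find_all(list(unwanted)):
        tag.extract()  # detach the tag (and its children) from the tree
    return str(soup)


cleaned = strip_tags('<p>Hi <a href="/x">link</a><script>alert(1)</script></p>')
print(cleaned)  # the 'a' tag is kept, the 'script' tag is gone
```

Note that this is a blacklist: anything not named in `unwanted` passes through untouched, including event-handler attributes such as onclick, so it is only a first step and not a complete sanitizer.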
I suppose BeautifulSoup should do the trick here.
Actually, there is a question with answers about exactly that: Python HTML sanitizer / scrubber / filter
Another option, designed for sanitization, is html5lib.
Whatever you do, do not rely on an editor component to do it for you: it runs on the client, so it could easily be manipulated to submit invalid or malicious HTML!
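To make the server-side point concrete, here is a rough sketch of a whitelist filter that uses only the Python standard library (html.parser), run on the submitted HTML before it is stored. The names WhitelistSanitizer, sanitize, ALLOWED_TAGS, ALLOWED_ATTRS and DROP_WITH_CONTENT, and the particular whitelist, are assumptions chosen for this example. It is an illustration of the approach, not a vetted sanitizer; a maintained library is the safer choice for real input.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this sketch; adjust to what your editor may emit.
ALLOWED_TAGS = {"a", "b", "i", "p"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}  # the element body is discarded as well


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            kept = []
            for name, value in attrs:
                if name not in ALLOWED_ATTRS.get(tag, ()):
                    continue  # drops onclick, style, and other unlisted attributes
                value = value or ""
                # Only keep http(s) and site-relative links; rejects javascript: URLs.
                if name == "href" and not value.lower().startswith(
                    ("http://", "https://", "/")
                ):
                    continue
                kept.append(' %s="%s"' % (name, escape(value, quote=True)))
            self.out.append("<%s%s>" % (tag, "".join(kept)))
        # Tags outside the whitelist are dropped, but their text is kept.

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))  # re-escape text content


def sanitize(html_string):
    parser = WhitelistSanitizer()
    parser.feed(html_string)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <a href="http://example.com" onclick="evil()">link</a><script>alert(1)</script></p>'
print(sanitize(dirty))
```

The design choice here is whitelist over blacklist: anything the filter does not explicitly recognise is dropped, so a new tag or attribute the editor (or an attacker) sends is rejected by default.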