Hi folks,
I have a few questions that I don't have answers to.
1) Stripping lists of strings
input:
'item1, item2, \t\t\t item3, \n\n\n \t, item4, , , item5, '
output:
['item1', 'item2', 'item3', 'item4', 'item5']
Is there anything more efficient than the following (where l is the input string)?
[x.strip() for x in l.split(',') if x.strip()]
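
One variant I can think of strips each piece only once instead of twice. This is just a sketch: split_and_strip and its sep parameter are names I made up for illustration, and I haven't measured whether it is actually faster, so timeit would have to settle that.

```python
def split_and_strip(text, sep=","):
    """Split text on sep, strip whitespace from each piece, drop empty pieces."""
    # map(str.strip, ...) strips every piece exactly once;
    # the filter then discards pieces that were empty or whitespace-only.
    return [piece for piece in map(str.strip, text.split(sep)) if piece]


raw = 'item1, item2, \t\t\t item3, \n\n\n \t, item4, , , item5, '
items = split_and_strip(raw)
print(items)
```

It does the same thing as the comprehension above, only without the second call to strip().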
2) Cleaning/Sanitizing HTML
The goal is to keep basic tags, e.g. strong, p, br, ...
while removing malicious JavaScript, CSS and divs.
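
To make the requirement concrete, here is a rough allowlist sketch using only the standard library's html.parser. The names ALLOWED_TAGS, DROP_CONTENT, AllowlistSanitizer and sanitize are all made up for illustration, and the allowlist is just the three tags mentioned above. I would not trust a hand-rolled filter like this against real attacks; a maintained sanitizer library is probably the safer choice.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"strong", "p", "br"}   # assumed allowlist, extend as needed
DROP_CONTENT = {"script", "style"}     # tags whose contents are discarded too


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # All attributes are dropped, so onclick=, style= etc. never survive.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is re-escaped so stray < or & cannot form new markup.
            self.out.append(escape(data))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <strong>there</strong></p><script>alert(1)</script><div>box</div><br>'
print(sanitize(dirty))
```

Disallowed tags such as div are removed but their text is kept, while script and style are dropped together with their contents.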
3) Unicode handling...
What would you recommend for dealing with Unicode text encountered while parsing documents?
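
The only approach I know of is to decode bytes once at the point where the document is read, and then normalize the text so that equivalent characters compare equal. A minimal sketch, assuming the input is UTF-8 unless stated otherwise (to_text and its encoding parameter are names I made up):

```python
import unicodedata


def to_text(raw, encoding="utf-8"):
    """Return normalized text from bytes or str input."""
    if isinstance(raw, bytes):
        # Undecodable bytes become U+FFFD instead of raising an error.
        raw = raw.decode(encoding, errors="replace")
    # NFC composes base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", raw)


decomposed = "cafe\u0301".encode("utf-8")  # 'e' followed by a combining acute accent
text = to_text(decomposed)
print(len(text))
```

After normalization the decomposed and precomposed spellings of the same word compare equal.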
Any ideas? :) Thanks guys!