I'm using lxml.html.clean to clean HTML from an input text. How can I change \n to <br /> in lxml.html?
A:
Fairly easy, slightly hacky way: you could do this as a two-step process, assuming you have used lxml.html.parse or a similar method to build the DOM.
- Iterate through the text and tail attributes of the nodes and do the string replacements. Look at the iterdescendants method, which walks through everything for you.
- Run lxml.html.clean as per normal.
A more complex way would be to monkey patch the lxml.html.clean module. Unlike much of lxml, this module is written in Python and is fairly accessible. For example, there is currently a _substitute_whitespace function.
Tim McNamara
2010-10-14 19:45:06