I'm using lxml.html.clean to clean HTML from input text. How can I change \n to <br /> in lxml.html?

+1  A: 

Fairly easy, slightly hacky way: you could do this as a two-step process, assuming you have used lxml.html.parse (or whichever method) to build the DOM.

  1. Iterate through the text and tail attributes of the nodes and replace the newlines. Look at the iterdescendants method, which walks through everything for you. Note that strings assigned to text or tail are escaped on output, so a literal "<br />" would come out as visible text; the line breaks have to be added as br elements.
  2. Run lxml.html.clean as per normal.

A more complex way would be to monkey patch the lxml.html.clean module. Unlike much of lxml, this module is written in Python and is fairly accessible. For example, there is currently a _substitute_whitespace helper in it.

Tim McNamara