tags:

views:

51

answers:

2

An example:

<p> Hello</p>
<div>hgello</div>
<pre>
   code
    code
<pre>

turns in something like:

<p> Hello</p><div>hgello</div><pre>
    code
     code
<pre>

How to do this in python? I make also intensive use of < pre> tags so substituting all '\n' with '' is not an option.

What's the best way to do that?

+2  A: 

You could use re.sub(">\s*<","><","[here your html string]").

Maybe string.replace(">\n",">"), i.e. look for an enclosing bracket and a newline and remove the newline.

phimuemue
+1  A: 

I would choose to use the python regex:

string.replace(">\s+<","><")

Where the '\s' finds any whitespace character and the '+' after it shows it matches one or more whitespace characters. This removes the possibility of the replace replacing

<pre>
    code
     code
<pre>

with

<pre><pre>

More information about regular expressions can be found here, here and here.

Kyra
Just a nitpick: I think `string.replace` doesn't take regular expressions, does it?
phimuemue
Oops. Thanks @phimuemue. Instead use re.sub(regex, replacement, subject) where regex is the ">\s+<" and replacement is the "><"
Kyra