$str = "& &svnips   Â ∴ ≈ osidnviosd & sopinsdo";   
$regex = "/&[^\w;]/";
echo preg_replace($regex, "&amp;", $str);

I'm trying to replace all un-encoded ampersands with encoded ones.
The problem is it's removing the space between & and sopinsdo.

Any idea why?

+2  A: 

Your pattern matches two characters: the "&" plus one character that is neither ";" nor a word character (\w). Both are replaced with "&amp;", which is why the space disappears.

Replace with "&amp; " instead (add a space to the end of the replacement string).
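To make the two-character match visible, here is a minimal sketch. The shortened input and the "a &- b" string are made up for illustration and are not from the question; the last case is one guess at the kind of undesired behaviour a hard-coded space can cause.

```php
<?php
$regex = '/&[^\w;]/';
$str   = 'osidnviosd & sopinsdo';

// The class [^\w;] consumes the character after "&",
// so the whole match is two characters long.
preg_match($regex, $str, $m);
$matched = $m[0];                                   // "& " (ampersand plus space)

$collapsed = preg_replace($regex, '&amp;', $str);   // the space is swallowed
$padded    = preg_replace($regex, '&amp; ', $str);  // the space is put back by hand

// Hypothetical input where the consumed character is not a space:
$lossy = preg_replace($regex, '&amp; ', 'a &- b');  // the "-" is lost

echo $collapsed, "\n", $padded, "\n", $lossy, "\n";
```

Appending a space only looks right when the consumed character happened to be a space.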

peter
This has the same issue as unigg's answer below. There are cases where this will cause undesired behaviour.
eldarerathis
A: 

So you don't want the space between & and sopinsdo removed. Just add one to the replacement:

echo preg_replace($regex, "&amp; ", $str);
unigg
ircmaxell
+2  A: 

Why use regex? Why not use htmlspecialchars()?

echo htmlspecialchars($str, ENT_NOQUOTES, 'UTF-8', false);

Note the fourth parameter (double_encode, set to false here). It tells htmlspecialchars() not to re-encode existing entities. So this turns every < into &lt;, every > into &gt;, and every & that is not part of an existing entity into &amp;
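A small sketch of the difference the fourth argument makes. The sample string is invented for illustration, and the second call is only there for contrast with the default (double_encode = true).

```php
<?php
$str = 'a &amp; b & c <d> &#8756; "q"';

// double_encode = false: existing entities are left alone,
// bare "&", "<" and ">" are encoded, quotes are untouched (ENT_NOQUOTES).
$once = htmlspecialchars($str, ENT_NOQUOTES, 'UTF-8', false);

// Default (double_encode = true): every "&" is encoded,
// including the ones that already start an entity.
$twice = htmlspecialchars($str, ENT_NOQUOTES, 'UTF-8');

echo $once, "\n", $twice, "\n";
```

Keep in mind that this also encodes < and >, so it only fits if the string is not supposed to contain markup.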

But, if you must use regex, you could do:

$regex = '/&([^\w;])/';
echo preg_replace($regex, '&amp;\1', $str);

Basically, the capture group saves the character that follows the & and \1 adds it back after the &amp;.
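Here is a quick sketch of what that pattern does on a few inputs. The inputs other than the "& sopinsdo" fragment are assumptions added for illustration; the last two show cases the character class does not cover.

```php
<?php
$regex = '/&([^\w;])/';
$repl  = '&amp;\1';   // \1 re-inserts the captured character

// The space is captured and written back, so it survives.
$fixed = preg_replace($regex, $repl, 'osidnviosd & sopinsdo');

// "&" followed by a word character is never matched:
$entity = preg_replace($regex, $repl, '&amp; x');            // left alone, as intended
$bare   = preg_replace($regex, $repl, 'sopinsdo &foo bar');  // left alone, though "&foo" is not an entity

// "#" is neither a word character nor ";", so numeric entities get re-encoded:
$numeric = preg_replace($regex, $repl, '&#8756;');

echo $fixed, "\n", $entity, "\n", $bare, "\n", $numeric, "\n";
```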

ircmaxell
+1  A: 

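This regex does what you're looking for. The negative lookahead (?!\w+;) matches an & only when it is not followed by word characters and a semicolon, and it does not consume anything after the &.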

preg_replace('/&(?!\w+;)/', '&amp;', $text);

So for a few simple test cases you can get properly escaped HTML:

'& sopinsdo'          -> '&amp; sopinsdo'
'&amp; sopinsdo'      -> '&amp; sopinsdo'
'sopinsdo & foo; bar' -> 'sopinsdo &amp; foo; bar'
'sopinsdo &foo bar'   -> 'sopinsdo &amp;foo bar'
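The cases above can be checked with a short script. The numeric-entity input and the $variant pattern are my own additions, not part of the answer: since "#" is not matched by \w, the original pattern re-encodes things like &#8756;, and the variant is one possible way to allow for them.

```php
<?php
$regex = '/&(?!\w+;)/';

// Input => expected output, copied from the list above.
$cases = [
    '& sopinsdo'          => '&amp; sopinsdo',
    '&amp; sopinsdo'      => '&amp; sopinsdo',
    'sopinsdo & foo; bar' => 'sopinsdo &amp; foo; bar',
    'sopinsdo &foo bar'   => 'sopinsdo &amp;foo bar',
];

$results = [];
foreach ($cases as $in => $expected) {
    $results[$in] = preg_replace($regex, '&amp;', $in);
    echo $results[$in] === $expected ? 'ok   ' : 'FAIL ', $in, "\n";
}

// Numeric entities are not recognised by \w+; and get re-encoded.
$numeric = preg_replace($regex, '&amp;', '&#8756; &#x2248;');

// Hypothetical variant that also skips decimal and hex numeric entities.
$variant     = '/&(?!(?:#\d+|#x[0-9a-fA-F]+|\w+);)/';
$numericKept = preg_replace($variant, '&amp;', '&#8756; &#x2248; & x');

echo $numeric, "\n", $numericKept, "\n";
```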
jmz