In PHP, I want to encode ampersands that have not already been encoded. I came up with this regex

/&(?=[^a])/

It seems to work well so far, but since I'm not much of a regex expert, I'm asking: can anyone see potential pitfalls in this regex?

Essentially it needs to convert & to &amp;amp; but leave the & in &amp;amp; as is (so as not to get &amp;amp;amp;)

Thanks

Update

Thanks for the answers. It seems I wasn't thinking broadly enough to cover all bases. This seems like a common pitfall of regexes themselves (you have to think of every possible input, or your regex will produce false positives). It sure does beat my original one, str_replace(' & ', ' &amp;amp; ', $string); :)

+2  A: 

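It will also match the ampersand of any other encoded character, so an existing entity such as &amp;lt; gets its ampersand encoded a second time.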
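
To make that concrete, here is a small sketch that runs the regex from the question over a string that already contains entities. The sample string is made up for illustration.

```php
<?php
// Regex from the question: matches any "&" followed by a character other than "a".
$pattern = '/&(?=[^a])/';

// Made-up sample: one bare ampersand plus three existing entities.
$input = 'fish & chips &lt;b&gt; &amp; more';

$output = preg_replace($pattern, '&amp;', $input);

// The bare "&" is encoded, "&amp;" is left alone,
// but "&lt;" and "&gt;" are double-encoded.
echo $output, PHP_EOL;
// fish &amp; chips &amp;lt;b&amp;gt; &amp; more
```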

eglasius
Can't believe I overlooked this...
alex
+7  A: 
Paul Dixon
Brilliant answer Paul!
alex
A: 

What happens when you have other entities in your document? What happens if you're talking about a q&a session?

I'd isolate the ampersand rather than guess at context, and then use backreferences in your replacement string

/(\W)&(\W)/$1&amp;amp;$2/
Alan Storm
A: 

That would fail wherever the character 'a' follows an ampersand but isn't the start of "amp;", as in &and, &also, &apple... A negative lookahead for the whole "amp;" avoids that:

&(?!amp;)
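
A minimal sketch of how that lookahead could be used with preg_replace. The function names are hypothetical, and the second pattern is an assumed extension (not part of this answer) that also skips other named and numeric entities, to cover the objection raised in the other answers.

```php
<?php
// Hypothetical helper: encodes every "&" that is not the start of "&amp;".
function encode_bare_ampersands(string $text): string
{
    return preg_replace('/&(?!amp;)/', '&amp;', $text);
}

// Assumed extension: also leave alone anything shaped like a named entity
// (&name;), a decimal reference (&#123;) or a hex reference (&#x1F;).
// It only checks the shape, not whether the entity name really exists.
function encode_non_entity_ampersands(string $text): string
{
    $pattern = '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/';
    return preg_replace($pattern, '&amp;', $text);
}

echo encode_bare_ampersands('&and &amp; &apple'), PHP_EOL;
// &amp;and &amp; &amp;apple

echo encode_non_entity_ampersands('q&a &lt; &#169; &amp; &'), PHP_EOL;
// q&amp;a &lt; &#169; &amp; &amp;
```

Note that the first function would still double-encode entities other than &amp;amp;, which is why the second pattern is included.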

joshperry
+2  A: 

If your PHP version is >= 5.2.3 you could use the fourth parameter of the htmlspecialchars function. When set to false it will not convert existing entities.
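
A short sketch of that call, assuming UTF-8 input. The fourth parameter is named double_encode in the PHP manual, and the sample string is made up. Keep in mind that htmlspecialchars also encodes <, > and quotes, not only ampersands.

```php
<?php
// Made-up sample: bare ampersands, existing entities, and some markup.
$input = 'fish & chips &amp; <b>q&a</b> &lt;';

// Fourth argument (double_encode) set to false:
// existing entities such as "&amp;" and "&lt;" are not encoded again.
$output = htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false);

echo $output, PHP_EOL;
// fish &amp; chips &amp; &lt;b&gt;q&amp;a&lt;/b&gt; &lt;
```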

Ionuț G. Stan
Thank you, but at the moment I just want to encode ampersands. But your link is very useful! +1
alex
+1 yes, I didn't know about that either, will mention in my answer
Paul Dixon