I have a string:

$source = '&
<script type="text/javascript">&</script>
&
<script type="text/javascript">&</script>
&';

The desired result is:

&amp;
<script type="text/javascript">&</script>
&amp;
<script type="text/javascript">&</script>
&amp;

I tried:

echo preg_replace("#&(?!amp;)(?!<\/script>)(?![^<]script.*?>)#i",
                  "&amp;", $source);

But either only the first "&" is replaced, or all of them are.

How can I get this result?

Edit 1:

Now suppose the string is:

$source = '&
<script type="text/javascript">text&text</script>
&
<script type="text/javascript">&</script>
&';

The desired result is:

&amp;
<script type="text/javascript">text&text</script>
&amp;
<script type="text/javascript">&</script>
&amp;
A: 

Using the g modifier replaces your match globally (every occurrence).

echo preg_replace("#&(?!amp;)(?!<\/script>)(?![^<]script.*?>)#ig",
                  "&amp;", $source);
gregseth
Doesn't work: preg_replace() [function.preg-replace]: Unknown modifier 'g'
Kevin Campion
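The warning quoted above is expected: PHP's preg functions have no g modifier, because preg_replace() already replaces every match unless its optional fourth argument ($limit) says otherwise. A minimal sketch, with a made-up sample string:

```php
<?php
// Sample input, chosen only for illustration.
$text = 'a & b & c';

// Default behaviour: every match is replaced (limit defaults to -1).
$all = preg_replace('/&(?!amp;)/', '&amp;', $text);

// Passing a limit of 1 replaces only the first match.
$first = preg_replace('/&(?!amp;)/', '&amp;', $text, 1);

echo $all, "\n";   // a &amp; b &amp; c
echo $first, "\n"; // a &amp; b & c
```

So getting "all or only the first" is a matter of the pattern and the limit, not of a missing flag.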
+1  A: 

Try this

$output = preg_replace("/&(?!amp;)(?!<\/script>)(?![^<]script.*?>)/", "&amp;", $source);
Christian Toma
Kevin Campion
@Kevin - I tried it on my server and it works as you would expect. What version are you using?
Christian Toma
I use PHP 5.3.0
Kevin Campion
Oops! Sorry, you're right!
Kevin Campion
@Kevin - I'm glad it works.
Christian Toma
Kevin Campion
OK, I found the answer to my last comment. It's "/^)(?![^<]script(.*?)>)(?!<\/script>)/"
Kevin Campion
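For the "Edit 1" input, note that the lookahead pattern in this answer only protects an ampersand that is immediately followed by </script>, so the one in text&text would still be replaced. One possible alternative, sketched here under assumptions, is to split the string on whole script elements and escape only the pieces in between. The helper name escapeOutsideScripts() is hypothetical, and the sketch assumes script blocks are well formed and not nested:

```php
<?php
// Hypothetical helper: escape bare ampersands everywhere except inside
// <script>...</script> blocks.
function escapeOutsideScripts($source)
{
    // The capture group makes preg_split() keep each script block in the
    // result. Pieces then alternate: text, script, text, script, text...
    $parts = preg_split(
        '#(<script\b[^>]*>.*?</script>)#is',
        $source,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        // Even indexes hold the text between script blocks.
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace('/&(?!amp;)/', '&amp;', $part);
        }
    }

    return implode('', $parts);
}

$source = '&
<script type="text/javascript">text&text</script>
&
<script type="text/javascript">&</script>
&';

echo escapeOutsideScripts($source);
```

Like the patterns in the thread, this only guards against re-escaping &amp;; other entities such as &lt; would need a broader lookahead.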
+1  A: 

Stop it with the regexes already. Please. I can't take it anymore. My head hurts, but only because I'm banging it on my desk.

I would suggest using DOMDocument or SimpleXMLElement to parse the string and then loop through each non-script tag to encode each ampersand.

Lucas Oman
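A rough sketch of what the DOMDocument route might look like. It assumes the fragment is HTML, wraps it in a throwaway div (an assumption of this sketch, not something from the thread), and relies on the serializer to encode ampersands in ordinary text while leaving script contents raw. escapeViaDom() is a hypothetical name, and saveHTML() with a node argument needs PHP 5.3.6 or later:

```php
<?php
// Hypothetical helper: let the HTML parser and serializer do the escaping.
function escapeViaDom($source)
{
    // Bare "&" characters make libxml emit warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The wrapper div keeps the fragment together inside <body>.
    $doc->loadHTML('<div>' . $source . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Serialize each child of the wrapper: text nodes come back entity
    // encoded, script elements keep their content untouched.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

$source = '&
<script type="text/javascript">text&text</script>
&
<script type="text/javascript">&</script>
&';

echo escapeViaDom($source);
```

The serializer may normalize markup in other ways (attribute quoting, for instance), so the output is not guaranteed to be byte-identical to the input apart from the ampersands.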
You're joking, right? That isn't really cost-effective.
Christian Toma
I totally understand what you mean, I plan to use XSLT but for now I'm forced to use this case... sorry for your head ;)
Kevin Campion
@Christian Toma Why not? If it's as small a document as he shows, then it will require minimal processing for parsing. If, however, the string grows (the likelihood of which is inversely proportional to how much the dev insists it won't happen), then this solution will scale well. And what dev wants to come in later and maintain that regex?
Lucas Oman
@Lucas - But why not use the regex provided in the accepted answer, which is faster than all the DOMDocument processing? What would be the advantages of using DOMDocument, in your opinion?
Christian Toma
@Christian Toma I've already listed some good reasons in my previous comment, but here are a couple of specific examples: What if he decides, later, that he also wants to escape angle brackets? Or what if he decides he also wants to skip embed tags? In a large application, maintainability and scalability are far more important than negligible performance improvements.
Lucas Oman
@Lucas - You are absolutely right about maintainability and scalability, but speed is also very important in a large application.
Christian Toma