That worked! At Rubular I had to change the options from /xs to /m (and I removed the whitespace that separates the two parts of the regex as you showed it above).
You can see this regular expression in action along with a sample string at http://www.rubular.com/regexes/5855.
In case that Rubular permalink isn't really permanent, here is what I entered for the regular expression:
/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)/m
And here is the test string:
<p>a & b</p>
<p>c &amp; d</p>
<script type="text/javascript">
// <![CDATA[
if (a && b) doSomething('a & b & c');
// ]]>
</script>
<p>a & b</p>
<p>c &amp; d</p>
Only two ampersands match: the one in a & b at the top and the one in a & b at the bottom. Ampersands already escaped as &amp; and all ampersands (escaped or not) between <![CDATA[ and ]]> are left alone.
So, my final code is now this:
html.gsub(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)/m, '&amp;')
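For anyone who wants to check this outside Rubular, here is a minimal sketch that runs the same pattern against the test string in plain Ruby. The constant name UNESCAPED_AMP and the variable names are my own, and the sample assumes the already-escaped paragraphs read c &amp;amp; d in the raw markup (that is, a literal `&amp;` in the string).

```ruby
# Pattern copied from the post: an ampersand that does not start an entity
# (first lookahead) and is not followed by a closing ]]> before the next
# CDATA delimiter (second lookahead), i.e. it is not inside a CDATA section.
UNESCAPED_AMP = /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+);)(?!(?>(?:(?!<!\[CDATA\[|\]\]>).)*)\]\]>)/m

# Test string from the post, with the already-escaped ampersands written out.
html = <<~HTML
  <p>a & b</p>
  <p>c &amp; d</p>
  <script type="text/javascript">
  // <![CDATA[
  if (a && b) doSomething('a & b & c');
  // ]]>
  </script>
  <p>a & b</p>
  <p>c &amp; d</p>
HTML

# Count the matches, then escape them.
match_count = html.scan(UNESCAPED_AMP).length
escaped = html.gsub(UNESCAPED_AMP, '&amp;')

puts match_count
puts escaped
```

Because the first lookahead skips anything that already looks like an entity, running the gsub a second time on the result should change nothing, which is handy if the same HTML might pass through this step more than once.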
Thank you very much, Alan. This is exactly what I needed.