I have a problem trying to transform a given input string to a given output string using regular expressions in JavaScript. I'm not sure whether what I'm trying to accomplish can be done with regular expressions at all, or whether some other means would be more efficient. I'm hoping someone can help:

I have the following input string:

#> Some text goes here, and a 'quoted string' is inside.
<# something with 'quotes' #>
Another 'quoted string' is found <#

I need to replace each quote ' character with an escaped version \' whenever it is found between a #> and <# sequence.

Desired output string:

#> Some text goes here, and a \'quoted string\' is inside.
<# something with 'quotes' #>
Another \'quoted string\' is found <#

Note that the quotes in the <# something with 'quotes' #> portion were not escaped, only the quotes found between #> and <#.

I'm using the following code to accomplish this, but I'd like to find a better or more efficient way to do the same thing (NOTE: carriage returns and tabs are guaranteed not to be found in my input string, so I'm safe to use them in the manner below):

var s = ... some input string ...;

// Replace all "<#" sequences with tabs "\t"
s = s.split("<#").join("\t"); 

var i = 1;
do
{
    // Replace a single quote that is found within
    // #> and <# block with a carriage return.
    s = s.replace(/((^|#>)[^\t]*?)'/g, "$1\r");

    // Continue replacing single quotes while we're
    // still finding matches.
    s = s.split("\r");
    if (s.length < ++i)
        break;
    s = s.join("\r");
}
while (true);

// Replace each instance of a carriage return
// with an escaped single quote, then restore
// the "<#" sequences that were swapped for tabs.
s = s.join("\\'").split("\t").join("<#");

The main reason I'm not using just a single regular expression is that I can't seem to get it to replace more than 1 single quote character. So I've resorted to a do/while loop to ensure all of them are escaped.

Does someone have a better way (please)?

+4  A: 

This regex matches the quotes that are not inside <# ... #>:

'(?=((?!#>)[\s\S])*(<#|$))

a short explanation:

'             # match a single quote
(?=           # start positive look ahead
  (           #   start capture group 1
    (?!       #     start negative look ahead
      #       #       match the character '#'
      >       #       match the character '>'
    )         #     end negative look ahead
    [\s\S]    #     match any character, including line breaks
  )*          #   end capture group 1 and repeat it zero or more times
  (           #   start capture group 2
    <         #     match the character '<'
    #         #     match the character '#'
    |         #     OR
    $         #     match the end of the input
  )           #   end capture group 2
)             # end positive look ahead

or, in plain English:

Match a single quote only if, looking ahead from it, the substring '<#' (or the end of the input) can be reached without encountering '#>' between the single quote and that '<#' (or the end of the input).
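As a rough usage sketch, the pattern can be passed to replace() with the g flag so that every matching quote is escaped in one call. The helper name escapeQuotesOutsideBlocks is made up for illustration, and the sample string is the one from the question.

```javascript
// Hypothetical helper name; the pattern itself is the one from this answer.
function escapeQuotesOutsideBlocks(input) {
    // The look-ahead only checks what follows each quote, so nothing but
    // the quote itself is consumed and the g flag can visit every quote.
    return input.replace(/'(?=((?!#>)[\s\S])*(<#|$))/g, "\\'");
}

var sample = "#> Some text goes here, and a 'quoted string' is inside.\n" +
             "<# something with 'quotes' #>\n" +
             "Another 'quoted string' is found <#";

console.log(escapeQuotesOutsideBlocks(sample));
```

Because the quote is the only character consumed by each match, no loop is needed to catch several quotes in the same block.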

But this regex solution will not be more efficient than what you have now (efficient as in: runs faster).
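If running time on long inputs is the main concern, one possible alternative (not taken from the question, just a sketch) is a single left-to-right pass that tracks whether the current position is inside a <# ... #> block. The function name escapeQuotesLinear is hypothetical, and the sketch assumes the input starts outside a block and that the delimiters are not nested.

```javascript
// Sketch only: assumes the string starts outside a <# ... #> block
// and that "<#" / "#>" are not nested.
function escapeQuotesLinear(input) {
    var out = [];
    var inside = false; // true while between "<#" and "#>"
    for (var i = 0; i < input.length; i++) {
        var two = input.substr(i, 2);
        if (!inside && two === "<#") {
            inside = true;
            out.push(two);
            i++; // skip the second delimiter character
        } else if (inside && two === "#>") {
            inside = false;
            out.push(two);
            i++;
        } else if (!inside && input.charAt(i) === "'") {
            out.push("\\'"); // escape quotes found outside a block
        } else {
            out.push(input.charAt(i));
        }
    }
    return out.join("");
}
```

Each character is visited once, so the work grows linearly with the input length; whether that beats the split/join loop in practice would need to be measured.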

Why are you looking for something other than your current approach? Your solution looks good to me.

Bart Kiers
I don't really care for the split()s and join()s on the '\r' character, and I'm really not sure what the performance would be like on a fairly large input string (10,000 chars or so). Thank you for the in-depth explanation -- extremely helpful for someone like me who only dabbles with regex every now and again :)
Doug
No problem Doug. Note that the short explanation might still be a bit cryptic (it's only a generated explanation...). In case you need it, I posted a (hopefully) more understandable explanation in plain English. If you don't need it, well, perhaps someone else might benefit from it! :)
Bart Kiers
Much appreciated :) The only issue I've found is when there is no #> on the left side (it's implicit) or no <# on the right side (also implicit). For example, "$('#<#= something('') #>').func();".replace(/'(?=((?!#>)[\s\S])*<#)/g, "\\'"); misses the final quote. Thank you very much for your regex, however. It's very insightful and I haven't worked much with positive/negative lookaheads.
Doug
Doug, see my edit: I changed `<#` into `(<#|$)` and adjusted the explanation(s).
Bart Kiers
You are seriously my hero - wish I could buy you some lunch or something :)
Doug
A: 

The following pair of replacements runs very fast in the Firebug console on strings of thousands of characters.

str.replace(/'|\\'/g, "\\'")
   .replace(/(<#[^#\>]*)\\'([^\\']+)\\'([^#\>]*#\>)/g, "$1'$2'$3")

The first replaces all quotes and already escaped quotes by \'. The second looks for all the <#...\'...\'...#> and replaces it by <#...'...'...#>.
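To make the two steps easier to try, here is a small sketch that wraps them in a function and runs them on the sample string from the question. The wrapper name escapeThenRestore is hypothetical; the two patterns are unchanged.

```javascript
// Hypothetical wrapper around the two chained replacements from this answer.
function escapeThenRestore(str) {
    return str
        // Step 1: escape every quote; already escaped quotes stay as \'
        .replace(/'|\\'/g, "\\'")
        // Step 2: undo the escaping for a quoted pair inside <# ... #>
        .replace(/(<#[^#\>]*)\\'([^\\']+)\\'([^#\>]*#\>)/g, "$1'$2'$3");
}

var sample = "#> Some text goes here, and a 'quoted string' is inside.\n" +
             "<# something with 'quotes' #>\n" +
             "Another 'quoted string' is found <#";

console.log(escapeThenRestore(sample));
```

Note that the second pattern expects a non-empty quoted pair inside the block, and that [^#\>] excludes the characters # and > individually. On an input such as $('#<#= something('') #>').func(); the empty pair '' does not match, so those inner quotes stay escaped.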

Mic