Given the following text:

This is!!xa simple string!xpattern

I would like to get a regexp that matches the !x that's between "string" and "pattern" but not !!xa that's between "is" and "a".

This regexp is to be used inside a string split().

I have tried several combinations but I cannot get a regexp that meets my needs. Perhaps my expression is not so regular after all =)

Thanks in advance!


EDIT:

SOLUTION

To state it clearly, the solution is going to be:

s.replace(/(([^!])|^)!x/g,'$1SOME_MAGICAL_STRING').split(/SOME_MAGICAL_STRING/)

Thanks to both jvenema and Amarghosh for the solution idea, and to everyone else who provided feedback.
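As a minimal sketch, the replace-then-split idea can be wrapped in a small helper. The names `SEP` and `splitOnSingleBangX` are hypothetical, and the sketch assumes the placeholder (a NUL character here) never occurs in the input.

```javascript
// SEP is a hypothetical placeholder; it must never appear in the input text.
var SEP = "\u0000";

// Replace every "!x" that is not preceded by "!" with the placeholder,
// putting the captured preceding character ($1) back, then split on it.
function splitOnSingleBangX(s) {
  return s.replace(/(([^!])|^)!x/g, "$1" + SEP).split(SEP);
}

var parts = splitOnSingleBangX("This is!!xa simple string!xpattern");
console.log(parts); // [ 'This is!!xa simple string', 'pattern' ]
```

Splitting on a plain string rather than a regexp avoids having to escape the placeholder.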

A: 

This expression should do the job. It uses a negative lookbehind assertion to ensure that the !x is not preceded by another exclamation mark.

(?<!!)!x
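For illustration, here is how the expression could be used with split(). The comments below point out that JavaScript engines of the time rejected lookbehind; this sketch assumes an engine that implements the ES2018 lookbehind syntax and returns null otherwise. The function name `splitWithLookbehind` is hypothetical.

```javascript
// Assumption: the engine supports ES2018 lookbehind assertions.
// The RegExp constructor is used so an unsupported pattern throws at
// run time (and can be caught) instead of failing when the script is parsed.
function splitWithLookbehind(s) {
  var rx;
  try {
    rx = new RegExp("(?<!!)!x");
  } catch (e) {
    return null; // no lookbehind support in this engine
  }
  return s.split(rx);
}

var parts = splitWithLookbehind("This is!!xa simple string!xpattern");
console.log(parts);
```

Because the assertion consumes no characters, nothing needs to be put back after the split.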
Daniel Brückner
I've never seen that expression before; what does `(?<expr)` mean? I tried your regexp on regexpal.com but it doesn't work; perhaps that site is not properly coded. I'll try it in my code.
mpeterson
Javascript doesn't support negative look-behinds (or look-behinds at all)
Peter Bailey
It's a lookbehind expression, like `(?!` is lookahead. However you don't get lookbehind in JavaScript's RegExp, and even lookahead has serious problems in IE.
bobince
@mpeterson it's `(?<!REGEX)` syntax for negative look-behind http://www.regular-expressions.info/lookaround.html unfortunately javascript doesn't support look-behind
Amarghosh
Oh I see, thanks for that. Too bad it isn't supported by javascript though.
mpeterson
(?<=EXPR) just checks whether the text to the left of the current position matches EXPR; if not, it fails. (?=EXPR) looks to the right of the current position. Replacing the equals sign with an exclamation mark, as in (?<!EXPR) and (?!EXPR), inverts the result of the test in both cases. An assertion never consumes any characters.
Daniel Brückner
Something learned - did not know that JavaScript does not support it.
Daniel Brückner
A: 
var s = "This is!!xa simple string!xpattern";
s.replace(/[^!]!x/,'-');

output:

"This is!!xa simple strin-pattern"

Edit: I missed the g, my bad. This one works:

var s = "!xThis is!!xa simple string!xpattern";
s.replace(/(([^!])|^)!x/g,'$1-');

output:

"-This is!!xa simple string-pattern"

All we're doing is matching the preceding character and then including it back into the replacement.
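A small sketch of what the capture group holds on each match, using a replacement callback instead of `$1`. The helper name `replaceSingleBangX` is hypothetical.

```javascript
// Returns the replaced string together with what group 1 captured per match.
function replaceSingleBangX(s) {
  var prefixes = [];
  var result = s.replace(/(([^!])|^)!x/g, function (match, prefix) {
    prefixes.push(prefix); // "" at the start of the string, else the preceding character
    return prefix + "-";   // put the consumed character back
  });
  return { result: result, prefixes: prefixes };
}

var first = replaceSingleBangX("!xThis is!!xa simple string!xpattern");
console.log(first.result);   // -This is!!xa simple string-pattern
console.log(first.prefixes); // [ '', 'g' ]

var consecutive = replaceSingleBangX("a!x!xb");
console.log(consecutive.result); // a-!xb
```

Note the last line: a `!x` that directly follows another `!x` is left alone, because the character before it was already consumed by the previous match. That is worth checking if consecutive delimiters can occur in the data.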

jvenema
Won't this replace `g!x` with `-`?
Joel Potter
Pretty sure he wants to not match the "g".
JAB
This has the same problem as the solution from BalusC.
mpeterson
@jvenema yes, that works for a replace because I can use backreferences, but not for a split(). That said, I could do what Amarghosh suggested: replace it with a magical string and then split on that.
mpeterson
To state it clearly, the solution is going to be: `s.replace(/(([^!])|^)!x/g,'$1SOME_MAGICAL_STRING').split(/SOME_MAGICAL_STRING/)` Thanks to both jvenema and Amarghosh for the solution.
mpeterson
A: 

Too bad JS doesn't have lookbehind :) Assuming no !x!x, you can use RegExp.exec instead of String.split:

var rx = /((?:[^!]|![^x])+)(?:!x|$)/g;
var res = [];
var m;
// Each match captures one field in group 1; exec returns null when none is left.
while ((m = rx.exec("This is!!xa simple string!xpattern")))
  res.push(m[1]);

Here, (?:[^!]|![^x])+ matches one or more items, each being either a non-exclamation mark or a ! followed by a character other than x. The latter case lets !! pass through as ordinary text. The (?:!x|$) consumes the !x terminator or matches the end of the string.
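To see why the "no !x!x" assumption matters, here is a sketch that wraps the loop in a function (the name `splitWithExec` is hypothetical) and runs it on an input with two consecutive delimiters.

```javascript
// Same pattern as above, wrapped so it can be run on several inputs.
function splitWithExec(str) {
  var rx = /((?:[^!]|![^x])+)(?:!x|$)/g;
  var res = [];
  var m;
  while ((m = rx.exec(str))) {
    res.push(m[1]);
  }
  return res;
}

console.log(splitWithExec("This is!!xa simple string!xpattern"));
// [ 'This is!!xa simple string', 'pattern' ]

console.log(splitWithExec("a!x!xb"));
// [ 'a', 'xb' ]
```

In the second call the group needs at least one character, so no match can start at the second `!x`; the search resumes one character later, the empty field is lost, and the stray `x` leaks into the next field.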


Edit: Since !x!x can happen, the loop has to be modified a bit to avoid an infinite loop.

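// With * the group may be empty, so consecutive !x yield empty fields.
var rx = /((?:[^!]|![^x])*)(?:!x|$)/g;
var res = [];
var str = "This is!!xa simpl!!xe!x!x string!xpattern";
while (true) {
  var m = rx.exec(str);
  // An empty match at the end of the string would repeat forever, so stop there.
  if (!m || m.index >= str.length)
    break;
  res.push(m[1]);
}
res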
KennyTM
I could have two consecutive `!x` in my pattern.
mpeterson
@mpeterson: See update.
KennyTM
The solution I ended up with is much simpler, but this is a great idea.
mpeterson