I want to match a list of words, which is easy enough when those words are truly words. For example, /\b (pop|push) \b/gsx, when run against the string

pop gave the door a push but it popped back

will match the words pop and push but not popped.

I need similar functionality for words that contain characters that would normally qualify as word boundaries. So I need /\b (reverse!|push) \b/gsx, when run against the string

push reverse! reverse!push

to only match reverse! and push, but not match reverse!push. Obviously this regex isn't going to do that, so what do I need to use instead of \b to make my regex smart enough to handle these funky requirements?
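A minimal sketch of the behaviour described above, assuming Python's re module as a stand-in for the original engine (re.X plays the role of /x and findall stands in for /g; /s has no effect because neither pattern contains a "."):

```python
import re

# The first pattern: \b works because "pop" and "push" consist only of \w characters.
simple = re.compile(r"\b (pop|push) \b", re.X)
simple_hits = simple.findall("pop gave the door a push but it popped back")

# The second pattern: the trailing \b cannot sit between "!" and a space,
# but it can sit between "!" and "p" inside "reverse!push".
funky = re.compile(r"\b (reverse!|push) \b", re.X)
funky_hits = funky.findall("push reverse! reverse!push")

print(simple_hits)  # ['pop', 'push']
print(funky_hits)   # ['push', 'reverse!', 'push']
```

The standalone reverse! is missed, while reverse!push is split into reverse! and push.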

A: 

Your first problem is that you need three (possibly four) cases in your alternation, not two.

  • /\breverse!(?:\s|$)/ reverse! by itself
  • /\bpush\b/ push by itself
  • /\breverse!push\b/ together
  • /\bpushreverse!(?:\s|$)/ this is the possible fourth case

Your second problem is that \b won't match between a "!" and a following space (or the end of the string), because "!" is not a \w character and neither is the space. Here is what Perl 5 has to say about \b; you may want to consult your docs to see if they agree:

A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a "\W". (Within character classes "\b" represents backspace rather than a word boundary, just as it normally does in any double-quoted string.)
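To see that definition in action, a small Python sketch can list every position where \b matches (Python's re uses the same \w/\W rule for plain ASCII text like this):

```python
import re

text = "push reverse!"
# Each zero-width match of \b, reported as an index into text.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 4, 5, 12]
```

Index 12 (between "e" and "!") is a boundary, but index 13, after the final "!", is not: the "!" and the imaginary character past the end both count as \W.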

So, the regex that you need is something like

/ \b ( reverse!push | reverse! | push ) (?: \s | \b | $ )+ /gx;
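A rough Python translation of this pattern (re.X standing in for /x and findall for /g), run against the string from the question:

```python
import re

# Longest alternative first, then one or more of: whitespace, a word boundary, or end of string.
pattern = re.compile(r"\b ( reverse!push | reverse! | push ) (?: \s | \b | $ )+", re.X)
hits = pattern.findall("push reverse! reverse!push")
print(hits)  # ['push', 'reverse!', 'reverse!push']
```

As written, it captures reverse!push as a single match.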

I left out the /s because there are no periods in this regex, so "treat as single line" makes no sense. If /s doesn't mean "treat as single line" in your engine, you should probably add it back. Also, you should read up on how your engine handles alternation. I know that in Perl 5, to get the right behaviour, you must arrange the items in this order (otherwise reverse! would always win over reverse!push).
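Python's re module also tries alternatives from left to right and takes the first one that lets the overall match succeed, so the ordering point can be checked with a short sketch:

```python
import re

text = "reverse!push"
# Shorter alternative first: "reverse!" succeeds, so "reverse!push" is never tried.
short_first = re.search(r"reverse!|reverse!push", text).group()
# Longer alternative first: the whole "reverse!push" is matched.
long_first = re.search(r"reverse!push|reverse!", text).group()
print(short_first)  # reverse!
print(long_first)   # reverse!push
```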

Chas. Owens
Read the question again, Chas; the OP *doesn't* want to match "reverse!push".
Alan Moore
A: 

At the end of a word, \b means "the previous character was a word character, and the next character (if there is a next character) is not a word character." You want to drop the first condition because there might be a non-word character at the end of the "word". That leaves you with a negative lookahead:

/\b (reverse!|push) (?!\w)/gx

I'm pretty sure AS3 regexes support lookahead.
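A Python sketch of this pattern (assuming Python's re, with re.X for /x and findall for /g) on the question's test string:

```python
import re

# \b before the word, a negative lookahead for \w after it.
pattern = re.compile(r"\b (reverse!|push) (?!\w)", re.X)
hits = pattern.findall("push reverse! reverse!push")
print(hits)  # ['push', 'reverse!', 'push']
```

The standalone reverse! is now found, and reverse! is no longer matched at the start of reverse!push. The trailing push inside reverse!push is still matched, though, because \b holds between "!" and "p".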

Alan Moore
In addition to using (?!\w) as the trailing \b replacement, I also used (?<!\w) as the leading \b replacement, so that words that start with special characters, like $!, are also matched.
DL Redden
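A Python sketch comparing a leading \b with a leading (?<!\w); the term $! comes from the comment above, while the search string is made up for illustration:

```python
import re

text = "$! push reverse!"
# Leading \b: there is no boundary before "$", since "$" and the start of the string are both \W.
with_b = re.findall(r"\b (\$!|reverse!|push) (?!\w)", text, re.X)
# Leading (?<!\w): only requires that the previous character, if any, is not \w.
with_lookbehind = re.findall(r"(?<!\w) (\$!|reverse!|push) (?!\w)", text, re.X)
print(with_b)           # ['push', 'reverse!']
print(with_lookbehind)  # ['$!', 'push', 'reverse!']
```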
A: 

You can replace \b with something similar, but less strict:

/(?<=\s|^)(reverse!|push)(?=\s|$)/g

This way the limiting factor of the \b (that it can only match before or after an actual \w word character) is removed.
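Python's re module rejects the lookbehind (?<=\s|^) because its alternatives have different widths, and Python requires fixed-width lookbehind. A Python sketch of the same idea therefore swaps in (?<!\S) and (?!\S), which mean "preceded/followed by whitespace or the start/end of the string":

```python
import re

# (?<!\S): the previous character, if any, is whitespace.
# (?!\S):  the next character, if any, is whitespace.
pattern = re.compile(r"(?<!\S)(reverse!|push)(?!\S)")
hits = pattern.findall("push reverse! reverse!push")
print(hits)  # ['push', 'reverse!']
```

Here reverse!push is rejected as a whole, since neither reverse! nor push inside it is bordered by whitespace on both sides.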

Now whitespace or the start/end of the string functions as a valid separator, and the inner expression can easily be built at run time, for example from a list of search terms.
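A minimal sketch of building the inner expression at run time, assuming Python and a hypothetical helper named build_term_pattern; re.escape keeps characters such as "$" literal:

```python
import re

def build_term_pattern(terms):
    """Hypothetical helper: compile a whitespace-delimited alternation of literal terms."""
    # re.escape makes regex metacharacters in the terms (e.g. "$") match literally.
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"(?<!\S)(" + alternation + r")(?!\S)")

pattern = build_term_pattern(["reverse!", "push", "$!"])
hits = pattern.findall("$! push reverse! reverse!push")
print(hits)  # ['$!', 'push', 'reverse!']
```

Because both ends are tied to whitespace, the order of the terms does not matter here: if a shorter term is a prefix of a longer one, the lookahead fails and the engine backtracks into the next alternative.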

Tomalak