I have a regular expression that escapes all special characters in a search string. This works great, but I can't get it to work with word boundaries. For example, with the haystack

add +

or

add (+)

and the needle

+

the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?

Using

add (plus)

as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.

A: 

Try changing it to:

/\b\s?\+/gi

Edit:

Extend this concept as far as you want. If you want the first + after any word boundary:

/\b[^+]*\+/gi
Stargazer712
That works in the specific example I gave, but doesn't account for word boundaries properly. For example, it doesn't work on 'add (+)' as the haystack.
dosboy
I edited my answer to account for a more general case, but if that's not what you're looking for, then you need to be more specific on what you want.
Stargazer712
Not sure how to be more specific. I need to use the word boundary \b to specify that the special character is at the beginning of a word. I updated the question to include 'add (+)' as an example, but there are obviously dozens more where some character (other than whitespace) designates a word boundary.
dosboy
Then I'm sorry, but I can't help you figure out what you need.
Stargazer712
+1  A: 

\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:

add +

...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
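
To see this in practice, here is a small sketch that runs the regexes from the question against its haystacks. The extra haystack "a+b" and the variable names are only illustrative additions; it is there to show a case where \b\+ does match, because the + directly follows a word character.

```javascript
// Haystacks from the question, plus "a+b" (an added illustration).
const haystacks = ["add +", "add (+)", "add (plus)", "a+b"];

const plainPlus = /\+/;        // literal plus, no boundary
const boundaryPlus = /\b\+/;   // word boundary immediately before the plus
const boundaryWord = /\bplus/; // word boundary immediately before "plus"

const results = haystacks.map((haystack) => ({
  haystack,
  plainPlus: plainPlus.test(haystack),
  boundaryPlus: boundaryPlus.test(haystack),
  boundaryWord: boundaryWord.test(haystack),
}));

console.log(results);
// "add +"      -> plainPlus: true,  boundaryPlus: false, boundaryWord: false
// "add (+)"    -> plainPlus: true,  boundaryPlus: false, boundaryWord: false
// "add (plus)" -> plainPlus: false, boundaryPlus: false, boundaryWord: true
// "a+b"        -> plainPlus: true,  boundaryPlus: true,  boundaryWord: false
```

In "add +" and "add (+)" the character before the + is a space or a parenthesis, so neither side of that position is a word character and \b fails there.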

Alan Moore
That makes a lot of sense actually. I take it then that there's no way to achieve what I want without creating a list of pre-defined characters (such as `\s`, `\(`, `\[`, etc.) to match against before the `+`?
dosboy
Do you want to match anything that's *not* a word character? That would be `\W` (capital 'w'). Or you can use `\B` to assert that the `+` is not preceded by a word character.
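
A rough sketch of the difference between the two, using the haystacks from the question plus two added ones ("a+b" and a bare "+"). The variable names are made up for illustration. Note that `\W` consumes the character before the `+`, while `\B` is zero-width like `\b`.

```javascript
const notAfterWordChar = /\B\+/; // asserts there is NO word boundary before the plus
const afterNonWordChar = /\W\+/; // requires an actual non-word character before the plus

const samples = ["add +", "add (+)", "a+b", "+"];

const comparison = samples.map((sample) => {
  const viaB = sample.match(notAfterWordChar);
  const viaW = sample.match(afterNonWordChar);
  return {
    sample,
    viaB: viaB ? viaB[0] : null,
    viaW: viaW ? viaW[0] : null,
  };
});

console.log(comparison);
// "add +"   -> viaB: "+",  viaW: " +"
// "add (+)" -> viaB: "+",  viaW: "(+"
// "a+b"     -> viaB: null, viaW: null
// "+"       -> viaB: "+",  viaW: null  (\W needs a character to consume)
```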
Alan Moore
Definitely not *not* a word character, but your explanation of the actual usage of `\b` gave me a kick in the right direction. I'm now doing a JS check on the first character of my regex. If it's a `\\` I don't append the `\b`. If it isn't, I do. Seem to be getting the results I wanted. Thanks.
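
For anyone reading later, a minimal sketch of that check as I understand it. The helper names `escapeRegExp` and `buildSearchRegex` are hypothetical, not the actual code, and the escaping pattern is the commonly used one rather than necessarily the one in the question.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal search string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: put \b in front of the pattern only when the escaped
// needle does not start with a backslash (i.e. its first character was not
// a special character that had to be escaped).
function buildSearchRegex(needle) {
  const escaped = escapeRegExp(needle);
  const prefix = escaped.startsWith("\\") ? "" : "\\b";
  return new RegExp(prefix + escaped, "gi");
}

console.log(buildSearchRegex("+").source);    // \+
console.log(buildSearchRegex("plus").source); // \bplus

console.log("add (+)".match(buildSearchRegex("+")));       // [ '+' ]
console.log("add (plus)".match(buildSearchRegex("plus"))); // [ 'plus' ]
console.log("surplus".match(buildSearchRegex("plus")));    // null
```

One caveat with this sketch: a needle that starts with a non-word character that is not escaped (for example `#` or `-`) would still get the `\b` prefix, so testing the first character of the needle against `\w` might be a more direct check.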
dosboy