I am trying to replace all numbers except for prices (numbers starting with $) with X in a body of text. I've been trying to use a lookbehind to get the job done, but it doesn't seem to work. Here is what I am using now:

$comments = preg_replace("/(?<!$)([0-9]+)/", "x", $comments);

This ends up just replacing all numbers with X including those preceded by $.

+3  A: 

You need to escape the dollar sign with a backslash, \$, otherwise it is interpreted as end of line/string.

Also, the second parentheses set are entirely unnecessary - you are not using the group you capture.
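One PHP-specific wrinkle worth checking when escaping the dollar sign: inside a double-quoted string literal, PHP itself consumes the backslash in \$, so the regex engine still receives a bare $. A minimal sketch, assuming a made-up sample sentence (the variable names are purely illustrative):

```php
<?php
// In a double-quoted PHP string, "\$" collapses to a bare "$" before PCRE sees it.
$doubleQuoted = "/(?<!\$)[0-9]+/";
// In a single-quoted string the backslash survives, so PCRE sees an escaped dollar.
$singleQuoted = '/(?<!\$)[0-9]+/';

var_dump($doubleQuoted === '/(?<!$)[0-9]+/'); // bool(true): the backslash is gone

$text = 'I paid $5 for 3 apples'; // hypothetical sample input
echo preg_replace($doubleQuoted, 'x', $text), "\n"; // I paid $x for x apples
echo preg_replace($singleQuoted, 'x', $text), "\n"; // I paid $5 for x apples
```

Writing the pattern in single quotes is the simplest way to keep the escape intact.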

Oh, and to avoid replacing the trailing digits of something like $100 (the lookbehind only protects the digit directly after the dollar sign), you will need to add 0-9 to your negative lookbehind. Since you're doing that, you can simply place the dollar inside the character class, where escaping is not required.

So at this point we have:

$comments = preg_replace("/(?<![$0-9])[0-9]+/", "x", $comments);

But apparently "preg_replace does not support repetition in a look-behind" - which I'm taking to mean you can't put 0-9 in the lookbehind, so instead I'm placing a word boundary before it.

Another thing to avoid is replacing the cents in $9.99, so hopefully we can specify \d\. in the lookbehind to disallow that.

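So your code eventually becomes (single-quoted, because inside a double-quoted PHP string \$ collapses to a bare $ before the regex engine sees it):

$comments = preg_replace('/(?<!\$|\d\.)\b[0-9]+\b/', 'x', $comments);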


With all this added complexity, you'll want to create some test cases to make sure that works as intended.
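As a starting point, the test cases could be sketched as a small table of inputs and expected outputs. The sample strings below are hypothetical, and the pattern is written in single quotes so that the escaped dollar reaches the regex engine unchanged:

```php
<?php
// Lookbehind-plus-word-boundary pattern, single-quoted so "\$" reaches PCRE intact.
$pattern = '/(?<!\$|\d\.)\b[0-9]+\b/';

// Hypothetical sample inputs mapped to the output we expect.
$cases = [
    'Call 555 now'     => 'Call x now',
    'It costs $100'    => 'It costs $100',
    'Only $9.99 today' => 'Only $9.99 today',
    'Room 12, floor 3' => 'Room x, floor x',
    'version 2.5'      => 'version x.5', // the \d\. guard also shields plain decimals
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = preg_replace($pattern, 'x', $input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: '$input' gave '$actual', expected '$expected'\n";
    }
}
echo $failures === 0 ? "all cases pass\n" : "$failures case(s) failed\n";
```

The last case shows the kind of side effect such tests can surface: the digits after any decimal point are left alone, not just those in prices.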

Peter Boughton
That produces the same as before.
Iainzor
Hmmm... yeah, I thought of that, too. The reason I didn't post this is that the code still manages to find other numbers in the posted solution.
Franz
Please post a sample of your input text - there is no reason for it to fail, unless your monetary values are not in <dollar><number> format.
Peter Boughton
Make that the posted code.
Franz
That worked out, thank you.
Iainzor
+2  A: 

$ is a special character in regex (signifying the end of the string). You need to escape it: \$

Also, as currently formulated, your lookbehind will only prevent the first digit in a price from being replaced: after the first digit, the negative lookbehind succeeds again because the preceding character is no longer the $.

You might want to use something that includes \b (word boundaries) to restrict the beginning and end of the matched digit sequence to only full numbers.
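To illustrate both points, here is a rough sketch using a made-up sentence (the variable names are only for illustration). The first pattern escapes the dollar but has no guard against starting mid-number; the second adds word boundaries:

```php
<?php
$text = 'Pay $100 for 20 items'; // hypothetical sample input

// Escaped dollar only: the "1" is protected, but the match restarts at "00".
$withoutBoundary = preg_replace('/(?<!\$)[0-9]+/', 'x', $text);
echo $withoutBoundary, "\n"; // Pay $1x for x items

// With \b on both sides, a match cannot begin in the middle of a number.
$withBoundary = preg_replace('/(?<!\$)\b[0-9]+\b/', 'x', $text);
echo $withBoundary, "\n"; // Pay $100 for x items
```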

Amber
A: 

This should work also:

/(?<=\s)[0-9]+/
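A quick sketch of how this pattern behaves on two made-up sentences. It requires a whitespace character immediately before the number, so prices are skipped, but so is a number at the very start of the string:

```php
<?php
$pattern = '/(?<=\s)[0-9]+/';

// The "20" follows a space and is replaced; "$100" is left alone.
$middle = preg_replace($pattern, 'x', 'Pay $100 for 20 items');
echo $middle, "\n"; // Pay $100 for x items

// A leading number has no whitespace before it, so it is not replaced.
$leading = preg_replace($pattern, 'x', '20 items cost $100');
echo $leading, "\n"; // 20 items cost $100
```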
Cadoo
This: `(?<=\s)(?<!\$)` makes no sense. If the first look behind, `(?<=\s)`, is matched, the second, `(?<!\$)`, can't possibly match.
Bart Kiers