I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:

  • 12341+1
  • 12241+1R1
  • 100001+1R2

So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:

^(\d+)(1\\+1).*

This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:

  • 12341+1 becomes 12341
  • 12241+1R1 becomes 12241
  • 100001+1R2 becomes 100001

Any ideas?

A: 

Your regex looks fine to me. I don't have access to Java, but in JavaScript the code

"12341+1".replace(/(\d+)(1\+1)/g, "$1");

returns 1234, as you'd expect. This also works on a string containing several 'codes', e.g.

"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1"); 

gives 1234 5432.

Ben Clayton
+3  A: 

Your existing regex works fine; you are just missing a \ before \d in the Java string literal:

String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
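To check the same replacement against all three inputs from the question, here is a minimal sketch. The class name `StripOnePlusOne` and the helper `strip` are made up for illustration; only `String.replaceAll` comes from the standard library.

```java
class StripOnePlusOne {
    // In Java source every regex backslash is doubled, so the regex
    // ^(\d+)(1\+1).* is written as the string literal below.
    static final String PATTERN = "^(\\d+)(1\\+1).*";

    // Keeps only the digits captured before "1+1".
    // Input that does not match the pattern is returned unchanged.
    static String strip(String code) {
        return code.replaceAll(PATTERN, "$1");
    }

    public static void main(String[] args) {
        String[] samples = {"12341+1", "12241+1R1", "100001+1R2"};
        for (String code : samples) {
            System.out.println(code + " -> " + strip(code));
        }
    }
}
```

With the doubled backslashes in place, the three samples come out as 1234, 1224 and 10000.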

codaddict
A: 

Personally, I wouldn't use a regex at all (it'd be like using a hammer on a thumbtack); I'd just take a substring (pseudocode):

stringName.substring(0, stringName.indexOf("1+1")) 
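One thing to watch with the one-liner above: `indexOf` returns -1 when "1+1" is absent, and `substring(0, -1)` then throws. A small sketch with a guard could look like this (the class and method names are hypothetical, not from the question):

```java
class PrefixBeforeMarker {
    // Returns everything before the first occurrence of marker,
    // or the input unchanged if the marker does not occur.
    static String prefixBefore(String code, String marker) {
        int idx = code.indexOf(marker);
        return idx < 0 ? code : code.substring(0, idx);
    }

    public static void main(String[] args) {
        String[] samples = {"12341+1", "12241+1R1", "100001+1R2", "no marker"};
        for (String code : samples) {
            System.out.println(code + " -> " + prefixBefore(code, "1+1"));
        }
    }
}
```

Note that this does not check that the prefix consists of digits, so it is slightly more permissive than the regex: "abc1+1" becomes "abc" here, while the regex would leave it alone.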

But it looks like other posters have already mentioned the non-greedy operator.

In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. (Thus: ^(\d+?)(1\+1) matches as few digits as it can until it finds "1+1" and then, NOT INCLUDING the "1+1", it continues matching, whereas your original would see the 1 and match it as well.)

Cpfohl
It's true that `\d+` will *initially* consume everything before the `+`, but when the next part fails, `\d+` will give up the `1` (i.e., backtrack) so the overall match can succeed. Making the `+` reluctant won't do any harm, but in this case it won't do any noticeable good either.
Alan Moore
Hm, that's right. Great point. Ultimately I still think using a regex for this is overkill though. Using "indexOf" is still reasonably fast (for really small strings like those it wouldn't really matter anyway). I'll edit my post to use real Java though... :)
Cpfohl
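The backtracking point from the comments above is easy to verify. This is just a sketch (the class and method names are made up) that runs the greedy and the reluctant version of the pattern side by side on the sample inputs:

```java
class GreedyVsReluctant {
    // Greedy: \d+ first grabs every digit, then backtracks by one
    // so that the following 1\+1 can match.
    static String greedy(String code) {
        return code.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    // Reluctant: \d+? starts with one digit and extends only as needed.
    static String reluctant(String code) {
        return code.replaceAll("^(\\d+?)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        String[] samples = {"12341+1", "12241+1R1", "100001+1R2"};
        for (String code : samples) {
            System.out.println(code + " -> greedy: " + greedy(code)
                    + ", reluctant: " + reluctant(code));
        }
    }
}
```

Both variants end up with the same capture for these inputs; they only differ in how the engine gets there.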
A: 

IMHO, the regex is correct.

Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".

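Your output suggests that the pattern actually being run is not the one above, most likely because a backslash is missing somewhere in the string literal (for example, the regex ^(\d+)(1+1).* treats the + as a quantifier rather than as a literal plus sign).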
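To see how much the escaping matters, here is a rough sketch comparing three spellings of the pattern. The class and constant names are invented for the example; `Pattern.quote` is the standard library way to have literal text escaped for you.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // The regex ^(\d+)(1\+1).* written as a Java string literal.
    static final String ESCAPED = "^(\\d+)(1\\+1).*";

    // Same idea, but Pattern.quote escapes the literal "1+1" for us.
    static final String QUOTED = "^(\\d+)(" + Pattern.quote("1+1") + ").*";

    // Backslash before + missing: here + means "one or more of the previous 1".
    static final String UNESCAPED_PLUS = "^(\\d+)(1+1).*";

    static String strip(String regex, String code) {
        return code.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        // Prints the regex text the engine actually sees: ^(\d+)(1\+1).*
        System.out.println(ESCAPED);
        System.out.println(strip(ESCAPED, "12341+1"));
        System.out.println(strip(QUOTED, "12341+1"));
        System.out.println(strip(UNESCAPED_PLUS, "12341+1"));
    }
}
```

The first two patterns reduce 12341+1 to 1234. The third one no longer matches a literal plus sign at all, so that input comes back unchanged.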

andcoz