I have a String like the following one:

"some value is 25 but must not be bigger then 12"

I want to extract the two numbers from the string.

The numbers are integers.

There might be no text before the first number and some text after the second number.

I tried to do it with a regexp and groups, but failed miserably:

public MessageParser(String message) {
 Pattern stringWith2Numbers = Pattern.compile(".*(\\d?).*(\\d?).*");
 Matcher matcher = stringWith2Numbers.matcher(message);
 if (!matcher.matches()) {
  couldParse = false;
  firstNumber = 0;
  secondNumber = 0;
 } else {
  final String firstNumberString = matcher.group(1);
  firstNumber = Integer.valueOf(firstNumberString);
  final String secondNumberString = matcher.group(2);
  secondNumber = Integer.valueOf(secondNumberString);

  couldParse = true;
 }
}

Any help is appreciated.

+7  A: 

Your pattern should look more like:

Pattern stringWith2Numbers = Pattern.compile("\\D*(\\d+)\\D+(\\d+)\\D*");

You need \\d+ rather than \\d? because each number can be one or more digits.
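As a minimal sketch of how this pattern behaves with matches(): the class name TwoNumberExtractor and the return-null-on-failure convention below are assumptions made for illustration, not part of the question's MessageParser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; the name and the
// null-on-failure convention are assumptions, not the OP's code.
class TwoNumberExtractor {
    // Optional non-digits, a number, at least one non-digit,
    // a second number, optional non-digits.
    private static final Pattern TWO_NUMBERS =
            Pattern.compile("\\D*(\\d+)\\D+(\\d+)\\D*");

    // Returns {first, second}, or null if the whole string does not fit.
    static int[] extract(String message) {
        Matcher matcher = TWO_NUMBERS.matcher(message);
        if (!matcher.matches()) {
            return null;
        }
        // Note: parseInt throws NumberFormatException if a digit run
        // does not fit in an int.
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2))
        };
    }

    public static void main(String[] args) {
        int[] numbers = extract("some value is 25 but must not be bigger then 12");
        System.out.println(numbers[0] + " and " + numbers[1]); // prints "25 and 12"
        System.out.println(extract("42") == null);             // prints "true"
    }
}
```

The second call shows why the separator has to be \\D+: a string containing a single number is rejected instead of being split into two.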

jjnguy
As pointed out below, the ".*" "eats up" the entire string...
MartinStettner
Wouldn't \D*(\d+)\D*(\d+)\D* be slightly more appropriate? We explicitly don't want digits there, and . has the potential to match a digit before we get to \d.
toast
I hate regular expressions, thanks.
jjnguy
That second `\\D*` should be `\\D+`. As it is, your regex could match the string `"42"`, leaving `"4"` in group #1 and `"2"` in group #2.
Alan Moore
fixed, thanks...again.
jjnguy
+2  A: 

Your regex matches, but everything gets eaten up by your first .* and the rest matches the empty string.

Change your regex to "\\D*(\\d+)\\D+(\\d+)\\D*".

This should be read as: any number of non-digit characters, then at least one numeric digit, followed by at least one character that isn't a numeric digit, followed by at least one numeric digit, then any number of non-digit characters.

Ben S
The leading and trailing `.*` are necessary if you use the `matches()` method as the OP is doing. Your regex will work with the `find()` method, which performs the more traditional "it's in there somewhere" kind of regex matching.
Alan Moore
Thanks for the clarification Alan, I edited my answer.
Ben S
+2  A: 

Your ".*" patterns are being greedy, as is their wont, and are gobbling up as much as they can -- which is going to be the entire string. So that first ".*" is matching the entire string, rendering the rest moot. Also, your "\\d?" clauses indicate a single digit which happens to be optional, neither of which is what you want.
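To make the greedy behaviour concrete, here is a small sketch that runs the question's original pattern and returns what the two groups capture. The class and method names (GreedyDemo, captureWithOriginalPattern) are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration.
class GreedyDemo {
    // Applies the question's original pattern and returns both groups,
    // or null if matches() fails.
    static String[] captureWithOriginalPattern(String message) {
        Matcher matcher = Pattern.compile(".*(\\d?).*(\\d?).*").matcher(message);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = captureWithOriginalPattern(
                "some value is 25 but must not be bigger then 12");
        // The first ".*" consumed the whole string, so both groups are empty.
        System.out.println("[" + groups[0] + "] [" + groups[1] + "]"); // prints "[] []"
    }
}
```

So matches() does succeed, but both groups are empty strings, and Integer.valueOf("") then throws a NumberFormatException.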

This is probably more in line with what you're shooting for:

Pattern stringWith2Numbers = Pattern.compile(".*?(\\d+).*?(\\d+).*?");

Of course, since you don't really care about the stuff before or after the numbers, why bother with them?

Pattern stringWith2Numbers = Pattern.compile("(\\d+).*?(\\d+)");

That ought to do the trick.

Edit: Taking time out from writing butt-kickingly awesome comics, Alan Moore pointed out some problems with my solution in the comments. For starters, if you have only a single multi-digit number in the string, my solution gets it wrong. Applying it to "This 123 is a bad string" would cause it to return "12" and "3" when it ought to simply fail. A better regex would stipulate that there MUST be at least one non-digit character separating the two numbers, like so:

Pattern stringWith2Numbers = Pattern.compile("(\\d+)\\D+(\\d+)");
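A short sketch comparing the two separators with find() on the bad input from above. The helper name firstPair is an assumption introduced only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration.
class SeparatorDemo {
    // Returns the two groups of the first match found anywhere in the
    // input, or null if the regex is not found at all.
    static String[] firstPair(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String input = "This 123 is a bad string";

        // Lazy wildcard separator: backtracks and splits the single number.
        String[] split = firstPair("(\\d+).*?(\\d+)", input);
        System.out.println(split[0] + " and " + split[1]); // prints "12 and 3"

        // Mandatory non-digit separator: no match, as desired.
        System.out.println(firstPair("(\\d+)\\D+(\\d+)", input) == null); // prints "true"
    }
}
```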

Also, matches() applies the pattern to the entire string, essentially bracketing it in ^ and $; find() would do the trick, but that's not what the OP was using. So sticking with matches(), we'd need to bring back in those "useless" clauses in front of and after the two numbers. (Though having them explicitly match non-digits instead of the wildcard is better form.) So it would look like:

Pattern stringWith2Numbers = Pattern.compile("\\D*(\\d+)\\D+(\\d+)\\D*");

... which, it must be noted, is damn near identical to jjnguy's answer.
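For the matches()/find() distinction, a minimal sketch along these lines shows the difference on the OP's sample string. The class and method names (MatchesVsFindDemo, matchesWhole, foundSomewhere) are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration.
class MatchesVsFindDemo {
    // True only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches some substring of the input.
    static boolean foundSomewhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "some value is 25 but must not be bigger then 12";
        String bare = "(\\d+)\\D+(\\d+)";
        String padded = "\\D*(\\d+)\\D+(\\d+)\\D*";

        System.out.println(matchesWhole(bare, input));   // prints "false"
        System.out.println(foundSomewhere(bare, input)); // prints "true"
        System.out.println(matchesWhole(padded, input)); // prints "true"
    }
}
```

One difference worth keeping in mind: with matches(), the padded pattern rejects a string such as "1 and 2 and 3" because a third number is left over, whereas find() with the bare pattern happily accepts it and returns the first pair.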

BlairHippo
Wouldn't the ".*" between the number patterns "eat up" the second number?
MartinStettner
Nope. The question mark coming after the star indicates it should match the SHORTEST possible string it can -- thus, it'll match everything before the second number.
BlairHippo
... although, because the two digit clauses aren't optional anymore, the question mark is less important -- if you're absolutely certain there will be two and ONLY two numbers in the string, you don't need it. It matters if there are more digits, though. Using "1 and 2 and 3" as a sample string: with the question mark, you get 1 and 2 being fished out. Without, 1 and 3 would be the two extracted values.
BlairHippo
You can avoid that problem by using `\\D*` instead of `.*`, as @jjnguy did. But the one in the middle should be `\\D+`, as I explained in the comment to his answer.
Alan Moore
I didn't think of that, but you're right; my solution will be fine if there really are two numbers in the input text, but if the text is "I have a 42 number", it will come back with "4" and "2" when it ought to simply fail. And I didn't even think of the matches/find thing, which has tripped me up in the past. Will edit answer, seeing as it got accepted despite these deficiencies. :-)
BlairHippo