views:

9512

answers:

4

I have several strings in the rough form:

[some text] [some number] [some more text]

I want to extract the text in [some number] using the Java Regex classes.

I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].

EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].

+11  A: 

From memory, the following should work:

Pattern p = Pattern.compile("^[a-zA-Z]+([0-9]+).*");
Matcher m = p.matcher("Testing123Testing");

if (m.find()) {
    System.out.println(m.group(1));
}
Allain Lalonde
Don't forget to reuse the Pattern object. Compiling a pattern is comparatively expensive, so do it once rather than on every call.
Rastislav Komara
Agreed. Usually I'd define the pattern as a private static final Pattern PATTERN = Pattern.compile("..."); But that's just me.
Allain Lalonde
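For what it's worth, the reuse idea from the two comments above could look roughly like this. It is only a sketch: the class name NumberExtractor and the method name extract are made up for illustration, and the regex is simply the one from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class NumberExtractor {
    // Compiled once when the class is loaded. Pattern instances are
    // immutable, so a single instance can be shared by all callers.
    private static final Pattern PATTERN =
            Pattern.compile("^[a-zA-Z]+([0-9]+).*");

    // Returns the digits that follow the leading letters,
    // or null if the input does not have that shape.
    static String extract(String input) {
        // A Matcher holds per-match state, so create a fresh one per call.
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Testing123Testing")); // prints 123
    }
}
```

The Pattern is the part worth caching; the Matcher is created per call because it is not safe to share between threads.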
+5  A: 

In Java 1.4 and up:

String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
    String someNumberStr = matcher.group(1);
    // if you need this to be an int:
    int someNumberInt = Integer.parseInt(someNumberStr);
}
Jack Leow
+6  A: 

Allain basically has the Java code, so you can use that. However, his expression only matches if your number is preceded by nothing but a run of letters.

"(\\d+)"

should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no need to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits, then that's all you need.
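A minimal sketch of that idea, assuming you call find() rather than matches(). The class and method names (FirstDigits, firstNumber) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class FirstDigits {
    private static final Pattern DIGITS = Pattern.compile("(\\d+)");

    // Returns the first run of digits anywhere in the input, or null if there is none.
    static String firstNumber(String input) {
        Matcher m = DIGITS.matcher(input);
        // find() scans forward from the start, so the first hit is the leftmost run of digits.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Testing123Testing")); // prints 123
        System.out.println(firstNumber("order #42, item 7")); // prints 42
    }
}
```

Since the whole expression is the number, m.group() without an argument would return the same text and the parentheses could be dropped.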

If you expect it to be set off by whitespace, saying so makes the match even more distinct, so

"\\s+(\\d+)\\s+"

might be better.

If you need all three parts, this will do:

"([^\\d]+)(\\d+)(.*)"

EDIT: The expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string, and there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.

Axeman
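If you do want all three parts, the three-group expression from this answer could be used along these lines. This is a sketch; the names ThreeParts and split are made up, and it uses matches() on the assumption that the whole string should fit the shape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class ThreeParts {
    private static final Pattern PARTS =
            Pattern.compile("([^\\d]+)(\\d+)(.*)");

    // Returns {text before, number, text after}, or null if the input
    // does not start with at least one non-digit followed by digits.
    static String[] split(String input) {
        Matcher m = PARTS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("Testing123Testing");
        // prints Testing | 123 | Testing
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

Note that group 1 needs at least one non-digit, so an input that begins with the number (such as "123abc") is rejected by this expression.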
You can test Axeman's hypothesis by running a sample test and comparing the performance of his solution against A/J's.
anjanb
Don't you need to specify the beginning and end of the string? Otherwise things like 124xxx123xxx would be matched even though it doesn't fit into his syntax. Or are ^ and $ implicit?
Allain Lalonde
Allain, yours would fail as well. You and Jack make an assumption that non-digit characters will precede the digits. They either do or they don't. In which case, none of these expressions will parse this line. I repeat that *as specified*, the pattern for the digits is enough.
Axeman
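On the question of whether ^ and $ are implicit: with Matcher.find() they are not, while Matcher.matches() requires the entire input to match. A small sketch to try this on the 124xxx123xxx example (the class and method names are hypothetical, and the patterns are compiled per call only to keep it short):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class FindVersusMatches {
    // Group 1 of the first place the regex matches anywhere in the input, or null.
    static String firstGroupByFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // True only if the regex matches the input from the first to the last character.
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "124xxx123xxx";
        // Digits only: find() returns the leftmost run.
        System.out.println(firstGroupByFind("(\\d+)", input)); // prints 124
        // Allain's pattern: anchored to leading letters, so no match here.
        System.out.println(firstGroupByFind("^[a-zA-Z]+([0-9]+).*", input)); // prints null
        // Jack's pattern: skips the leading digits and picks up the second run.
        System.out.println(firstGroupByFind("[^0-9]+([0-9]+)[^0-9]+", input)); // prints 123
        // matches() behaves as if the pattern were anchored at both ends.
        System.out.println(wholeStringMatches("(\\d+)", input)); // prints false
    }
}
```

So if strings like 124xxx123xxx should be rejected outright, matches() (or explicit ^ and $) is one way to say so; if the first number is wanted regardless, find() with the digits-only pattern does that.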
A: 

How about

"[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*"

I think it would take care of numbers with a fractional part. I allowed for whitespace and included , as a possible separator. I'm trying to get the numbers out of a string, including floats, while taking into account that the user might make a mistake and include whitespace while typing the number.

arturo
I think you're trying to do too much. The OP didn't even say whether he was looking for integers or floating-point numbers, much less whether they could have whitespace in them. And, since he hasn't bothered to comment on the other answers in the ensuing 14.5 months, there's no point trying to guess what he intended. But I agree with Axeman that there's no point matching the text preceding and following the number.
Alan Moore
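For anyone who wants to see what arturo's expression actually captures, here is a rough sketch. The names LenientNumber and parse are made up, and the clean-up step (dropping whitespace, treating , as a decimal point) is an assumption about what was intended, not something stated in the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class LenientNumber {
    // arturo's expression, unchanged.
    private static final Pattern NUMBER = Pattern.compile(
            "[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Returns the captured number as a Double, or null if the input has no digits.
    static Double parse(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Assumed clean-up: remove stray whitespace and treat ',' as the decimal separator.
        String cleaned = m.group(1).replaceAll("\\s+", "").replace(',', '.');
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parse("price: 12 , 5 EUR")); // prints 12.5
        System.out.println(parse("abc 42 7"));          // prints 427.0
    }
}
```

The second call shows the cost of being lenient: because the separator is optional, two unrelated numbers separated by a space are captured as one, which is in line with the comment that the expression may be trying to do too much.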