I have a string that looks like "A=1.23;B=2.345;C=3.567"

I am only interested in "C=3.567"

What I have so far is:

    Matcher m = Pattern.compile("C=\\d+.\\d+").matcher("A=1.23;B=2.345;C=3.567");

    while (m.find()) {
        double d = Double.parseDouble(m.group());
        System.out.println(d);
    }

The problem is that it shows the 3 as separate from the 567.

Output:

3.0

567.0

I am wondering how I can include the decimal point so that it outputs "3.567".

EDIT: I would also like to match C if it does not have a decimal point, so I would like to capture 3567 as well as 3.567.

Since the C= is built into the pattern as well, how can I strip it out before parsing the double?

+1  A: 

To match any sequence of digits and dots you can change the regular expression to this:

"(?<=C=)[.\\d]+"

If you want to be certain that there is only a single dot you might want to try something like this:

"(?<=C=)\\d+(?:\\.\\d+)?"

You should also be aware that this pattern can match the 1.2 in ABC=1.2.3;. You should consider if you need to improve the regular expression to correctly handle this situation.
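A minimal sketch of how the lookbehind pattern behaves, including the ABC=1.2.3; case mentioned above. The class name, the helper first(), and the STRICT pattern are assumptions made for illustration; STRICT is just one possible way to tighten the match, not something from the answer itself.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Pattern from the answer: "C=" must precede the number but is not part of the match.
    static final Pattern SIMPLE = Pattern.compile("(?<=C=)\\d+(?:\\.\\d+)?");

    // Hypothetical stricter variant (an assumption, not from the answer):
    // "C=" must not be preceded by a word character, and the number must be
    // followed by ';' or the end of the input.
    static final Pattern STRICT =
            Pattern.compile("(?<=C=)(?<!\\wC=)\\d+(?:\\.\\d+)?(?=;|$)");

    // Returns the first match parsed as a double, or empty if nothing matches.
    static OptionalDouble first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group()))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(first(SIMPLE, "A=1.23;B=2.345;C=3.567").getAsDouble()); // 3.567
        System.out.println(first(SIMPLE, "ABC=1.2.3;").getAsDouble());             // 1.2
        System.out.println(first(STRICT, "ABC=1.2.3;").isPresent());               // false
        System.out.println(first(STRICT, "A=1.23;B=2.345;C=3.567").getAsDouble()); // 3.567
    }
}
```

Because the lookbehind is zero-width, m.group() already contains only the number, so it can go straight into Double.parseDouble without stripping C=.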

Mark Byers
A: 

Your regular expression is only matching numeric characters. To match the decimal point as well, you will need:

Pattern.compile("\\d+\\.\\d+")

The . is escaped because an unescaped . matches any character.

Note: this will then only match numbers that contain a decimal point, which is what you have in your example.
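To make the difference between the escaped and unescaped dot concrete, here is a small sketch. The class name, the helper findAll(), and the test input "1x23" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedDotDemo {
    static final Pattern UNESCAPED = Pattern.compile("\\d+.\\d+");   // '.' matches any character
    static final Pattern ESCAPED = Pattern.compile("\\d+\\.\\d+");   // '\\.' matches a literal dot only

    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll(ESCAPED, "A=1.23;B=2.345;C=3.567")); // [1.23, 2.345, 3.567]
        System.out.println(findAll(UNESCAPED, "1x23"));                 // [1x23]
        System.out.println(findAll(ESCAPED, "1x23"));                   // []
        System.out.println(findAll(ESCAPED, "C=3567"));                 // [] (no decimal point)
    }
}
```

Note that this pattern finds every decimal number in the string, not just the one after C=, so you would still need to pick out the value you want.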

Shadwell
+3  A: 

I may be mistaken on this part, but I think the reason it's separating the two is that group() only returns the most recently matched subsequence, which is whatever was matched by the latest call to find(). Thanks, Mark Byers.

For sure, though, you can solve this by placing the entire part you want inside a "capturing group", which is done by placing it in parentheses. This makes it so that you can group together matched parts of your regular expression into one substring. Your pattern would then look like:

Pattern.compile("C=(\\d+\\.\\d+)")

For parsing either 3567 or 3.567, your pattern would be C=(\\d+(\\.\\d+)?), with group 1 holding the whole number. Also note that since you specifically want to match a period, you need to escape your . (period) character so that it's not interpreted as the "any character" token. For this input, though, it doesn't matter.

Then, to get your 3.567, you would call m.group(1) to grab the first (counting from 1) capturing group. Your Double.parseDouble call would then essentially become Double.parseDouble("3.567").
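Put together, the capturing-group approach might look like the sketch below. The class and method names are placeholders chosen for this example; the pattern is the C=(\\d+(\\.\\d+)?) one described above.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // Group 1 is the whole number; group 2 is the optional fractional part.
    static final Pattern C_VALUE = Pattern.compile("C=(\\d+(\\.\\d+)?)");

    // Returns the value after "C=" as a double, or empty if there is none.
    static OptionalDouble parseC(String input) {
        Matcher m = C_VALUE.matcher(input);
        if (!m.find()) {
            return OptionalDouble.empty();
        }
        // group() would be "C=3.567"; group(1) is just "3.567".
        return OptionalDouble.of(Double.parseDouble(m.group(1)));
    }

    public static void main(String[] args) {
        Matcher m = C_VALUE.matcher("A=1.23;B=2.345;C=3.567");
        if (m.find()) {
            System.out.println(m.group());  // C=3.567
            System.out.println(m.group(1)); // 3.567
        }
        System.out.println(parseC("A=1.23;B=2.345;C=3567").getAsDouble()); // 3567.0
    }
}
```

Passing m.group() instead of m.group(1) to Double.parseDouble would throw a NumberFormatException, because the string would still start with C=.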

As for taking C= out of your pattern, since I'm not that well-versed in regular expressions, I might recommend that you split your input string on the semicolons and check whether each piece contains the C; you could then apply the pattern (with the capturing group) to that piece to get the 3.567 from your Matcher.
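A rough sketch of the split-based idea follows. Note that it swaps the final regex step for a second split on '=', which is a simplification on my part; the class name and valueOf() helper are hypothetical.

```java
import java.util.OptionalDouble;

class SplitDemo {
    // Looks up the value for an exact key in a "K=V;K=V;..." string.
    // Assumes each field has the form key=value; fields without '=' are skipped.
    static OptionalDouble valueOf(String input, String key) {
        for (String field : input.split(";")) {
            String[] kv = field.split("=", 2); // limit 2: split only at the first '='
            if (kv.length == 2 && kv[0].trim().equals(key)) {
                // parseDouble throws NumberFormatException for values like "1.2.3"
                return OptionalDouble.of(Double.parseDouble(kv[1].trim()));
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(valueOf("A=1.23;B=2.345;C=3.567", "C").getAsDouble()); // 3.567
        System.out.println(valueOf("ABC=1.2;C=3567", "C").getAsDouble());         // 3567.0
        System.out.println(valueOf("A=1.23", "C").isPresent());                   // false
    }
}
```

Comparing the key with equals() rather than contains() means a field such as ABC=1.2 is not mistaken for C.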

Edit: For the more general (and likely more useful!) cases in gawi's comment, please use the following (from http://www.regular-expressions.info/floatingpoint.html):

Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?")

This supports an optional sign, an optional integer part, an optional decimal point, and an optional signed exponent. At least one digit is required after any decimal point, so 10. is not matched. Insert capturing groups where desired to pick out parts individually. The exponent is wrapped in its own group so that it can be made optional as a whole.
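As a quick check of what this pattern accepts, here is a sketch that tests whole strings against it using the examples from gawi's comment. The class name and the isFloat() helper are made up for this example.

```java
import java.util.regex.Pattern;

class FloatPatternDemo {
    // The floating-point pattern quoted above, unchanged.
    static final Pattern FLOAT =
            Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?");

    // True only if the entire string is matched by the pattern.
    static boolean isFloat(String s) {
        return FLOAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"10", "10.", ".1", "1.3e10", "1.2e-12", "1.41e+12", "-3.567", "abc"};
        for (String s : samples) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

With matches(), everything in the list is accepted except 10. and abc. When the pattern is used with find() inside a longer string such as C=10.;, it would match just the 10 and leave the trailing dot behind.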

btlachance
NOTE: The regexp does not deal with the following floats: 10 10. .1 1.3e10 1.2e-12 1.41e+12
gawi
@gawi Thank you :) I've updated the answer with a regular expression that should do the trick. Is 10. considered a valid float, with the decimal point but no digits after?
btlachance
@btlachance 10. is a valid float literal in Java (well... 10.f to be exact)
gawi
@btlachance: I don't understand why you think using `group()` has something to do with the problem. He doesn't have any extra groups in his regular expression.
Mark Byers
@Mark Byers I guess this goes to show that I should have tried his code out first before responding and mis-reading the javadocs. I just ran the OP's code and it didn't produce the same result that was mentioned (getting the two separate matches). Thanks for the help :)
btlachance