My regular expression produces a different result from the one I expect, and I can't see what's wrong. Here is the relevant code; I hope you can help. Thanks!

String s = "48° 18′ 13,94″ nördliche Breite, "
         + "11° 34′ 31,98″ östliche Länge";

String kommazahl = "[0-9]{1,2}([\\.,][0-9]+)?";
String zahl = "[0-9]{1,2}";

Pattern p1 = Pattern.compile("("+ zahl +"[°/| ]{1,2}"+ zahl +"(['′/| ]{1,2}("+ kommazahl +")?)?).*"
                            +"("+ zahl +"[°/| ]{1,2}"+ zahl +"(['′/| ]{1,2}("+ kommazahl +")?)?).*");

Matcher m1 = p1.matcher(s);

// group() throws IllegalStateException unless a match was attempted first
if (m1.matches()) {
    System.out.println(m1.group(1) + "\n" + m1.group(5));
}

// Output should be:
// 48° 18′ 13,94
// 11° 34′ 31,98

// Output is:
// 48° 18′ 13,94
// 1° 34′ 31,98
+4  A: 

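The problem is the .* at the end of the first line of the pattern. It is greedy, so it matches "″ nördliche Breite, 1": everything up to and including the first digit of 11, because zahl ([0-9]{1,2}) is still satisfied by the single digit that remains.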
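To see the effect in isolation, here is a minimal sketch on a shortened input. The class name GreedyDemo, the helper trailingNumber and the sample string are made up for illustration; only the .* and [0-9]{1,2} pieces come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: returns what group 1 captured, or null if the
    // whole input does not match the regex.
    public static String trailingNumber(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // .* first consumes the whole input, then gives back characters one
        // at a time until [0-9]{1,2} can match. A single digit is enough,
        // so the group ends up with only the last "1" of "11".
        System.out.println(trailingNumber(".*([0-9]{1,2})", "Breite, 11")); // prints 1
    }
}
```

The same thing happens in the full pattern: the second coordinate group only needs one digit before the degree sign, so the .* keeps the other one.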

Perhaps you should change the .* to ".*, " so that it knows where to stop?

Pattern p1 = Pattern.compile
    ("("+ zahl +"[°/| ]{1,2}"+ zahl +"(['′/| ]{1,2}("+ kommazahl +")?)?).*, "
    +"("+ zahl +"[°/| ]{1,2}"+ zahl +"(['′/| ]{1,2}("+ kommazahl +")?)?).*");

Of course, that will only work if there's always a "comma space" between the two values you want in the rest of your data.

Jon Skeet
+5  A: 

The .* matches the first 1 of 11 greedily, while still allowing the rest of the pattern to match. Replace .* with something like [^0-9]*.
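A sketch of how that could look with the pattern from the question. The class name CoordinateDemo, the parse helper and the COORD constant are made up for illustration, and the non-ASCII characters are written as Unicode escapes only so that the file compiles regardless of source encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoordinateDemo {
    static final String KOMMAZAHL = "[0-9]{1,2}([\\.,][0-9]+)?";
    static final String ZAHL = "[0-9]{1,2}";
    // One coordinate, same structure as in the question.
    // \u00B0 is the degree sign, \u2032 the prime (minutes) sign.
    static final String COORD = "(" + ZAHL + "[\u00B0/| ]{1,2}" + ZAHL
            + "(['\u2032/| ]{1,2}(" + KOMMAZAHL + ")?)?)";

    // Returns { first coordinate, second coordinate }, or null if no match.
    public static String[] parse(String s) {
        // [^0-9]* cannot consume a digit, so it has to stop right before "11".
        Pattern p = Pattern.compile(COORD + "[^0-9]*" + COORD + ".*");
        Matcher m = p.matcher(s);
        if (!m.matches()) {
            return null;
        }
        // Groups 2 to 4 are nested inside the first coordinate,
        // so the second coordinate is group 5.
        return new String[] { m.group(1), m.group(5) };
    }

    public static void main(String[] args) {
        // The input string from the question (\u2033 is the double prime).
        String s = "48\u00B0 18\u2032 13,94\u2033 n\u00F6rdliche Breite, "
                 + "11\u00B0 34\u2032 31,98\u2033 \u00F6stliche L\u00E4nge";
        String[] parts = parse(s);
        System.out.println(parts[0] + "\n" + parts[1]);
    }
}
```

This assumes no digits appear in the text between the two coordinates; if they can, a more specific separator is needed.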

laalto
Thanks, this works fine.