I am reading a file line by line and need to extract the latitude and longitude from each line. This is what the lines can look like:

DE  83543   Rott am Inn Bayern  BY  Oberbayern      Landkreis Rosenheim 47.983  12.1278 
DE  21147   Hamburg Hamburg HH          Kreisfreie Stadt Hamburg    53.55   10  

What is certain is that the only dots surrounded by digits are the ones in the doubles. Unfortunately, some values have no dot at all (like the 10 above), so it's probably best to look for the numbers starting from the end of the String.

Thanks for your help!

+3  A: 

Is it a tab-separated CSV table? Then I'd suggest looking at String#split and simply taking the last two fields of the resulting String array.

... anyway, even if it isn't CSV, split on whitespace and take the last two fields of the String array. Those are the lat/lon values, and you can convert them with Double#parseDouble.
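
A minimal sketch of that idea, assuming every line really does end with the two coordinates. The class and method names (`LatLonSplit`, `lastTwoFields`) are made up for illustration and are not from the original post:

```java
import java.util.Arrays;

public class LatLonSplit {

    // Hypothetical helper: returns {latitude, longitude} taken from the
    // last two whitespace-separated fields of the line.
    public static double[] lastTwoFields(String line) {
        // trim() drops leading/trailing whitespace; "\\s+" treats any run
        // of tabs or spaces as one separator, so repeated tabs are harmless.
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least two fields: " + line);
        }
        double lat = Double.parseDouble(fields[fields.length - 2]);
        double lon = Double.parseDouble(fields[fields.length - 1]);
        return new double[] { lat, lon };
    }

    public static void main(String[] args) {
        // Sample line with repeated tabs and a trailing tab (assumed layout)
        String line = "DE\t21147\tHamburg\tHamburg\tHH\t\t\tKreisfreie Stadt Hamburg\t53.55\t10\t";
        System.out.println(Arrays.toString(lastTwoFields(line))); // [53.55, 10.0]
    }
}
```

Splitting on `\\s+` also breaks multi-word place names into several fields, but that does not matter here because only the last two fields are read.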

Andreas_D
The segments are tab-separated, but sometimes more than one tab separates two segments.
tzippy
That wouldn't matter.
Andreas_D
+2  A: 

If you can use java.lang.String#split():

// Needs java.util.Arrays, java.util.Collections and java.util.List
// Split on runs of tabs; trim() removes trailing whitespace first
String[] values = myTextLineByLine.trim().split("\t+");
List<String> list = Arrays.asList(values);
// Reverse the list so that longitude and latitude are the first two elements
Collections.reverse(list);

String longitude = list.get(0);
String latitude = list.get(1);
Shervin
That's my favourite as it 'implements' K.I.S.S.
Andreas_D
Thanks! This totally did the job, and as you mentioned, it "K.I.S.S."es!
tzippy
A: 
    Pattern p = Pattern.compile(".*?(\\d+\\.?\\d*)\\s+(\\d+\\.?\\d*)\\s*");
    Matcher m = p.matcher(s1);
    if (m.matches()) {
        // In the sample lines the latitude comes first, then the longitude
        System.out.println("Lat: " + Double.parseDouble(m.group(1)));
        System.out.println("Long: " + Double.parseDouble(m.group(2)));
    }
  1. .*? eat characters reluctantly
  2. (\\d+\\.?\\d*) some digits, an optional decimal point, some more digits
  3. \\s+ at least one white-space character (such as a tab character)
  4. (\\d+\\.?\\d*) some digits, an optional decimal point, some more digits
  5. \\s* optional trailing white space, since matches() has to consume the whole line
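
As a variation on the pattern above, here is a rough sketch that uses find() with an end-of-line anchor instead of matches(), so nothing has to be said about the start of the line. The names `LatLonRegex`, `NUM`, `TAIL` and `extract` are invented for this example, and the number format (digits with an optional fractional part) is an assumption based on the sample lines:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatLonRegex {

    // Assumed number format: digits with an optional fractional part.
    private static final String NUM = "(\\d+(?:\\.\\d+)?)";

    // Two numbers separated by whitespace, anchored to the end of the line;
    // "\\s*$" tolerates trailing tabs or spaces.
    private static final Pattern TAIL =
        Pattern.compile(NUM + "\\s+" + NUM + "\\s*$");

    // Hypothetical helper: returns {latitude, longitude}, or null if the
    // line does not end with two numbers.
    public static double[] extract(String line) {
        Matcher m = TAIL.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        double[] c = extract("DE\t21147\tHamburg\t53.55\t10\t");
        System.out.println(c[0] + ", " + c[1]); // 53.55, 10.0
    }
}
```

Returning null for a non-matching line is just one possible choice; throwing an exception or returning an Optional would work equally well.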
aioobe
A: 

This solution uses Scanner.findWithinHorizon and capturing groups:

    import java.util.*;
    import java.util.regex.*;
    //...

    String text = 
        "DE  83543 Blah blah blah 47.983  12.1278\n" +
        "DE\t21147 100% hamburger beef for 4.99 53.55 10\n";

    Scanner sc = new Scanner(text);
    Pattern p = Pattern.compile(
        "(\\w+) (\\d+) (.*) (decimal) (decimal)"
            .replace("decimal", "\\d+(?:\\.\\d+)?")
            .replace(" ", "\\s+")
    );
    while (sc.findWithinHorizon(p, 0) != null) {
        MatchResult mr = sc.match();
        System.out.printf("[%s|%s] %-30s [%.4f:%.4f]%n",
            mr.group(1),
            mr.group(2),
            mr.group(3),
            Double.parseDouble(mr.group(4)),
            Double.parseDouble(mr.group(5))
        );
    }

This prints:

[DE|83543] Blah blah blah                 [47.9830:12.1278]
[DE|21147] 100% hamburger beef for 4.99   [53.5500:10.0000]

Note the meta-regex approach of using replace to generate the "final" regex. This is done for readability of the "big picture" pattern.
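
To see what the replace calls actually produce, the template can be expanded and printed on its own. This is only an illustrative sketch; the class and method names (`MetaRegexDemo`, `build`) are made up:

```java
public class MetaRegexDemo {

    // Builds the final regex from the readable template used in the answer.
    public static String build() {
        return "(\\w+) (\\d+) (.*) (decimal) (decimal)"
            .replace("decimal", "\\d+(?:\\.\\d+)?") // expand the placeholder first
            .replace(" ", "\\s+");                  // then turn each space into \s+
    }

    public static void main(String[] args) {
        // Prints: (\w+)\s+(\d+)\s+(.*)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)
        System.out.println(build());
    }
}
```

String.replace works on literal text rather than regular expressions, so the backslashes in the replacement strings are copied through unchanged.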

polygenelubricants
A: 

I have tried this:

    public static void main(String[] args)
    {
        String str  = "DE 83543   Rott am Inn Bayern  BY  Oberbayern  Landkreis Rosenheim 47.983  12.1278";
        String str1 = "DE  21147   Hamburg Hamburg HH          Kreisfreie Stadt Hamburg    53.55   10  ";

        // Process both sample lines, in the order of the output below
        for (String line : new String[] { str1, str })
        {
            // Split on runs of spaces and/or tabs
            String[] fields = line.split("[ \t]+");

            // The coordinates are the last two fields
            double latitude = Double.parseDouble(fields[fields.length - 2]);
            double longitude = Double.parseDouble(fields[fields.length - 1]);

            System.out.println(latitude + ", " + longitude);
        }
    }

It splits the string on runs of spaces or tabs. Since the coordinates are always the last two elements, they can be read from the end of the array without any problem. Below is the output.

53.55, 10.0

47.983, 12.1278

npinti