ansaurus

Question

Matching degree-based geographical coordinates with a regular expression

Answer 1

+1 A:

You forgot the x modifier on the qr operator.

daxim 2010-06-30 07:14:00

Thanks, unfortunately, that was only a typo here. It still doesn't work :(

Pedro Silva 2010-06-30 07:18:26

Answer 2

+1 A:

Try dropping the use utf8 statement.

The degree symbol corresponds to character value 0xB0 in my current encoding (whatever that is, but it ain't UTF8). 0xB0 is a "continuation byte" in UTF8; it is expected to be the second, third, or fourth byte of a sequence that begins with something between 0xC2 and 0xF4. Using that string with utf8 will give you an error.

mobrule 2010-06-30 07:32:25

It doesn't, on my machine; in fact, it does not seem to make much of a difference. Would you know how I would go about including the damned `º` in the regex?

Pedro Silva 2010-06-30 07:40:02

Turns out you were right.

Pedro Silva 2010-06-30 17:49:39

Pedro: And that is why you are supposed to [decode your strings properly](http://p3rl.org/UNI) before you work on them with character oriented operations such as regex. By merely dropping the `utf8` pragma, you have swept the symptoms of the problem under the carpet - but it still exists to unexpectedly bite you in the future. I bet that in your program the test string is not a literal as in Kinopiko's answer and `Devel::Peek` would reveal that the simplified example is not functionally equivalent to your real code from which it is derived - please post a *complete* code example the next time.

daxim 2010-06-30 21:56:15

You're right, of course. My test strings were read in from a file.

Pedro Silva 2010-07-01 02:29:31
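The decoding advice in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: `$raw` is a hypothetical stand-in for a line read from a UTF-8 file without a decoding layer, and the regex is a shortened single-coordinate version of the one discussed in this thread.

```perl
use strict;
use warnings;
use utf8;                        # the degree sign in this source is one character
use Encode qw(decode encode);

# Hypothetical stand-in for a line read from a UTF-8 file with no decoding layer.
my $raw = encode('UTF-8', qq{28°44'30"N., 33°12'36"E.});

# Shortened single-coordinate pattern (an assumption, not the asker's full regex).
my $re = qr{ \d{1,3} \s* ° \s* \d{1,2} \s* ' \s* \d{1,2} \s* " \s* [ENSW] }x;

# Undecoded: the degree sign is the two bytes 0xC2 0xB0, so the
# one-character degree sign in the pattern has no digit in front of it.
my $raw_matches = $raw =~ $re ? 1 : 0;

# Decoded: the two bytes become the single character U+00B0.
my $text         = decode('UTF-8', $raw);
my $text_matches = $text =~ $re ? 1 : 0;

print "raw: $raw_matches, decoded: $text_matches\n";
```

When the data really comes from a file, opening it with an encoding layer such as `open my $fh, '<:encoding(UTF-8)', $path` (where `$path` is a placeholder) should have the same effect as the explicit `decode` call.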

Answer 3

+1 A:

The ?: at the beginning of the regex makes it non-capturing, which is probably why the matches cannot be extracted or seen. Dropping it from the regex may be the solution.

If all of the coordinates are fixed-format, unpack may be a better way of obtaining the desired values.

my @twoCoordinates = unpack 'A2xA2xA2xAx3A2xA2xA2xA', q{28°44'30"N., 33°12'36"E.};

print "@twoCoordinates";  # prints '28 44 30 N 33 12 36 E' when ° is a single character (or byte)

If not, then modify the regex:

my @twoCoordinates = q{28°44'30"N., 33°12'36"E.} =~ /\w+/g;

Zaid 2010-06-30 07:32:55

Yeah, but I've been simplifying the regex, including removing the non-capturing parentheses, to no avail. Thanks for the unpack idea though; it sounds like it should work, although I'm not sure that I won't see coordinates with 3-digit degrees, 1-digit minutes, etc.

Pedro Silva 2010-06-30 07:38:06

The thing is, my priority is to actually identify strings of that nature. `unpack` would sure come in handy if I knew a particular string were a coordinate, but if I knew that I wouldn't need `unpack` because I'd have identified it via a regex. :(

Pedro Silva 2010-06-30 07:42:56
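Since the goal is to identify coordinates of varying width and also pull out their parts, one possible approach is a regex with capturing groups applied in a `/g` loop. The following is a minimal sketch, not the asker's actual code: the sample string (with a 3-digit degree and 1-digit minute and second) and the names `$coord` and `@fields` are made up for illustration, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;
use utf8;   # the source is assumed to be saved as UTF-8

# One coordinate: degrees, minutes, seconds and hemisphere, each captured.
my $coord = qr{
    (\d{1,3}) \s* ° \s*
    (\d{1,2}) \s* ' \s*
    (\d{1,2}) \s* " \s*
    ([ENSW])
}x;

# Hypothetical sample with variable-width fields.
my $text = q{28°44'30"N., 133°2'6"E.};

# Collect every coordinate found in the string.
my @fields;
while ($text =~ /$coord/g) {
    push @fields, [$1, $2, $3, $4];
}

print join(' ', @$_), "\n" for @fields;
```

Because the same pattern both recognises and captures, no separate `unpack` step is needed once a match is found.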

Answer 4

+5 A:

This:

use strict;
use warnings;
use utf8;
my $re = qr{
    (?:
    \d{1,3} \s*  °   \s*
    \d{1,2} \s*  '   \s*
    \d{1,2} \s*  "   \s*
    [ENSW]  \s* \.?
            \s*  ,?  \s*
    ){2}
}x;
if (q{28°44'30"N., 33°12'36"E.} =~ $re) {
    print "match\n";
} else {
    print "no match\n";
}

works:

$ ./coord.pl 
match

Kinopiko 2010-06-30 09:24:34

confirming that this works

singingfish 2010-06-30 09:55:08

just to confirm that this works.

singingfish 2010-06-30 09:55:31

And strangely, it matches even without `use utf8`. Your regex is exactly like mine, no? Or am I missing something? Weird; anyway, thanks!

Pedro Silva 2010-06-30 17:36:46

It's not strange. Without `use utf8` you get a bytewise match, but with `use utf8` you get a character match. The problem you have is that you have not taken care of the encoding of your input from the file.

Kinopiko 2010-06-30 22:36:16
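The bytewise versus character distinction can be illustrated with a small sketch. It uses escapes instead of a literal degree sign so the result does not depend on how the file is saved, and it assumes the source data is UTF-8. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "28\xC2\xB044'";             # UTF-8 encoded bytes, like a literal without `use utf8`
my $chars = decode('UTF-8', $bytes);     # decoded characters, like a literal with `use utf8`

my $byte_re = qr/\d\xC2\xB0\d/;          # degree sign written as two bytes
my $char_re = qr/\d\x{B0}\d/;            # degree sign written as one character

# Try every combination of string and pattern.
my %match = (
    'bytes vs bytes' => ($bytes =~ $byte_re ? 1 : 0),
    'chars vs chars' => ($chars =~ $char_re ? 1 : 0),
    'bytes vs chars' => ($bytes =~ $char_re ? 1 : 0),
    'chars vs bytes' => ($chars =~ $byte_re ? 1 : 0),
);
print "$_: $match{$_}\n" for sort keys %match;
```

The two consistent pairings match and the two mixed pairings do not, which is in line with a literal test string matching both with and without `use utf8` while undecoded file input fails against a character regex.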
