views:

1913

answers:

4

I have a friend who is finishing up his masters degree in aerospace engineering. For his final project, he is on a small team tasked with writing a program for tracking weather balloons, rockets and satellites. The program receives input from a GPS device, does calculations with the data, and uses the results of those calculations to control a series of motors designed to orientate a directional communication antenna, so the balloon, rocket or satellite always stays in focus.

Though somewhat of a (eternal) beginner myself, I have more programming experience than my friend. So when he asked me for advice, I convinced him to write the program in Python, my language of choice.

At this point in the project, we are working on the code that parses the input from the GPS device. Here is some example input, with the data we need to extract in bold:

$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F $GPRMC,093345.679,4234.7899,N,11344.2567,W,3,02,24.5,1000.23,M,,,0000*1F $GPRMC,044584.936,1276.5539,N,88734.1543,E,2,04,33.5,600.323,M,,,*00 $GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00 $GPRMC,066487.954,4572.0089,S,45572.3345,W,3,09,15.0,35000.00,M,,,*1F

Here is some further explanation of the data:

"I looks like I'll need five things out of every line. And bear in mind that any one of these area's may be empty. Meaning there will be just two commas right next to each other. Such as ',,,' There are two fields that may be full at any time. Some of them only have two or three options that they may be but I don't think I should be counting on that."

Two days ago my friend was able to acquire the full log from the GPS receiver used to track a recent weather balloon launch. The data is quite long, so I put it all in this pastebin.

I am still rather new with regular expressions myself, so I am looking for some assistance.

+6  A: 

It's simpler to use split than a regex.

>>> line="$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F "
>>> line.split(',')
['$GPRMC', '092204.999', '4250.5589', 'S', '14718.5084', 'E', '1', '12', '24.4', '89.6', 'M', '', '', '0000*1F ']
>>>
S.Lott
Hah, it would be like me to pick the more complicated solution!
crashsystems
+6  A: 

splitting should do the trick. Here's a good way to extract the data, as well:

>>> line="$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00"
>>> line=line.split(",")
>>> neededData = (float(line[2]), line[3], float(line[4]), line[5], float(line[9]))
>>> print neededData
(3248.7779999999998, 'N', 11355.7832, 'W', 25722.5)
Claudiu
+3  A: 

You should also first check the checksum of the data. It is calculated by XORing the characters between the $ and the * (not including them) and comparing it to the hex value at the end.

Your pastebin looks like it has some corrupt lines in it. Here is a simple check, it assumes that the line starts with $ and has no CR/LF at the end. To build a more robust parser you need to search for the '$' and work through the string until hitting the '*'.

def check_nmea0183(s):
    """
    Check a string to see if it is a valid NMEA 0183 sentence
    """
    if s[0] != '$':
        return False
    if s[-3] != '*':
        return False

    checksum = 0
    for c in s[1:-3]:
        checksum ^= ord(c)

    if int(s[-2:],16) != checksum:
        return False

    return True
Brian C. Lane
Thanks for the example. I'll be sure to do something similar to this when writing the input checking function.
crashsystems
+2  A: 

Those are comma separated values, so using a csv library is the easiest solution.

I threw that sample data you have into /var/tmp/sampledata, then I did this:

>>> import csv
>>> for line in csv.reader(open('/var/tmp/sampledata')):
...   print line
['$GPRMC', '092204.999', '**4250.5589', 'S', '14718.5084', 'E**', '1', '12', '24.4', '**89.6**', 'M', '', '', '0000\\*1F']
['$GPRMC', '093345.679', '**4234.7899', 'N', '11344.2567', 'W**', '3', '02', '24.5', '**1000.23**', 'M', '', '', '0000\\*1F']
['$GPRMC', '044584.936', '**1276.5539', 'N', '88734.1543', 'E**', '2', '04', '33.5', '**600.323**', 'M', '', '', '\\*00']
['$GPRMC', '199304.973', '**3248.7780', 'N', '11355.7832', 'W**', '1', '06', '02.2', '**25722.5**', 'M', '', '', '\\*00']
['$GPRMC', '066487.954', '**4572.0089', 'S', '45572.3345', 'W**', '3', '09', '15.0', '**35000.00**', 'M', '', '', '\\*1F']

You can then process the data however you wish. It looks a little odd with the '**' at the start and end of some of the values, you might want to strip that stuff off, you can do:

>> eastwest = 'E**'
>> eastwest = eastwest.strip('*')
>> print eastwest
E

You will have to cast some values as floats. So for example, the 3rd value on the first line of sample data is:

>> data = '**4250.5589'
>> print float(data.strip('*'))
4250.5589
Jerub