I have geographical data which has 14 variables. The data is in the following format:

QUADNAME: rockport_colony_SD RESOLUTION: 10 ULLAT: 43.625
ULLON: -97.87527466 LRLAT: 43.5
LRLON: -97.75027466 HDATUM: 27
ZMIN: 361.58401489 ZMAX: 413.38400269 ZMEAN: 396.1293335 ZSIGMA: 12.36359215 PMETHOD: 5
QUADDATE: 20001001

The full data set contains many such records, one after another.

How can I extract the coordinates ULLAT, ULLON and LRLAT from the data into three lists, so that each row corresponds to one location?

This question was raised by the problem in the post.

+2  A: 

Given a StreamReader named reader, this should give you a list of (float, float, float). I suggest a list of 3-tuples because it'll probably be more convenient and more efficient to walk through, unless for some reason you only want to get all the points individually.

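coords = []
while True:
  line = reader.readline()
  if not line:  # readline() returns an empty string at end of file
    break

  index_ullat = line.find("ULLAT")
  if index_ullat >= 0:
    # "ULLAT: " is 7 characters long, so the value starts at index + 7
    ullat = float(line[ index_ullat+7 : ])

    # ULLON and LRLAT are expected on the very next line
    line = reader.readline()

    index_ullon = line.find("ULLON")
    index_lrlat = line.find("LRLAT")
    if index_ullon >= 0 and index_lrlat >= 0:
      ullon = float(line[ index_ullon+7 : index_lrlat-1 ])
      lrlat = float(line[ index_lrlat+7 : ])
    else:
      raise ValueError("ULLON and LRLAT didn't follow ULLAT.")

    # append() takes a single argument, so pass the three values as one tuple
    coords.append((ullat, ullon, lrlat))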

It may work, but it's ugly. I'm no expert at string parsing.
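
If three separate lists are wanted after all, as the question asks, a list of 3-tuples can be split with zip. A minimal sketch, assuming coords already holds (ullat, ullon, lrlat) tuples; the second sample row is made up for illustration and the names ullats, ullons and lrlats are just placeholders:

```python
# Sample input: two (ullat, ullon, lrlat) tuples. The first comes from the
# record in the question; the second is a made-up value for illustration.
coords = [
    (43.625, -97.87527466, 43.5),
    (43.75, -97.87527466, 43.625),
]

# zip(*coords) transposes rows into columns; each column becomes one list.
# An empty coords list is handled separately, since there is nothing to unpack.
if coords:
    ullats, ullons, lrlats = [list(column) for column in zip(*coords)]
else:
    ullats, ullons, lrlats = [], [], []

print(ullats)
print(ullons)
print(lrlats)
```

Index i in each of the three lists then refers to the same location.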

Nikhil Chelliah
Edit: just pointed the link to the new, prettier documentation. :-)
cdleary
+4  A: 

Something like this might work if the data is all in a big flat text file:

import re

data = """
QUADNAME: rockport_colony_SD RESOLUTION: 10 ULLAT: 43.625
ULLON: -97.87527466 LRLAT: 43.5
LRLON: -97.75027466 HDATUM: 27
ZMIN: 361.58401489 ZMAX: 413.38400269 ZMEAN: 396.1293335 ZSIGMA: 12.36359215 PMETHOD: 5
QUADDATE: 20001001
"""

regex = re.compile(
    r"""ULLAT:\ (?P<ullat>-?[\d.]+).*?
    ULLON:\ (?P<ullon>-?[\d.]+).*?
    LRLAT:\ (?P<lrlat>-?[\d.]+)""", re.DOTALL|re.VERBOSE)

print(regex.findall(data))  # Yields: [('43.625', '-97.87527466', '43.5')]
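
Note that findall returns strings, and the question asked for three lists. One way to get there, sketched under the assumption that every record has all three fields in this order, is to loop over finditer and convert the named groups to floats. The list names below are placeholders, and the pattern is the same one as above, repeated so the snippet runs on its own:

```python
import re

# Same pattern as above, repeated so this snippet is self-contained.
regex = re.compile(
    r"""ULLAT:\ (?P<ullat>-?[\d.]+).*?
    ULLON:\ (?P<ullon>-?[\d.]+).*?
    LRLAT:\ (?P<lrlat>-?[\d.]+)""", re.DOTALL | re.VERBOSE)

data = """
QUADNAME: rockport_colony_SD RESOLUTION: 10 ULLAT: 43.625
ULLON: -97.87527466 LRLAT: 43.5
LRLON: -97.75027466 HDATUM: 27
"""

# One list per coordinate; index i in each list refers to the same record.
ullats, ullons, lrlats = [], [], []
for match in regex.finditer(data):
    ullats.append(float(match.group("ullat")))
    ullons.append(float(match.group("ullon")))
    lrlats.append(float(match.group("lrlat")))

print(ullats)
print(ullons)
print(lrlats)
```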
cdleary
This. Easier to read and conveys intention at a glance.
Nick Presta
@Nick: Sorry, but I don't understand your comment -- are you saying there's something I could make more readable? This was just kind of a proof-of-concept demonstration of how you could use a regex to parse out the data.
cdleary
Thank you! I now have your code in a .py file. How can I use it to process a .txt file? I guess we need a parameter in the .py file, so that we can use a syntax like $ py-file file-to-be-processed
Masi
@Masi: That sounds like it should be the contents of another question!
cdleary
@cdleary: a new post is here: http://stackoverflow.com/questions/491085/how-can-i-use-a-python-file-to-process-a-txt-file
Masi
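
For the follow-up about running the script against a text file, one possible shape is to take the path from sys.argv. This is only a sketch: the function names extract_coords and main are made up here, and the pattern is the one from the regex answer.

```python
import re
import sys

# Pattern from the regex answer, repeated so the script stands alone.
REGEX = re.compile(
    r"""ULLAT:\ (?P<ullat>-?[\d.]+).*?
    ULLON:\ (?P<ullon>-?[\d.]+).*?
    LRLAT:\ (?P<lrlat>-?[\d.]+)""", re.DOTALL | re.VERBOSE)


def extract_coords(text):
    """Return a list of (ullat, ullon, lrlat) float tuples found in text."""
    return [tuple(float(value) for value in groups)
            for groups in REGEX.findall(text)]


def main(argv):
    # argv[0] is the script name; argv[1] is the file to be processed.
    if len(argv) != 2:
        print("usage: %s file-to-be-processed" % argv[0])
        return 1
    with open(argv[1]) as handle:
        for row in extract_coords(handle.read()):
            print(row)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Saved under some name such as extract.py (a hypothetical file name), it would be invoked as python extract.py file-to-be-processed.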