I have to go through the following text, match each of the values in it, and break them apart into separate records to save to a database. So this text:

ESTIMATED MINIMUM CENTRAL PRESSURE  951 MB
EYE DIAMETER  12 NM
MAX SUSTAINED WINDS 105 KT WITH GUSTS TO 130 KT
64 KT....... 25NE  25SE  25SW  25NW
50 KT....... 60NE  30SE  30SW  60NW
34 KT....... 75NE  50SE  50SW  75NW
12 FT SEAS.. 75NE  50SE  50SW  75NW
ALL QUADRANT RADII IN NAUTICAL MILES

REPEAT...CENTER LOCATED NEAR 25.5N  73.4W AT 23/0900Z
AT 23/0600Z CENTER WAS LOCATED NEAR 25.5N  72.6W

FORECAST VALID 23/1800Z 25.4N  75.6W
MAX WIND 110 KT...GUSTS 135 KT
50 KT... 60NE  60SE  60SW  60NW
34 KT... 75NE  75SE  75SW  75NW

FORECAST VALID 24/0600Z 25.5N  78.8W
MAX WIND 115 KT...GUSTS 140 KT
50 KT... 60NE  60SE  60SW  60NW
34 KT...100NE 100SE 100SW 100NW

FORECAST VALID 24/1800Z 25.8N  81.8W
MAX WIND  85 KT...GUSTS 105 KT
50 KT... 60NE  60SE  60SW  60NW
34 KT...100NE 100SE 100SW 100NW

... should end up looking something like the following:

forecastAdvisory = {
  :centralPressure => 951,
  :eyeDiameter => 12,
  :vMax => 105,
  :gMax => 130,
  :windRadii => {
    64 => [25, 25, 25, 25],
    50 => [60, 30, 30, 60],
    34 => [75, 50, 50, 75],
    12 => [75, 50, 50, 75]
  },
  :forecastTrack => {
    12 => {
      :latitude => 25.4,
      :longitude => 75.6,
      :vMax => 110,
      :gMax => 135,
      :windRadii => {
        50 => [60, 60, 60, 60],
        34 => [75, 75, 75, 75]
      }
    },
    24 => {
      :latitude => 25.5,
      :longitude => 78.8,
      :vMax => 115,
      :gMax => 140,
      :windRadii => {
        50 => [60, 60, 60, 60],
        34 => [100, 100, 100, 100]
      }
    },
    36 => {
      :latitude => 25.8,
      :longitude => 81.8,
      :vMax => 85,
      :gMax => 105,
      :windRadii => {
        50 => [60, 60, 60, 60],
        34 => [100, 100, 100, 100]
      }
    }
  }
}

I know I could probably use String#scan in Ruby, but I'm not sure how to go through the file in order, pick out these values, and parse them correctly.

UPDATE: Here are a few sample files I will be parsing using File.open, just for reference:

+1  A: 

Use this pseudocode:

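File.foreach("filename") do |line|
    # only handle wind radii lines such as "64 KT....... 25NE  25SE  25SW  25NW"
    next unless line =~ /^\d+ KT\.+ /
    one, two, three, four, five, six = line.split(" ")
    # to_i keeps the leading digits and drops the quadrant suffix ("25NE" -> 25)
    three = three.to_i
    four = four.to_i
    five = five.to_i
    six = six.to_i
    # code to create output format
end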

so for example this line:

64 => [25, 25, 25, 25]

is formed by
one => [three,four,five,six]
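
One caveat with splitting on spaces: in a line like "34 KT...100NE 100SE 100SW 100NW" the first radius is glued to the dots, so the token positions shift. A minimal sketch that sidesteps this by scanning for the quadrant values instead is below. The method name radii_from_line is made up for illustration, and it assumes every radii line lists exactly the four quadrants NE, SE, SW, NW.

```ruby
# Hypothetical helper (not from the thread): turns one radii line into
# [threshold, [ne, se, sw, nw]], or nil if the line is not a radii line.
def radii_from_line(line)
  # The leading number is the wind speed (or sea height) threshold.
  threshold = line[/\A\s*(\d+)/, 1]
  # Collect every number that is directly followed by a quadrant name.
  radii = line.scan(/(\d+)(?:NE|SE|SW|NW)/).flatten.map(&:to_i)
  return nil unless threshold && radii.size == 4

  [threshold.to_i, radii]
end

p radii_from_line("64 KT....... 25NE  25SE  25SW  25NW")  # => [64, [25, 25, 25, 25]]
p radii_from_line("34 KT...100NE 100SE 100SW 100NW")      # => [34, [100, 100, 100, 100]]
p radii_from_line("EYE DIAMETER  12 NM")                  # => nil
```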
ennuikiller
I have added some sample advisories to the main question. If it helps with your answer, take a look at those to see what other contents are in the files. Unfortunately the files contain more than what I posted; what I posted to parse was just an excerpt. :/
Josh
A: 

If you want to keep it generic, you can use STDIN, e.g.

forecastparser.rb < forecast.txt

and read each line via

# $< is ARGF, which reads STDIN when no file arguments are given
$<.each_line do |line|
...
end
Omar Qureshi
I think it's less about how to import the file, and more about how to parse those blocks of text by themselves, going through the file and "catching" each one somehow (in order as they appear).
Josh
A: 

It should be reasonably easy to build a right-recursive grammar for this, e.g.:

forecast : FORECAST VALID <int>/<int>Z <float><direction> <float><direction> \n <maxwind> <forecast_list>

direction : N
          | S
          | E
          | W
          | NE
          | NW (etc etc)

maxwind : MAX WIND <int> KT...GUSTS <int> KT

forecast_list : forecast_line \n forecast_list
              | 

forecast_line : <int> KT... <int><direction> <int><direction> <int><direction> <int><direction>

With a grammar like that, you can write (by hand) a recursive descent parser, which should be pretty simple. The benefit of this is that your production rules are context-free, so you should be able to deal with minor format shifts or new types of data files fairly easily.
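
As a rough illustration, a hand-written recursive descent parser for that grammar might look like the sketch below. It works line by line rather than token by token, the class name ForecastParser and the hash keys are invented for this example, and the patterns assume the layout of the excerpt in the question (coordinates always N and W, quadrants always NE SE SW NW).

```ruby
# Hypothetical class; each parse_* method corresponds to one grammar rule.
class ForecastParser
  HEADER  = /\AFORECAST VALID (\d+)\/(\d+)Z\s+([\d.]+)N\s+([\d.]+)W/
  MAXWIND = /\AMAX WIND\s+(\d+) KT\.\.\.GUSTS\s+(\d+) KT/
  RADII   = /\A(\d+) KT\.\.\.\s*(\d+)NE\s+(\d+)SE\s+(\d+)SW\s+(\d+)NW/

  def initialize(lines)
    @lines = lines.map(&:strip).reject(&:empty?)
    @pos = 0
  end

  # forecast : header-line maxwind forecast_list
  def parse_forecast
    header = expect(HEADER)
    wind   = expect(MAXWIND)
    { day: header[1].to_i, time: header[2],
      latitude: header[3].to_f, longitude: header[4].to_f,
      vMax: wind[1].to_i, gMax: wind[2].to_i,
      windRadii: parse_forecast_list }
  end

  # forecast_list : forecast_line forecast_list | (empty)
  def parse_forecast_list
    match = accept(RADII)
    return {} unless match

    threshold, *radii = match.captures.map(&:to_i)
    { threshold => radii }.merge(parse_forecast_list)
  end

  private

  # Consume the current line only if it matches; return the MatchData or nil.
  def accept(pattern)
    match = @lines[@pos] && pattern.match(@lines[@pos])
    @pos += 1 if match
    match
  end

  # Like accept, but a non-matching line is a parse error.
  def expect(pattern)
    accept(pattern) || raise(ArgumentError, "unexpected line: #{@lines[@pos].inspect}")
  end
end

text = <<~TEXT
  FORECAST VALID 23/1800Z 25.4N  75.6W
  MAX WIND 110 KT...GUSTS 135 KT
  50 KT... 60NE  60SE  60SW  60NW
  34 KT... 75NE  75SE  75SW  75NW
TEXT

p ForecastParser.new(text.lines).parse_forecast
```

The right recursion shows up in parse_forecast_list calling itself until a line no longer looks like a radii line, which is the empty alternative in the grammar.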

A: 

Taking a quick glance at the files you linked, it seems the "blocks" of information are the same (the same type of information) between the files, even if the format is widely different?

So if I were to do this, I would get a list of possible values for each block and then test/parse each block against that. If it's a hurricane warning, I know there aren't any important numbers, but a tropical depression probably has something I'm interested in. (On a side note, "tropical depression" sounds really funny to me as a Swede who hasn't heard of any weather officially being called depressing ^^)


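@block_no = 0
[..]
File.open('forecast') do |f|
  block = []
  f.each_line do |line|
    line = line.strip
    if line.empty? && !block.empty?
      # a blank line closes the current block
      Forecast.parse(block) # which has the current block_no and knows what kind of possible values there are to read out
      @block_no += 1
      block = []
    elsif !line.empty?
      block << line
    end
  end
  Forecast.parse(block) unless block.empty? # last block, if the file does not end with a blank line
end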

This feels like a very generic answer, but if I were to try this I would need to know all the possible formats the information could show up in before I could come up with a good solution. Possibly just using a whole bunch of String#scan calls will be best. :)
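
To give an idea of what the String#scan route could look like, here is a small sketch run against the excerpt from the question. The patterns are guesses based on that excerpt only, so other advisories may well need different ones, and the variable names are arbitrary.

```ruby
advisory = <<~TEXT
  ESTIMATED MINIMUM CENTRAL PRESSURE  951 MB
  EYE DIAMETER  12 NM
  MAX SUSTAINED WINDS 105 KT WITH GUSTS TO 130 KT
  64 KT....... 25NE  25SE  25SW  25NW
  50 KT....... 60NE  30SE  30SW  60NW
  34 KT....... 75NE  50SE  50SW  75NW
  12 FT SEAS.. 75NE  50SE  50SW  75NW
TEXT

# scan returns one array of captures per match; flatten gives the raw strings.
# Note: a missing field would come out as 0 here, because nil.to_i is 0.
pressure = advisory.scan(/CENTRAL PRESSURE\s+(\d+) MB/).flatten.first.to_i
eye      = advisory.scan(/EYE DIAMETER\s+(\d+) NM/).flatten.first.to_i
v_max, g_max = advisory
  .scan(/MAX SUSTAINED WINDS\s+(\d+) KT WITH GUSTS TO\s+(\d+) KT/)
  .flatten.map(&:to_i)

# ^ matches at the start of every line in Ruby, so this finds each radii line.
radii_pattern = /^(\d+) (?:KT|FT SEAS)\.+\s*(\d+)NE\s+(\d+)SE\s+(\d+)SW\s+(\d+)NW/
wind_radii = advisory.scan(radii_pattern).map do |threshold, *radii|
  [threshold.to_i, radii.map(&:to_i)]
end.to_h

p pressure, eye, v_max, g_max, wind_radii
```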

Good luck

ba
+1  A: 

I am posting an answer to this because I feel that the other answers didn't really satisfy the requirements posted in the original question. Basically, there are multiple blocks of text with the same starting line, like so:

FORECAST VALID 23/1800Z 25.4N  75.6W
...
FORECAST VALID 24/0600Z 25.5N  78.8W
...
FORECAST VALID 24/1800Z 25.8N  81.8W

What I ended up doing is creating a regular expression for this line:

/^(FORECAST|OUTLOOK)\sVALID\s(\d+)\/(\d+)Z\s([\d\.]+)N\s+([\d\.]+)W/
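
For reference, this is roughly how the captures from that expression come out for one header line. The constant and variable names are just for this illustration. Note that the pattern hard-codes N and W, so it assumes positions in the northern and western hemispheres.

```ruby
HEADER = /^(FORECAST|OUTLOOK)\sVALID\s(\d+)\/(\d+)Z\s([\d\.]+)N\s+([\d\.]+)W/

line = "FORECAST VALID 23/1800Z 25.4N  75.6W"

if (m = HEADER.match(line))
  # Captures, in order: kind, day of month, UTC time, latitude, longitude
  kind, day, time, lat, lon = m.captures
  p [kind, day.to_i, time, lat.to_f, lon.to_f]
  # => ["FORECAST", 23, "1800", 25.4, 75.6]
end
```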

Now, I needed to loop through every block of text until there were no more left. Since these forecasts are typically at the end of the advisory, I did it like so:

forecast_data = []

# Grab the rest of the forecast data
until data.eof?
  forecast_data << data.readline.strip
end

forecast_times = [12,24,36,48,72,96,120]
forecasts ||= {}
current_forecast = {}

until forecast_data.empty?
  line = forecast_data.shift

  if line =~ regular_expression
    # Start a new "current_forecast" hash, which
    # contains the current block of text's data,
    # and parse it...
    forecasts.merge!(hour => current_forecast)
  end

  # Additional parsing for this block here...
end

# Merge the final block in with the rest
forecasts.merge!(hour => current_forecast) unless current_forecast.empty?
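
For anyone who wants something self-contained to experiment with, here is one way the whole loop could be fleshed out. This is only a sketch and not the exact code I run: the method name parse_forecasts and the MAX_WIND and WIND_RADII patterns are made up for this example, and it assumes each header simply takes the next hour from forecast_times in order.

```ruby
FORECAST_HEADER = /^(FORECAST|OUTLOOK)\sVALID\s(\d+)\/(\d+)Z\s([\d\.]+)N\s+([\d\.]+)W/
MAX_WIND   = /^MAX WIND\s+(\d+) KT\.\.\.GUSTS\s+(\d+) KT/
WIND_RADII = /^(\d+) KT\.+\s*(\d+)NE\s+(\d+)SE\s+(\d+)SW\s+(\d+)NW/

# Hypothetical method: groups the lines into one hash per forecast block,
# keyed by forecast hour (assumed to follow forecast_times in order).
def parse_forecasts(lines, forecast_times = [12, 24, 36, 48, 72, 96, 120])
  hours = forecast_times.dup
  forecasts = {}
  current = nil

  lines.each do |raw|
    line = raw.strip
    if (m = FORECAST_HEADER.match(line))
      # A header line starts a new block; later lines fill in this hash.
      current = { latitude: m[4].to_f, longitude: m[5].to_f, windRadii: {} }
      forecasts[hours.shift] = current
    elsif current && (m = MAX_WIND.match(line))
      current[:vMax] = m[1].to_i
      current[:gMax] = m[2].to_i
    elsif current && (m = WIND_RADII.match(line))
      threshold, *radii = m.captures.map(&:to_i)
      current[:windRadii][threshold] = radii
    end
  end

  forecasts
end

sample = <<~TEXT
  FORECAST VALID 23/1800Z 25.4N  75.6W
  MAX WIND 110 KT...GUSTS 135 KT
  50 KT... 60NE  60SE  60SW  60NW
  34 KT... 75NE  75SE  75SW  75NW

  FORECAST VALID 24/0600Z 25.5N  78.8W
  MAX WIND 115 KT...GUSTS 140 KT
  50 KT... 60NE  60SE  60SW  60NW
  34 KT...100NE 100SE 100SW 100NW
TEXT

p parse_forecasts(sample.lines)
```

Because each block's hash is stored in forecasts as soon as its header is seen, there is no separate "merge the final block" step in this version.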

This seems to work. If anyone else has an idea of how to refactor this, or how to do it better using another method, please feel free to add another answer or comment and I'll change the answer! Thanks to everyone who posted; it's truly appreciated.

Josh