tags:

views:

3097

answers:

3

I'm sure this is really simple if you know anything about binary files, but I'm a newbie on that score.

How would I extract the data from NASA .hgt files? Here is a description from www2.jpl.nasa.gov/srtm/faq.html:

The SRTM data files have names like "N34W119.hgt". What do the letters and numbers refer to, and what is ".hgt" format?

Each data file covers a one-degree-of-latitude by one-degree-of-longitude block of Earth's surface. The first seven characters indicate the southwest corner of the block, with N, S, E, and W referring to north, south, east, and west. Thus, the "N34W119.hgt" file covers latitudes 34 to 35 North and longitudes 118-119 West (this file includes downtown Los Angeles, California). The filename extension ".hgt" simply stands for the word "height", meaning elevation. It is NOT a format type. These files are in "raw" format (no headers and not compressed), 16-bit signed integers, elevation measured in meters above sea level, in a "geographic" (latitude and longitude array) projection, with data voids indicated by -32768. International 3-arc-second files have 1201 columns and 1201 rows of data, with a total filesize of 2,884,802 bytes ( = 1201 x 1201 x 2). United States 1-arc-second files have 3601 columns and 3601 rows of data, with a total filesize of 25,934,402 bytes ( = 3601 x 3601 x 2). For more information read the text file "SRTM_Topo.txt" at http://edcftp.cr.usgs.gov/pub/data/srtm/Readme.html

Thanks for any help! I am going to use this data in a python script, so if you could not use any language-specific tricks for any other languages, that would be awesome.

+2  A: 

Since the records are fixed length (16-bit signed integers) and you know the grid size (1201 x 1201 or 3601x3601), Python's struct module seems ideally suited (untested code):

from struct import unpack,calcsize

# 'row_length' being 1201 or 3601 and 'row' being the raw data for one row
def read_row( row, row_length ):
    format = 'h'  # h stands for signed short

    for i in range(0, row_length):
        offset = i * calcsize(format)
        (height,) = unpack(format, row[offset : offset+calcsize(format))
        # do something with the height

Describing it in more generic terms, basically you want to read the file in 2 bytes at a time, parse the bytes read as a 16-bit signed integer and process it. Since you know the grid size already you can read it in row by row or in any other manner that is convenient to your application. It also means you can randomly seek to specific coordinates inside the data file.

codelogic
You actually need to use '!h' format. big-endian as describe in SRTM specs
skrat
A: 

The NASA SRTM data files are in Big-Endian format, so depending on what platform you are reading the data on, you may have to do a conversion from Big-Endian to Little-Endian.

There are numerous sources on how to do this, I have no experience with Python so I can't help you there.

But if you forget this, your values are going to be all messed up.

A: 

If you have photoshop you might be able to play around with the raw import to get it to read these files and save them out to something more useful. I have had some success doing this sort of thing in the past.

Dolphin