ansaurus

Question

[numpy] creating a masked array from text fields

Answer 1

The way you're doing it is fine, though in my opinion you could make it a bit more readable by not building the temporary "triple" dict only to expand it a step later.

The built-in way is to use numpy.genfromtxt. Depending on how much pre-processing your text file needs, it may or may not do what you want. As a basic example (using StringIO to simulate a file):

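from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
import numpy as np

# Tab-delimited data; the empty middle field on the second row is the missing value
txt_data = """
1\t2\t3
4\t\t6
7\t8\t9"""

infile = StringIO(txt_data)
# usemask=True returns a masked array, with empty fields masked
data = np.genfromtxt(infile, usemask=True, delimiter='\t')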

Which yields:

masked_array(data =
 [[1.0 2.0 3.0]
 [4.0 -- 6.0]
 [7.0 8.0 9.0]],
             mask =
 [[False False False]
 [False  True False]
 [False False False]],
       fill_value = 1e+20)
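
As a rough sketch of what you can then do with the result (the names n_missing, col_means and filled are just for illustration): the mask can be counted directly, reductions such as mean() skip the masked entries, and filled() substitutes a value of your choosing.

```python
from io import StringIO
import numpy as np

# Same sample data as above: the middle field of the second row is empty
txt_data = "1\t2\t3\n4\t\t6\n7\t8\t9"
data = np.genfromtxt(StringIO(txt_data), usemask=True, delimiter="\t")

# Count how many fields were masked as missing
n_missing = int(data.mask.sum())

# Column means are computed over the unmasked values only
col_means = data.mean(axis=0)

# Plain ndarray with masked entries replaced by 0.0
filled = data.filled(0.0)
```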

One word of caution: if you use tabs as the delimiter and an empty string as the missing-value marker, you'll have issues with missing values at the start of a line, because genfromtxt essentially calls line.strip().split(delimiter). If you can, you'd be better off using something like "xxx" as the marker for missing values.
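
A minimal sketch of the marker approach, assuming your file can be written with a placeholder such as "xxx" (the placeholder and the sample data here are made up for illustration). genfromtxt takes a missing_values argument naming the strings to treat as missing:

```python
from io import StringIO
import numpy as np

# "xxx" is an arbitrary placeholder for a missing field (an assumption for
# this sketch); one appears at the start of a line and one at the end
txt_data = "1\t2\t3\nxxx\t5\t6\n7\t8\txxx"

data = np.genfromtxt(
    StringIO(txt_data),
    usemask=True,
    delimiter="\t",
    missing_values="xxx",  # fields equal to this string are masked
)
```

With an explicit marker, every line keeps the same number of fields even when the first or last value is missing.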

Joe Kington 2010-09-11 21:04:06

Thanks - I thought about genfromtxt, but it would have required too much pre-processing first. Thanks for the tip about missing values at the start of a line, though! And the StringIO trick...

Stephen 2010-09-14 11:15:24
