ansaurus

Question

[Python] Best strategy for dealing with incomplete lines of data from a file.

Answer 1

A:

The way how you should deal with incomplete values depends on the context of your application (which you haven't mentioned yet).

For example, you can simply ignore missing values

>>> l = ['955.159', '62.8168', '', '', '', '', '', '', '', '', '', '', '', '', '', '29', '30', '0', '0']
>>> filter(bool, l) # remove empty values
['955.159', '62.8168', '29', '30', '0', '0']
>>> map(float, filter(bool, l)) # remove empty values and convert the rest to floats
[955.15899999999999, 62.816800000000001, 29.0, 30.0, 0.0, 0.0]

Or alternatively, you might want to replace them with NULL as you mentioned:

>>> map(lambda x: x or 'NULL', l)
['955.159', '62.8168', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', '29', '30', '0', '0']

As you can see, there are many different strategies of dealing with incomplete data. Anyway, the example snippets here might help you to choose the right one for your task. And as you can see, I prefer the functional programming like build-ins for doing stuff like this, because it's often the shortest and easiest way to do it (and I don't think there will be any noticeable differences in the execution time).

tux21b 2010-06-16 14:50:50

Thank you for the reply tux21b. You've introduced me to two built-ins; filter and map. I always prefer readability so the latter solution you've provided is better. I've tried it, it works.The context: The data represents analyzed ECG (ElectroCardioGram) parameters. The rows contain intervals (in msecs) and voltages (in mV) between two peaks. The data is scientific so I need to convert an empty string to NULL as in 'not recorded' rather than 0 msecs or 0mV. It is part of larger strategy to formalize data I have into a large dataset that I can then do further work to.

EmlynC 2010-06-17 13:02:11

You are welcome. The "NULL" value in Python is called `None` btw, so it might make sense to use `None` instead of `"NULL"` in your source.Another useful build-in for functional programming is `reduce()` and the itertools module has a couple of more such functions.

tux21b 2010-06-17 17:18:52

ansaurus

tags:

views:

answers:

[Python] Best strategy for dealing with incomplete lines of data from a file.

related questions