I'm trying to parse a CSV file using Python's csv module (specifically, the DictReader class). Is there a Pythonic way to detect empty or missing fields and throw an error?

Here's a sample file using the following headers: NAME, LABEL, VALUE

foo,bar,baz
yes,no
x,y,z

When parsing, I'd like the second line to throw an error since it's missing the VALUE field.

Here's a code snippet which shows how I'm approaching this (disregard the hard-coded strings...they're only present for brevity):

import csv

HEADERS = ["name", "label", "value"]

# the with block closes the file even if a check below raises
with open('configFile') as fileH:
    reader = csv.DictReader(fileH, HEADERS)
    for row in reader:
        if row["name"] is None or row["name"] == "":
            raise ValueError("missing name")
        if row["label"] is None or row["label"] == "":
            raise ValueError("missing label")
        ...

Is there a cleaner way of checking for fields in the CSV file without a bunch of if statements? If I need to add more fields, I'll also need more conditionals, which I would like to avoid if possible.

+1  A: 

Something like this?

...
for row in reader:
    for column, value in row.items():
        if value is None or value == "":
            # use column to say which field is missing
            raise ValueError("missing field: %s" % column)

You may be able to use 'if not value:' as your test instead of the more explicit test you gave.
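
One caveat on the shorter test, shown as a small sketch (the sample values are made up for illustration): `not value` is true for both None and the empty string, but a field holding only whitespace, or the text "0", is a non-empty string and still passes. If whitespace-only fields should count as empty, strip them first.

```python
# Hypothetical sample values, as csv.DictReader might hand them back.
samples = [None, "", " ", "0", "baz"]

# `not value` flags None and "" only; every non-empty string is truthy.
flagged = [value for value in samples if not value]

# Stripping first also catches whitespace-only fields.
flagged_stripped = [value for value in samples if not (value or "").strip()]
```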

retracile
+9  A: 
if any(row[key] in (None, "") for key in row):
    raise ValueError("empty or missing field")

Edit: Even better:

if any(val in (None, "") for val in row.values()):
    raise ValueError("empty or missing field")
balpha
Sweet Pythonic way!
Alix Axel
This method is pretty much incompatible with the behavior of csv.DictReader. It will loop through all keys in the row, even though some of them may be discarded by the DictReader because they weren't explicitly mentioned in HEADERS. More here: http://docs.python.org/library/csv.html#csv.DictReader
Triptych
@Triptych: There's at most one such additional key (the value passed as restkey to the constructor). I don't see the problem with that.
balpha
@balpha, it appears to me that if there were extra fields in a CSV row, that DictReader would ignore them, since nothing was passed to restkey, but your code would raise an error. In my opinion, that breaks a feature of DictReader.
Triptych
@Triptych: I just tried it out; it works just as expected. If nothing is passed as restkey, the default key (None) is used. Since the value to this key is always a sequence, val in (None, "") is False as it should be.
balpha
@balpha, your second edit is exactly what I'm looking for. I actually prefer it bailing if an extra field is added to the file. Thanks!
bedwyr
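
The DictReader behaviour discussed in these comments can be checked with a short sketch. It assumes Python 3 and uses an in-memory string in place of the config file; the sample rows and the has_blank helper name are made up for illustration. A short row gets None for its missing fields, while surplus fields are collected in a list under the restkey (None by default), so a row with extra fields is not flagged by the check.

```python
import csv
import io

HEADERS = ["name", "label", "value"]

# In-memory stand-in for the config file: one short row, one row with an extra field.
data = io.StringIO("foo,bar,baz\nyes,no\nx,y,z,extra\n")
rows = list(csv.DictReader(data, HEADERS))

short_row = rows[1]  # "value" is filled with restval, which defaults to None
long_row = rows[2]   # surplus fields land in a list under restkey (None by default)

def has_blank(row):
    # Same test as in the answer above, using values() so it runs on Python 3.
    return any(val in (None, "") for val in row.values())
```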
+2  A: 

Since None and empty strings both evaluate to False, you should consider this:

for row in reader:
    for header in HEADERS:
        if not row[header]:
            raise ValueError("missing field: %s" % header)

Note that, unlike some other answers, you will still have the option of raising an informative, header-specific error.
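
As a rough sketch of what such a header-specific error could look like (assuming Python 3; the check_rows name and the message wording are hypothetical, not part of the csv module), the reader's line_num attribute can be added to the message so the bad line is easy to find:

```python
import csv
import io

HEADERS = ["name", "label", "value"]

def check_rows(lines):
    """Return all rows, raising ValueError on the first blank or missing field."""
    reader = csv.DictReader(lines, HEADERS)
    rows = []
    for row in reader:
        for header in HEADERS:
            if not row[header]:
                # reader.line_num is the number of the source line just read
                raise ValueError("line %d: missing field %r" % (reader.line_num, header))
        rows.append(row)
    return rows
```

Called on the sample file from the question, this would raise on the second line and name the value field.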

Triptych
+1  A: 

This code will provide, for each row, a list of field names which are not present (or are empty) for that row. You could then provide a more detailed exception, such as "Missing fields: foo, baz".

def missing(row):
    return [h for h in HEADERS if not row.get(h)]

for row in reader:
    m = missing(row)
    if m:
        raise ValueError("Missing fields: " + ", ".join(m))
John Millikin
A: 

If you use matplotlib.mlab.csv2rec, it already saves the content of the file into an array and raises an error if one of the values is missing.

>>> from matplotlib.mlab import csv2rec
>>> content_array = csv2rec('file.txt')
IndexError: list index out of range

The problem is that there is no simple way to customize this behaviour or to supply a default value for missing values. Moreover, the error message is not very explanatory (it might be worth filing a bug report about it).

P.S. Since csv2rec stores the content of the file in a numpy record array, it will be easier to find the values equal to None.
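
By contrast, csv.DictReader does let you choose what a missing trailing field becomes, through its restval argument. A minimal sketch, assuming Python 3, with an in-memory string standing in for the file and "MISSING" as an arbitrary placeholder chosen for illustration:

```python
import csv
import io

HEADERS = ["name", "label", "value"]
data = io.StringIO("foo,bar,baz\nyes,no\n")

# restval fills in any field missing from the end of a short row.
reader = csv.DictReader(data, HEADERS, restval="MISSING")
rows = list(reader)
```

Note that restval only applies to fields that are absent from the row; a field that is present but empty still comes back as an empty string.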

dalloliogm