I have a CSV file which has the following format:

id,case1,case2,case3

123,null,X,Y

342,X,X,Y

456,null,null,null

789,null,null,X

Above is sample data that could be in that file. For each line, I need to know which of the cases are not null. Is there an easy way to find which case(s) are not null, instead of splitting the string and going through each element in the list?

Result:

123,case2:case3

342,case1:case2:case3

456:None

789:case3

+1  A: 

You probably want to take a look at the csv module, which has readers and writers that let you build this kind of transform.

>>> from StringIO import StringIO
>>> from csv import DictReader
>>> fh = StringIO("""
... id,case1,case2,case3
... 
... 123,null,X,Y
... 
... 342,X,X,Y
... 
... 456,null,null,null
... 
... 789,null,null,X
... """.strip())
>>> dr = DictReader(fh)
>>> dr.next()
{'case1': 'null', 'case3': 'Y', 'case2': 'X', 'id': '123'}

At which point you can do something like:

>>> from csv import DictWriter
>>> out_fh = StringIO()
>>> writer = DictWriter(out_fh, fieldnames=dr.fieldnames)
>>> for mapping in dr:
...     # Drop 'null' fields; DictWriter writes '' for the missing keys.
...     writer.writerow(dict((k, v) for k, v in mapping.items() if v != 'null'))
...

The last bit is just a sketch (dr.fieldnames holds the column names read from the header row). Replace out_fh with the file handle that you'd like to output to.

cdleary
Also, the last snippet doesn't have exactly the output that you were looking for, but it will get you 90% of the way there. :-)
cdleary
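For the remaining step, here is a minimal sketch of how the DictReader approach could produce the output from the question. It is written for Python 3 (io.StringIO instead of the StringIO module). The function name non_null_cases is made up for illustration, and it assumes that the first column is the id, that ':' is used as the separator throughout, and that a row with no non-null case should print "None".

```python
import csv
import io


def non_null_cases(lines, null="null"):
    """Yield 'id:case:case' strings, one per data row (hypothetical helper)."""
    reader = csv.DictReader(lines)
    for row in reader:
        # fieldnames keeps the header order; the first column is assumed to be the id.
        id_field = reader.fieldnames[0]
        cases = [name for name in reader.fieldnames[1:] if row[name] != null]
        yield "%s:%s" % (row[id_field], ":".join(cases) if cases else "None")


sample = io.StringIO(
    "id,case1,case2,case3\n"
    "123,null,X,Y\n"
    "342,X,X,Y\n"
    "456,null,null,null\n"
    "789,null,null,X\n"
)
result = list(non_null_cases(sample))
for line in result:
    print(line)
```

Passing an open file object instead of the StringIO sample should work the same way, since DictReader only needs an iterable of lines.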
A: 

Why do you treat splitting as a problem? For performance reasons?

Technically, you could avoid splitting with a set of regexps, like:

\d+,null,\w+,\w+
\d+,\w+,null,\w+
...

but I find it a worse solution than reparsing the data into lists.
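To show what the regexp route involves, here is a small Python 3 sketch that applies the first pattern above to the sample rows from the question. The variable names are illustrative only. Note that \w+ also matches the literal text null, so this pattern alone cannot tell "case1 is the only null" apart from "every case is null"; one pattern per combination would need something stricter.

```python
import re

# First pattern from the answer: an id, case1 equal to "null", then two arbitrary words.
pattern = re.compile(r"\d+,null,\w+,\w+")

rows = [
    "123,null,X,Y",
    "342,X,X,Y",
    "456,null,null,null",
    "789,null,null,X",
]

# fullmatch requires the whole row to fit the pattern.
matched = [row for row in rows if pattern.fullmatch(row)]
print(matched)
```

Every row whose case1 is null matches, including the all-null row, which is part of why this ends up harder to maintain than splitting.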

Grzegorz Oledzki
Regexps are a parsing problem too, just like splitting, except that they are far more expensive than splitting with a simple character search.
Christopher
You're right. Then I have no idea how to avoid splitting.
Grzegorz Oledzki
+1  A: 

Any way you slice it, you are still going to have to go through the list. There are more and less elegant ways to do it. Depending on the Python version you are using, you can use a list comprehension.

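# `line` is one data row of the file, e.g. "123,null,X,Y"
ids = line.strip().split(",")
# Join the names of the columns whose value is not "null"
print("%s:%s" % (ids[0], ":".join(["case%d" % x for x in range(1, len(ids)) if ids[x] != "null"])))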
Christopher
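As a rough sketch of the same split-based idea applied to every row, the version below (Python 3) takes the column names from the header line instead of generating "case%d" labels. The function name summarize is hypothetical, and printing "None" for a row with no non-null case is an assumption based on the output in the question.

```python
def summarize(line, header, null="null"):
    """Return 'id:name:name' for one CSV row (hypothetical helper)."""
    values = line.strip().split(",")
    names = header.strip().split(",")
    # Pair each column name with its value, skipping the id column.
    present = [name for name, value in zip(names[1:], values[1:]) if value != null]
    # An empty join result is falsy, so fall back to "None".
    return "%s:%s" % (values[0], ":".join(present) or "None")


header = "id,case1,case2,case3"
rows = ["123,null,X,Y", "342,X,X,Y", "456,null,null,null", "789,null,null,X"]
summaries = [summarize(row, header) for row in rows]
for s in summaries:
    print(s)
```

A plain split like this does not handle quoted fields that contain commas; if the file can contain those, the csv module is the safer choice.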
A: 

You could use the Python csv module, which comes with the standard installation of Python. It will not be much easier, though.

Reef