Hey,

So I'm dealing with a CSV file that has missing values. What I want my script to do is:

#!/usr/bin/python

import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.


reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for x in row[:]:
                if len(x)< 1:
                         x = 0
                print x
print row

Here is an example of the data I'm trying it on. Ideally it should work on any column length.

Before:
actnum,col2,col4
xxxxx ,    ,
xxxxx , 845   ,
xxxxx ,    ,545

After:
actnum,col2,col4
xxxxx , 0  , 0
xxxxx , 845, 0
xxxxx , 0  ,545

Any guidance would be appreciated.

Update: Here is what I have now (thanks):

reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
print row

However, it only seems to output one record. I will be piping the output to a new file on the command line.

Update 3: OK, now I have the opposite problem: I'm outputting duplicates of each record. Why is that happening?

After:
actnum,col2,col4
actnum,col2,col4
xxxxx , 0  , 0
xxxxx , 0  , 0
xxxxx , 845, 0
xxxxx , 845, 0
xxxxx , 0  ,545
xxxxx , 0  ,545

OK, I fixed it (below). Thanks, guys, for your help.

#!/usr/bin/python

import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.


reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
    print ','.join(str(x) for x in row)
+1  A: 

Change your code:

for row in reader:
    for x in row[:]:
                if len(x)< 1:
                         x = 0
                print x

into:

for row in reader:
    for i, x in enumerate(row):
                if len(x)< 1:
                         x = row[i] = 0
                print x

Not sure what you think you're accomplishing by the print, but the key issue is that you need to modify row, and for that purpose you need an index into it, which enumerate gives you.
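
To see why the index matters, here is a small sketch (written in Python 3 syntax, unlike the Python 2 code in the question; the sample row is made up for illustration). Assigning to the loop variable only rebinds that name, while assigning to row[i] changes the list itself:

```python
# Hypothetical sample row, shaped like what csv.reader yields (all strings).
row = ["xxxxx", "", "545"]

# Rebinding the loop variable leaves the list untouched.
for x in row:
    if len(x) < 1:
        x = 0
unchanged = list(row)

# Assigning through the index from enumerate modifies the list in place.
for i, x in enumerate(row):
    if len(x) < 1:
        row[i] = 0

print(unchanged)  # ['xxxxx', '', '545']
print(row)        # ['xxxxx', 0, '545']
```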

Note also that all other values, except the empty ones which you're changing into the number 0, will remain strings. If you want to turn them into ints you have to do that explicitly.
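
If integers are wanted throughout, one possible approach is sketched below (Python 3 syntax). The helper name to_int_or_default is hypothetical, not part of the csv module, and the choice to leave non-numeric fields such as the account number as stripped strings is an assumption about what you want:

```python
def to_int_or_default(field, default=0):
    """Return default for blank fields, an int where possible, else the stripped text."""
    text = field.strip()
    if not text:
        return default
    try:
        return int(text)
    except ValueError:
        # Non-numeric field (for example an account id): keep it as a string.
        return text


# Hypothetical row modelled on the sample data in the question.
row = ["xxxxx ", " 845   ", ""]
converted = [to_int_or_default(field) for field in row]
print(converted)  # ['xxxxx', 845, 0]
```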

Alex Martelli
+1  A: 

You are very nearly there!

There are just a couple of small bugs.

  • len(x)< 1 will not work for the second column in the second row of your data because x will contain only spaces (so its length is at least 1, not 0). You'll need to strip your strings before checking the length.

  • print row sits outside the loop, so it runs only once, after iteration has finished, and prints just the last row. Either move it inside the outer loop or remove it.
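
Putting both points together, a minimal sketch might look like the following. It uses Python 3 syntax rather than the Python 2 of the question, the function name fill_blanks is hypothetical, and the sample data is inlined with io.StringIO only so the example runs on its own:

```python
import csv
import io
import sys


def fill_blanks(rows, fill="0"):
    """Yield each row with empty or whitespace-only fields replaced by fill."""
    for row in rows:
        yield [field if field.strip() else fill for field in row]


# Sample data from the question, inlined so the sketch is self-contained.
sample = "actnum,col2,col4\nxxxxx ,    ,\nxxxxx , 845   ,\nxxxxx ,    ,545\n"

reader = csv.reader(io.StringIO(sample))
cleaned = list(fill_blanks(reader))

# Write to standard output so the result can be redirected to a new file.
csv.writer(sys.stdout, lineterminator="\n").writerows(cleaned)
```

With a real file in Python 3, the reader would be built from open(sys.argv[1], newline="") instead of the StringIO object. Non-blank fields are passed through unchanged here, including their surrounding spaces.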

Also: Are you trying to modify the file or just output the corrections to pipe to some other file or process?

Johnsyweb