Hi, could someone help me with a Python problem? I can't seem to find any code examples for the following scenario:

Read in a user-specified CSV file. Then, for each row in the file, calculate the mean and stddev of all numeric values in that row and print them to the screen.

I know CSV rows are read in as strings, so the values would need to be converted to floats before running the calculation, but after that I'm lost. Any ideas?

A: 

There is a csv module in the standard library, which is probably a good place to start. After that I'd recommend putting the numbers into a numpy array, which makes calculating the mean/std along rows or columns very easy (array.mean(axis=0), array.std(axis=1), etc.).

[edit] In fact numpy.loadtxt should do what you want:

import numpy as np
d = np.loadtxt(filename, delimiter=',')
d.mean(axis=1) # per row (axis=0 gives per column)
d.std(axis=1) # per row (axis=0 gives per column)
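
As a slightly fuller sketch of the per-row case the question asks about (assuming the file is purely numeric with no header row; the sample data here is made up and read from an in-memory buffer rather than a real file):

```python
import io
import numpy as np

# Made-up sample data standing in for the user's CSV file.
sample = io.StringIO("1,2,3\n4,5,6\n")

d = np.loadtxt(sample, delimiter=",")
row_means = d.mean(axis=1)  # one mean per row
row_stds = d.std(axis=1)    # one population std (ddof=0) per row

for mean, std in zip(row_means, row_stds):
    print(mean, std)
```

Note that std() divides by n by default; pass ddof=1 if the sample standard deviation is wanted.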
thrope
A: 
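import sys
import csv

# csv.reader needs an open file object, not the file name itself.
with open(sys.argv[1], newline="") as f:
    for line in csv.reader(f):
        values = [float(x) for x in line]  # fields arrive as strings
        print(sum(values) / len(values))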

Untested, but I think it's quite pretty.

Thomas
Heh, hit the post button in a hurry, forgot to codify it...
Thomas
And you have almost finished his homework.
ghostdog74
A: 

Here's a good page explaining how to read data files with scipy/numpy. The files are read in as numpy arrays (rather than strings) and you can then calculate means, stdevs, etc. again using numpy/scipy.

wgrover
Wow! Homework it is... Better to write the stddev formula by hand a few times before getting into the magic of numpy!
mjv
@mjv - Point well taken; I missed the "homework" tag. I guess that a questioner who's honest enough to use the "homework" tag deserves an instructive answer, not a quick and easy one. I'd +1 your excellent answer but I'm obviously too new at this to have the necessary reputation points. :)
wgrover
@wgrover Welcome to SO. Take heart, you'll soon get reps. BTW, a small hint: reps tend to come easier and in bigger quantities for relatively simple and broad questions. This is a bit of a paradox (or rather an unfair characteristic of SO), basically owing to the fact that fewer people have the interest or the technical savvy to evaluate responses to a complicated or esoteric topic. It's not _only_ about the reps, so do contribute to the tough/esoteric stuff as well if you can. (Students' questions are nice too.)
mjv
+1  A: 

It being homework, I'm providing only a few hints:

  • beware of using numpy as suggested in other responses (unless of course your class is about advanced numeric calculations and such, in which case numpy would make sense...)
  • using the csv module may simplify the parsing of the input file, but here again, learning how to do this by hand may be useful.
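
To give an idea of what parsing "by hand" might involve, here is a minimal sketch for a single line (parse_numeric_fields is a hypothetical name, and the sketch assumes plain comma-separated fields with no quoted commas, which the csv module would handle for you):

```python
def parse_numeric_fields(line):
    """Return the fields of one comma-separated line that parse as floats."""
    values = []
    for field in line.strip().split(","):
        try:
            values.append(float(field))
        except ValueError:
            # Not a number (e.g. a label or an empty field): skip it.
            continue
    return values

print(parse_numeric_fields("widget, 3, 4.5, n/a, 10\n"))  # [3.0, 4.5, 10.0]
```

Skipping non-numeric fields is one reading of "all numeric values in that row"; raising an error instead would be an equally reasonable choice.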

The two remarks above raise the question of how Python should be used in the context of an intro to programming course [as seems to be the case here].
Python comes with batteries included, meaning that it provides access to a great many modules (both "standard" and "add-on") which help tackle the most common (and indeed also some of the most esoteric) needs. The language itself provides many constructs that make various things very easy. The conundrum for the beginner is then to decide whether to learn the most "pythonic" way of doing things straight away (by leveraging all these powerful constructs and libraries) or to write things "long hand". It is a balancing act... but you should certainly remember that for most tasks there is probably a module [or two] which can greatly help. Instructors will often set "limits" on which libraries/modules are allowed.

Going back to the problem at hand...

  • the structure of the program would look something like
[pseudo code]
  open file
  for each row in the file
       parse the numeric values to an array
       for each number in the array
             sum it up
       calculate the mean
       for each number in the array  #  (again, i.e. now that you have the mean, needed for stddev formula)
            sum up the squared differences from the mean
       calculate the stddev
       print results
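
The two passes over the numbers can be sketched for a single row as follows. This is only a sketch: row_stats is a hypothetical name, and it assumes the population form of the standard deviation (dividing by n); your course may want the sample form (dividing by n - 1). Reading and parsing the file is left out.

```python
import math

def row_stats(numbers):
    """Return (mean, stddev) for a non-empty list of floats."""
    # First pass: sum the values to get the mean.
    total = 0.0
    for x in numbers:
        total += x
    mean = total / len(numbers)

    # Second pass: sum the squared differences from the mean.
    squared_diffs = 0.0
    for x in numbers:
        squared_diffs += (x - mean) ** 2
    stddev = math.sqrt(squared_diffs / len(numbers))  # population form (divide by n)
    return mean, stddev

print(row_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # (5.0, 2.0)
```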

Now, your turn... ;-)

mjv