This is an elaboration of a previous question, but as I delve deeper into Python, I just get more confused about how Python handles CSV files.

I have a CSV file, and it must stay that way (e.g., I cannot convert it to a text file). It is the equivalent of a 5-row by 11-column array or matrix, or vector... whichever.

I have been attempting to read in the CSV using various methods I have found here and in other places (e.g. python.org) so that it preserves the relationship between columns and rows. The first row and the first column hold non-numerical values. The rest are float values, a mixture of positive and negative.

What I wish to do is import the CSV and load it into Python so that if I were to reference a column header, it would return the associated values stored in that column's rows. Example:

>>> workers, constant, age
>>> workers
    w0
    w1
    w2
    w3
    constant
    7.334
    5.235
    3.225
    0
    age
    -1.406
    -4.936
    -1.478
    0

and so forth...

What I am looking for are some "best practices"/"pythonic" techniques for handling this kind of data structure. I am very new to Python.

Thanks in advance!

+4  A: 
import csv
with open( <path-to-file>, "rb" ) as theFile:
    reader = csv.DictReader( theFile )
    for line in reader:
        # line is { 'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ... }
        # (every value is a string; convert with float() where needed)
        print( line[ 'workers' ] )  # yields 'w0' for the first row

Python has a powerful built-in CSV handler. In fact, most things are already built in to the standard library.
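
For readers on Python 3, here is a minimal sketch of the same idea. It assumes the file is opened in text mode with newline="" rather than "rb", and that the first column (named workers, as in the question) holds labels while every other column holds floats. The helper name read_rows and its label_column parameter are hypothetical, chosen only for illustration.

```python
import csv
import io


def read_rows(csv_file, label_column="workers"):
    """Return a list of dicts, converting every non-label field to float.

    `read_rows` and `label_column` are hypothetical names for this sketch.
    """
    rows = []
    for line in csv.DictReader(csv_file):
        row = {}
        for key, value in line.items():
            # DictReader yields strings, so numeric fields are converted here.
            row[key] = value.strip() if key == label_column else float(value)
        rows.append(row)
    return rows


# In-memory stand-in for the real file; with a file on disk, use
# open(path, newline="") instead.
sample = io.StringIO("workers,constant,age\nw0,7.334,-1.406\nw1,5.235,-4.936\n")
rows = read_rows(sample)
print(rows[0]["workers"], rows[0]["constant"], rows[0]["age"])
```

Each row stays a dict keyed by column header, so this remains a row-wise view of the data.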

katrielalex
Always open csv files in binary mode.
John Machin
True that. Thanks.
katrielalex
Thanks katrielalex, this was one method I had tried previously, so apparently I was on the right track. Appreciate the help!
myClone
+5  A: 

Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.

Assuming your file is named myclone.csv and contains

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

this code should give you an idea or two:

>>> import csv
>>> f = open('myclone.csv', 'rb')
>>> reader = csv.reader(f)
>>> headers = reader.next()
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>

To get your numeric values into floats, add this

converters = [str.strip] + [float] * (len(headers) - 1)

up front, and do this

for h, v, conv in zip(headers, row, converters):
  column[h].append(conv(v))

for each row instead of the similar two lines above.
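
Putting the pieces together, the following is a sketch of the whole column-wise approach as a single function, written for Python 3 (next(reader) in place of reader.next(), and text mode with newline='' in place of 'rb'). The function name read_columns is hypothetical, and the code assumes, as above, that only the first column is non-numeric.

```python
import csv
import io


def read_columns(csv_file):
    """Read CSV data into a dict mapping each header to a list of values.

    `read_columns` is a hypothetical helper name. It assumes the first column
    holds text labels and every other column holds floats.
    """
    reader = csv.reader(csv_file)
    headers = next(reader)  # Python 3 spelling of reader.next()
    converters = [str.strip] + [float] * (len(headers) - 1)
    column = {h: [] for h in headers}
    for row in reader:
        for h, v, conv in zip(headers, row, converters):
            column[h].append(conv(v))
    return column


# In-memory copy of myclone.csv; for a real file, use
# open('myclone.csv', newline='') in Python 3.
data = io.StringIO(
    "workers,constant,age\n"
    "w0,7.334,-1.406\n"
    "w1,5.235,-4.936\n"
    "w2,3.2225,-1.478\n"
    "w3,0,0\n"
)
column = read_columns(data)
print(column["workers"])
print(column["constant"])
print(column["age"][1])
```

A single cell is then reachable as column['age'][1], that is, column name first and row index second.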

John Machin
Thanks a lot John, this is very helpful. I had tried some techniques using some of the functions you used in the above example, but was unable to "package" the multiple csv functions appropriately. This will help tremendously. How then would I go about "stacking" these columns to generate a table of sorts? Could I use numpy.hstack (or is it vstack)?
myClone
I don't understand "stacking". You already have "a table of sorts" whose contents you can access by `column['column_name'][row_index]`. I don't use `numpy`; I'd need to read the manual (hint, hint). Perhaps you could ask another question, specifying what you need to do with the table.
John Machin
Thanks John, I'll research it.
myClone