I have a CSV file which I am processing, and I am writing the processed data to a text file. All the data that goes into the text file forms one big table (comma-separated instead of space-separated). My problem is: how do I remember which column a piece of data belongs in when I write it to the text file?

For example, assume there is a column called 'col'. I put some data under col. After a few iterations, I want to put another piece of data under col again (in a different row). How do I know exactly where col is? (And there are a lot of columns like this.)

Hope I am not too vague...

A: 

Probably either a dict of lists or a list of dicts. Personally, I'd go with the former. So, parse the heading row of the CSV to get a dict mapping each column heading to its column index. Then, as you read through each row, work out which index you're at, look up the column heading, and append the value to the end of the list for that heading.
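
A minimal sketch of the dict-of-lists idea, assuming a small made-up input with the headings name, col and score (the sample data and the variable names index_of, columns and rows are illustrative only, not from the question):

```python
import csv
import io

# Hypothetical sample input; in practice this would be an open CSV file.
sample = io.StringIO("name,col,score\nalice,x,1\nbob,y,2\n")

reader = csv.reader(sample)
headers = next(reader)                                   # the heading row
index_of = {name: i for i, name in enumerate(headers)}   # heading -> column index
columns = {name: [] for name in headers}                 # heading -> list of values

for row in reader:
    for i, value in enumerate(row):
        # Look up the heading for this position and append to its list.
        columns[headers[i]].append(value)

# Later on, a value under 'col' can be changed by name rather than by position.
columns["col"][1] = "z"

# Rebuild the rows in the original column order when it is time to write them out.
rows = list(zip(*(columns[name] for name in headers)))
```

The point of the design is that the column position is looked up once, from the heading row, and everything afterwards refers to columns by name.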

Dominic Rodger
+1  A: 

Python's csv library has a class named DictReader that lets you view and manipulate each row as a Python dictionary keyed by column name, so you can use the normal iteration tools on it.
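
As a rough illustration, here is how csv.DictReader might be used on a small in-memory sample (the sample data and column names are made up for this sketch):

```python
import csv
import io

# Hypothetical sample input; a real script would pass an open file instead.
sample = io.StringIO("id,col,other\n1,Foo,a\n2,Bar,b\n")

reader = csv.DictReader(sample)   # the first line is used as the field names
rows = []
for row in reader:
    # Each row is a mapping from column name to value, so 'col' is
    # addressed by name no matter where it sits in the file.
    row["col"] = row["col"].lower()
    rows.append(row)

fieldnames = reader.fieldnames    # original column order, useful when writing
```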

cpharmston
+1  A: 

Go with a list of lists. That is:

[[col1, col2, col3, col4], # Row 1
 [col1, col2, col3, col4], # Row 2
 [col1, col2, col3, col4], # Row 3
 [col1, col2, col3, col4]] # Row 4

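To work on a specific column, you can transpose this into a list of columns with a single statement. (In Python 3, zip returns an iterator, so wrap it in list(); each column comes back as a tuple, so convert it to a list if you need to modify it in place.)

>>> cols = list(zip(*rows))
>>> cols
[(row1, row2, row3, row4), # Col 1
 (row1, row2, row3, row4), # Col 2
 (row1, row2, row3, row4), # Col 3
 (row1, row2, row3, row4)] # Col 4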
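
A small runnable sketch of the same transposition, assuming a made-up 2x3 table; it converts each column to a list so it can be edited, then transposes back to rows:

```python
# Hypothetical sample data: two rows, three columns.
rows = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
]

# zip(*rows) yields one tuple per column; convert to lists so they are mutable.
cols = [list(col) for col in zip(*rows)]

# Modify the second column as a whole.
cols[1] = [value.upper() for value in cols[1]]

# Transposing again turns the columns back into rows.
rows_again = [list(row) for row in zip(*cols)]
```

Note that zip stops at the shortest row, so this assumes every row has the same number of fields.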
Cide
+1  A: 

Is SQLite an option for you? I know that you have CSV input and output. However, you can import all the data into the SQLite database. Then do all the necessary processing with the power of SQL. Then you can export the results as CSV.
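
A minimal sketch of that round trip using Python's built-in sqlite3 and csv modules. The table name data, the columns name and col, and the SUM query are assumptions chosen purely for illustration:

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real script would read from a file.
sample = io.StringIO("name,col\nalice,3\nbob,5\nalice,4\n")
reader = csv.reader(sample)
next(reader)  # skip the heading row; the table schema is declared below

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE data (name TEXT, col INTEGER)")
conn.executemany("INSERT INTO data (name, col) VALUES (?, ?)", reader)

# Do the processing in SQL; columns are addressed by name.
cursor = conn.execute(
    "SELECT name, SUM(col) AS total FROM data GROUP BY name ORDER BY name"
)

# Export the query result as CSV, taking the headings from the cursor.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow([column[0] for column in cursor.description])
writer.writerows(cursor)
conn.close()

result = out.getvalue()
```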

Apreche
A: 

Good question, I have this problem very frequently.

In general, to handle CSV files like that, I prefer to use R, whose data.frame object is specifically designed for this.

In python, you can have a look at this library called datamatrix:

Or maybe at numpy/scipy's matrices.

Named tuples are another alternative that was designed with parsing CSV files in mind, but they are not based on the concept of a matrix:
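
For instance, a named tuple type can be built from the heading row so that fields are read by name. This is only a sketch: it assumes the headings are valid Python identifiers, and the sample data and the name Row are made up.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample input; a real script would pass an open file instead.
sample = io.StringIO("name,col,score\nalice,x,1\nbob,y,2\n")
reader = csv.reader(sample)

# Build the record type from the heading row (headings must be valid identifiers).
Row = namedtuple("Row", next(reader))
records = [Row._make(fields) for fields in reader]

# Fields are accessed by name instead of by position.
values = [record.col for record in records]

# Named tuples are immutable; _replace returns a modified copy.
records[0] = records[0]._replace(col="z")
```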

dalloliogm
A: 

Your situation is kind of vague, but I'll try to answer your question, "How do I remember the column into which a piece of data goes in the text file?"

One way is to store a list of rows as dictionaries.

Note: I usually use tab-delimited text files, so forgive me if I'm forgetting something about csv formatting.

# Read the header row, then store every data row as a dict keyed by column name.
with open('input.csv', 'r') as input_file:
    # e.g. ['col1', 'col2', 'col3']
    headers = input_file.readline().strip().split(',')
    stored_rows = []
    for line in input_file:
        row_data = line.strip().split(',')
        stored_rows.append(dict(zip(headers, row_data)))

Now each row has a value for each column, which you can then process and output in whatever order you need.

output_headers = ['col3', 'col1', 'col2']
with open('output.csv', 'w') as output_file:
    output_file.write(','.join(output_headers) + '\n')
    for row in stored_rows:
        # do any processing you need here
        row['col1'] = row['col1'].strip().lower()  # for example

        # write the data to your output file in the order you want it
        output_file.write(','.join(map(row.get, output_headers)) + '\n')
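
If the data may contain commas or quotes inside fields, the plain split/join approach above will break. A possible variant, sketched here with made-up rows, lets csv.DictWriter handle both the quoting and the column order:

```python
import csv
import io

# Hypothetical rows, already stored as dicts keyed by column name.
stored_rows = [
    {"col1": " Hello, World ", "col2": "b", "col3": "c"},
    {"col1": "X", "col2": "y", "col3": "z"},
]
output_headers = ["col3", "col1", "col2"]

out = io.StringIO()  # stands in for an open output file
writer = csv.DictWriter(out, fieldnames=output_headers)
writer.writeheader()
for row in stored_rows:
    row["col1"] = row["col1"].strip().lower()  # example processing step
    # DictWriter places each value under its heading, whatever the dict order.
    writer.writerow(row)

lines = out.getvalue().splitlines()
```

When writing to a real file instead of io.StringIO, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.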
tgray