Hi,
I have a Python script that uses the MySQLdb interface to load various CSV files into MySQL tables. In my code, I use Python's standard `csv` library to read each CSV, then insert the rows one at a time with an INSERT query. I do this rather than using LOAD DATA so that I can convert null values and perform other minor clean-ups on a per-field basis.
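For reference, my loading loop looks roughly like this (simplified: `clean_field` stands in for my actual clean-ups, and table/column names here are just placeholders; in the real script each tuple goes to `cur.execute()` via MySQLdb):

```python
import csv
import io


def clean_field(raw):
    """Per-field clean-up: convert empty CSV fields to None,
    which MySQLdb sends to MySQL as SQL NULL."""
    return None if raw == "" else raw


def load_rows(csvfile):
    """Yield cleaned parameter tuples, one per CSV row,
    ready to pass to cur.execute()."""
    for row in csv.reader(csvfile):
        yield tuple(clean_field(field) for field in row)


# In the real script, each tuple is inserted like so:
# cur.execute(
#     "INSERT INTO mytable (id_number, iteration, date, value) "
#     "VALUES (%s, %s, %s, %s)",
#     params,
# )

sample = io.StringIO("102,1,2010-01-01,63\n102,2,2010-01-02,\n")
for params in load_rows(sample):
    print(params)
```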
Example table format:
`id_number` | `iteration` | `date` | `value`
----------- | ----------- | ---------- | -------
102 | 1 | 2010-01-01 | 63
102 | 2 | 2010-01-02 | NULL
102 | 3 | 2010-01-03 | 65
The null value in the second iteration of `id_number` 102 represents a case where `value` hasn't changed from the previous day, i.e. `value` remains 63.
Basically, I need to convert these null values to their correct values. I can imagine 4 ways of doing this:
1. Once everything is inserted into the table, run a MySQL query that does the iterating and replacing all by itself.
2. Once everything is inserted into the table, run a MySQL query to send some data back to Python, process it in Python, then run a MySQL UPDATE to set the correct values.
3. Do the processing in Python on a per-field basis before each insert.
4. Insert into a temporary table and use SQL to insert into the main table.
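For what it's worth, my rough idea for #3 is to remember the last non-null `value` per `id_number` while reading the CSV, and substitute it whenever the field is empty. An untested sketch (column positions assumed to match the table above):

```python
import csv
import io


def fill_values(rows):
    """Replace an empty `value` field with the most recent non-null
    value seen for the same id_number (a forward fill)."""
    last_value = {}  # id_number -> most recent non-null value
    for id_number, iteration, date, value in rows:
        if value == "":
            # No change since the previous day: reuse the last value.
            # Stays None if this id_number has no earlier value.
            value = last_value.get(id_number)
        else:
            last_value[id_number] = value
        yield (id_number, iteration, date, value)


sample = io.StringIO(
    "102,1,2010-01-01,63\n"
    "102,2,2010-01-02,\n"
    "102,3,2010-01-03,65\n"
)
for row in fill_values(csv.reader(sample)):
    print(row)
```

Each yielded tuple would then be passed to the INSERT query instead of the raw CSV row, so the nulls never reach the table at all.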
I could probably work out how to do #2, and maybe #3, but I have no idea how to do #1 or #4, which I think are the best methods, as they would require no fundamental changes to the Python code.
My questions are: A) which of the above methods is "best" and "cleanest"? (Speed isn't really an issue.) And B) how would I achieve #1 or #4?
Thanks in advance :)