I have text files full of mostly uniform rows of data that I'd like to load into a MySQL database, but the files aren't completely uniform: there are several rows of miscellaneous information at the beginning, and a timestamp appears about every 6 lines.
"LOAD DATA INFILE" doesn't seem like the answer here because of my file format. It doesn't seem flexible enough.
Note: the header of the file takes up a predetermined number of lines. The timestamps are predictable, but there are also some random notes that can pop up and need to be ignored. They always start with one of a few keywords I can check for, though.
A sample from the middle of one of my files:
103.3 .00035
103.4 .00035
103.5 .00035
103.6 .00035
103.7 .00035
103.8 .00035
103.9 .00035
Time: 07-15-2009 13:37
104.0 .00035
104.1 .00035
104.2 .00035
104.3 .00035
104.4 .00035
104.5 .00035
104.6 .00035
104.7 .00035
104.8 .00035
104.9 .00035
Time: 07-15-2009 13:38
105.0 .00035
105.1 .00035
105.2 .00035
From this I need to load data into three fields. The first field needs to be the filename, and the other two are present in the example. I could prepend the filename to each data line, but that may not be necessary if I use a script to load the data.
If required, I can change the file format, but I don't want to lose the timestamps and header information.
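Here's roughly the parsing approach I have in mind so far; the header length and note keywords are just placeholders for my real values:

    import os

    # Placeholder values -- my real files have a known header length and
    # a known set of keywords that begin the junk notes.
    HEADER_LINES = 5
    NOTE_KEYWORDS = ("NOTE", "WARNING")

    def parse_line(line):
        """Return (col1, col2) for a data line, or None for anything else."""
        line = line.strip()
        if not line or line.startswith("Time:") or line.startswith(NOTE_KEYWORDS):
            return None  # timestamp, note, or blank line -- skip it
        col1, col2 = line.split()
        return float(col1), float(col2)

    def parse_data_file(path):
        """Yield (filename, col1, col2) for every data row in the file."""
        filename = os.path.basename(path)
        with open(path) as f:
            for _ in range(HEADER_LINES):
                f.readline()  # the header is a fixed number of lines, so just skip it
            for line in f:
                row = parse_line(line)
                if row is not None:
                    yield (filename,) + row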
SQLAlchemy seems like a good possible choice, and I'm fairly familiar with Python.
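For the schema I'm picturing something like the following with SQLAlchemy; the table and column names are made up, and the composite unique key is there so the database itself can reject duplicates later:

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            String, Float, UniqueConstraint)

    # Connection URL and all names here are placeholders.
    engine = create_engine("mysql://user:password@localhost/mydb")
    metadata = MetaData()

    readings = Table(
        "readings", metadata,
        Column("filename", String(255), nullable=False),
        Column("position", Float, nullable=False),  # first data column
        Column("value", Float, nullable=False),     # second data column
        # The same (filename, position) pair should never appear twice,
        # so let the database enforce that.
        UniqueConstraint("filename", "position", name="uq_file_position"),
    )
    metadata.create_all(engine)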
I have thousands of lines of data, so the initial load of all my existing files may be slow, but after that I just want to load the new lines appended to each file. So I'll need to be selective about what I load, because I don't want duplicate information.
Any suggestions for selectively loading data from a text file into a MySQL database? And beyond that, how would you load only the lines of a file that aren't already in the database?
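One idea I'm toying with: remember a byte offset per file so that each run only parses the newly appended tail, and use MySQL's INSERT IGNORE against the unique key as a safety net. This reuses parse_line, the constants, and the readings table from the sketches above; the offsets dict is my own invention and would have to be persisted between runs (a pickle or a small table would do):

    import os

    def load_new_rows(path, conn, offsets):
        """Insert only the lines appended to `path` since the last run."""
        filename = os.path.basename(path)
        rows = []
        with open(path) as f:
            if path in offsets:
                f.seek(offsets[path])  # jump straight past everything already loaded
            else:
                for _ in range(HEADER_LINES):
                    f.readline()       # first time through: skip the fixed header
            # readline() instead of plain iteration, so f.tell() stays usable
            for line in iter(f.readline, ""):
                row = parse_line(line)
                if row is not None:
                    rows.append({"filename": filename,
                                 "position": row[0],
                                 "value": row[1]})
            offsets[path] = f.tell()   # remember where we stopped, for next time
        if rows:
            # INSERT IGNORE makes MySQL skip rows that collide with the
            # unique key, so re-reading old lines can't create duplicates.
            conn.execute(readings.insert().prefix_with("IGNORE"), rows)

    # Usage sketch:
    offsets = {}  # would be loaded from disk in real use
    with engine.begin() as conn:  # begin() commits automatically on success
        load_new_rows("run42.txt", conn, offsets)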
Thanks, all. Meanwhile, I'll look into SQLAlchemy a bit more and see whether I can get somewhere with that.