I have text files with a lot of uniform rows that I'd like to load into a MySQL database, but the files are not completely uniform. There are several lines at the beginning containing miscellaneous information, and there are timestamps about every 6 lines.

"LOAD DATA INFILE" doesn't seem like the answer here because of my file format. It doesn't seem flexible enough.

Note: The header of the file takes up a pre-determined number of lines. The timestamp is predictable, but there are some other random notes that can pop up that need to be ignored. They always start with several keywords that I can check for, though.

A sample of my file in the middle:

  103.3     .00035
  103.4     .00035
  103.5     .00035
  103.6     .00035
  103.7     .00035
  103.8     .00035
  103.9     .00035
Time: 07-15-2009 13:37
  104.0     .00035
  104.1     .00035
  104.2     .00035
  104.3     .00035
  104.4     .00035
  104.5     .00035
  104.6     .00035
  104.7     .00035
  104.8     .00035
  104.9     .00035
Time: 07-15-2009 13:38
  105.0     .00035
  105.1     .00035
  105.2     .00035

From this I need to load information into three fields. The first field needs to be the filename, and the other two are shown in the example. I could prepend the filename to each data line, but that may not be necessary if I use a script to load the data.

If required, I can change the file format, but I don't want to lose the timestamps and header information.

SQLAlchemy seems like a good possible choice for Python, which I'm fairly familiar with.
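
For what it's worth, the kind of three-column table I have in mind would look roughly like this in SQLAlchemy (the connection string, table name, and column names are just placeholders):

from sqlalchemy import create_engine, MetaData, Table, Column, String, Float

# Placeholder connection string and schema -- adjust to the real database.
engine = create_engine("mysql://user:secret@localhost/mydb")
metadata = MetaData()

readings = Table("readings", metadata,
    Column("filename", String(255)),
    Column("position", Float),
    Column("value", Float),
)
metadata.create_all(engine)

# Inserting one parsed data line would look something like this:
conn = engine.connect()
conn.execute(readings.insert(),
             {"filename": "file1.txt", "position": 103.3, "value": 0.00035})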

I have thousands of lines of data, so loading all the files I already have may be slow at first, but after that I just want to load in the new lines of each file. So I'll need to be selective about what I load, because I don't want duplicate information.

Any suggestions on a selective method for loading data from a text file into a MySQL database? And beyond that, what do you suggest for loading only the lines of a file that are not already in the database?

Thanks all. Meanwhile, I'll look into SQLAlchemy a bit more and see if I get somewhere with that.

+2  A: 

LOAD DATA INFILE has an IGNORE LINES option which you can use to skip the header. According to the docs, it also has a "LINES STARTING BY 'prefix_string'" option which you could use, since all of your data lines seem to start with two blanks while your timestamps start at the beginning of the line.
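
For instance, combining those options from Python might look something like this (an untested sketch: the table "readings", its columns, the file path, the header length, and the connection details are all placeholders to adapt):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
cur = conn.cursor()
cur.execute("""
    LOAD DATA INFILE '/path/on/the/mysql/server/datafile.txt'
    INTO TABLE readings
    FIELDS TERMINATED BY ' '   -- runs of spaces between columns may need cleanup first
    LINES STARTING BY '  '     -- keep only lines indented by two blanks
    IGNORE 5 LINES             -- skip the fixed-size header (5 is a guess)
    (position, value)
    SET filename = 'datafile.txt'
""")
conn.commit()
conn.close()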

oggy
This may work for loading the file the first time, but how would you read only the last few new lines to update the database?
mouche
Use IGNORE LINES?
oggy
+2  A: 

Another way to do this is to have Python transform the files for you. You can easily filter the input file into an output file based on whatever criteria you specify. This code assumes you have some function is_data(line) that checks a line against your criteria and returns True if it is a data line.

with file("output", "w") as out:
  for line in file("input"):
    if is_data(line):
      out.write(line)
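
Based on the sample in the question, is_data could be as simple as this (a sketch -- the note keywords are placeholders for whatever your files actually contain):

NOTE_KEYWORDS = ("NOTE", "COMMENT")  # placeholders for the real keywords

def is_data(line):
  stripped = line.strip()
  if not stripped:                        # blank line
    return False
  if stripped.startswith("Time:"):        # timestamp line
    return False
  if stripped.startswith(NOTE_KEYWORDS):  # miscellaneous note line
    return False
  return True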

Additionally, if your files just keep growing by appending new lines, you could have it store and re-read the last recorded offset (this code may not be 100% right, I haven't tested it, but you get the idea):

import os

if os.path.exists("filter_settings.txt"):
  start = long(file("filter_settings.txt").read())
else:
  start = 0

with file("output", "w") as out:
  input = file("input")
  input.seek(start)                 # resume where the previous run left off
  for line in input:
    if is_data(line):
      out.write(line)
  # tell() returns a number, so convert it before writing it back out
  file("filter_settings.txt", "w").write(str(input.tell()))
Christopher
Thanks for the code example. Perhaps Python I/O is a good way to go. I'm going to look into that last snippet. I do continue to append data to the end of my files.
mouche
+1: Two-part pipeline. Python to transform to a "clean" form. MySQL to load. Runs faster broken down this way. And you have a lot of control over the filtering without having to sweat the SQL stuff.
S.Lott