views:

43

answers:

1

Hi --

I'm loading large text file of high school students into MySQL, but the school itself is only identified in the first line of each text file. Like so:

897781234Metropolitan High                   
340098 1001X  678 AS     Reading               101KAS DOE      KEITH   A1 340089 A 7782...

Using SQL code, how can I generate a column of the school number (e.g., 897781234) in the first column of the receiving table so that the school will be identified with each row?

To load the text files, I'm using:

LOAD DATA INFILE "f:/school_files/school897781234.txt" 
INTO TABLE my_table FIELDS TERMINATED BY ''
IGNORE 1 LINES;

Thanks!

A: 

Hmmm ... looks like you're doing this under Windows. I prefer Unix/Linux for large text manipulation, but you should be able to use similar techniques under Windows (try installing Cygwin). PowerShell also has some useful capabilities, if you're familiar with that. With that in mind, here are some ideas for you:

  1. Write a script that will munge your data files to make them MySQL friendly, by creating a new file that has the contents of all but the first line with the school information prepended on every line. Do your data load from the munged file.

    (munge-schools.sh)
    #!/bin/bash
    ifile=$1
    ofile=$2
    school=$(head -1 ${ifile})
    tail --lines=+2 ${ifile} | sed "s/^/${school}/" > ${ofile}
    
    
    ./munge-schools school897781234.txt school897781234.munged
    
  2. For each school, do the load as is (skipping the first line), but load it into a temporary table, then add a column for the school defaulting to the school information. Copy from the temp table into your final table.

Given a choice, I will always go with doing text manipulation outside of the database to make the input files friendlier -- there are lots of text manipulation tools that will be much faster at reformatting your data than your database's bulk load tools.

Craig Trader
Thanks for your reply. I think I've got the concept. At least, I now know that there is no "hidden" MySQL script that does this type of thing. Interestingly (or not), MySQL LOAD DATA INFILE allows IGNORE but not LIMIT (i.e., LIMIT 1 LINES). Thanks again!
dave