views:

109

answers:

2

UPDATE: added an example to clarify the format of the data.

Considering a CSV with each line formatted like this:

tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,[tbl2.col1:tbl2.col2]+

where [tbl2.col1:tbl2.col2]+ means that there could be any number of these pairs repeated

ex:

tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2

The tables would relate to eachother using the line number as a key which would have to be created in addition to any columns mentioned above.

  1. Is there a way to use mysql load data infile to load the data into two separate tables?
  2. If not, what Unix command line tools would be best suited for this?
+1  A: 

no, not directly. load data can only insert into one table or partitioned table.

what you can do is load the data into a staging table, then use insert into to select the individual columns into the 2 final tables. you may also need substring_index if you're using different delimiters for tbl2's values. the line number is handled by an auto incrementing column in the staging table (the easiest way is to make the auto column last in the staging table definition).

the format is not exactly clear, and is best done w/perl/php/python, but if you really want to use shell tools:

cut -d , -f 1-5 file | awk -F, '{print NR "," $0}' > table1

cut -d , -f 6- file | sed 's,\:,\,,g' | \
  awk -F, '{i=1; while (i<=NF) {print NR "," $(i) "," $(i+1); i+=2;}}' > table2

this creates table1 and table 2 files with these contents:

1,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
2,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
3,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5

and

1,tbl2.col1,tbl2.col2
1,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
jspcal
I assume by "Insert into" you mean running insert queries. This will take significantly longer than a mysql load data infile type operation. Re: your approach, how do you define the table so that the bracketed values are inserted correctly? Since we have an unknown number of them that is.
hinghoo
you *cannot* use mysql load data infile for this, so it's a pointless comparison. however, select into is very fast anyway. the brackets are inserted as one field into the staging table (since they don't contain commas). then you pull them out with substring_index.
jspcal
jspcal: please see the example I added above.
hinghoo
added some awk for your latest example
jspcal
+1  A: 

As you say, the problematic part is the unknown number of [tbl2.col1:tbl2.col2] pairs declared in each line. I would tempted to solve this through sed: split the one file into two files, one for each table. Then you can use load data infile to load each file into its corresponding table.

APC