tags:
views: 195
answers: 1

I have a 10-node DB2 9.5 database, with the raw data on each machine (i.e. node1:/scratch/data/dataset.1, node2:/scratch/data/dataset.2, ..., node10:/scratch/data/dataset.10).

There is no shared NFS mount; no single machine could hold all of the datasets combined.

Each line of a dataset file is a long string of delimited text, and the first column is the key. I don't know the hash function that DB2 will use, so the data is not pre-partitioned.

Short of renaming all of my files, is there any way to get DB2 to load them all in parallel? I'm trying to do something like

   load from '/scratch/data/dataset' of del modified by coldel| fastparse messages /dev/null replace into TESTDB.data_table part_file_location '/scratch/data';

but I have no idea how to suggest to db2 that it should look for dataset.1 on the first node, etc.

A: 

If the individual data files on each partition didn't originate from the same database partition, then you're stuck and will have to run the load 10 times -- once from each database partition. You can use db2_all to run all of the loads with a single command:

db2_all "db2 connect to db; db2 load from /scratch/data/dataset.\$DB2NODE of del ..."

Don't try to run the db2_all command in parallel. ;-)
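
Filling in the options from the question, the command might look something like the sketch below (the quoting is one reasonable guess and may need adjusting for db2_all/rah on your system). Note that it uses insert rather than the question's replace: LOAD ... REPLACE deletes the table's existing rows first, so ten successive REPLACE loads would leave only the last file's data.

   db2_all "db2 connect to db; db2 \"load from /scratch/data/dataset.\$DB2NODE of del modified by coldel| fastparse messages /dev/null insert into TESTDB.data_table\""

The escaped inner quotes are there so that the | column delimiter reaches the DB2 command line processor instead of being treated as a pipe by the remote shell.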

One other thought for the future: Do you have enough room on a single server if you compress all of the files first? You can load from a named pipe:

   mkfifo f                            # create a named pipe
   cat dataset.*.gz | gzip -dc > f &   # decompress every file into the pipe in the background
   db2 "load from f of del ...."       # LOAD reads from the pipe as the data streams in
Ian Bjorhovde