Hi!
Since I kicked off a process that inserts 7M rows from one table into two others, I've been wondering whether there's a faster way to do it; at its current rate the job works out to about 24 hours of processing.
Here's how it goes:
The data from this table
RAW (word VARCHAR2(4000), doc VARCHAR2(4000), count NUMBER);
should find a new home in two hash cluster tables, T1 and T2:
CREATE CLUSTER C1 (word VARCHAR2(4000)) SIZE 200 HASHKEYS 10000000;
CREATE CLUSTER C2 (doc VARCHAR2(4000)) SIZE 200 HASHKEYS 10000000;
CREATE TABLE T1 (word VARCHAR2(4000), doc VARCHAR2(4000), count NUMBER) CLUSTER C1(word);
CREATE TABLE T2 (doc VARCHAR2(4000), word VARCHAR2(4000), count NUMBER) CLUSTER C2(doc);
via JDBC prepared-statement inserts with manual commits, like this:
PreparedStatement stmtT1 = conn.prepareStatement("insert into T1 values(?,?,?)");
PreparedStatement stmtT2 = conn.prepareStatement("insert into T2 values(?,?,?)");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("select word, doc, count from RAW");
conn.setAutoCommit(false);
int commitCount = 0;
while (rs.next()) {
    String word = rs.getString(1);
    String doc = rs.getString(2);
    int count = rs.getInt(3);
    stmtT1.setString(1, word);
    stmtT1.setString(2, doc);
    stmtT1.setInt(3, count);
    stmtT2.setString(1, doc);
    stmtT2.setString(2, word);
    stmtT2.setInt(3, count);
    stmtT1.execute();
    stmtT2.execute();
    if (++commitCount == 10000) { // commit every 10,000 rows
        conn.commit();
        commitCount = 0;
    }
}
conn.commit(); // pick up the final partial batch
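One direction I've been considering is JDBC batching, so each network round trip carries many rows instead of one. Here's a minimal self-contained sketch of what I mean (the connection URL, credentials, class name, and the fetch/batch size of 1000 are placeholders I made up, not tested values):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class BatchCopy {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/service", "user", "password")) {
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement();
                 PreparedStatement stmtT1 = conn.prepareStatement("insert into T1 values(?,?,?)");
                 PreparedStatement stmtT2 = conn.prepareStatement("insert into T2 values(?,?,?)")) {
                stmt.setFetchSize(1000); // pull source rows in bulk rather than one per round trip
                ResultSet rs = stmt.executeQuery("select word, doc, count from RAW");
                int batched = 0;
                while (rs.next()) {
                    String word = rs.getString(1);
                    String doc = rs.getString(2);
                    int count = rs.getInt(3);
                    stmtT1.setString(1, word);
                    stmtT1.setString(2, doc);
                    stmtT1.setInt(3, count);
                    stmtT1.addBatch(); // queue the row instead of executing it immediately
                    stmtT2.setString(1, doc);
                    stmtT2.setString(2, word);
                    stmtT2.setInt(3, count);
                    stmtT2.addBatch();
                    if (++batched == 1000) { // ship 1000 queued rows per statement in one go
                        stmtT1.executeBatch();
                        stmtT2.executeBatch();
                        conn.commit();
                        batched = 0;
                    }
                }
                stmtT1.executeBatch(); // flush the final partial batch
                stmtT2.executeBatch();
                conn.commit();
            }
        }
    }
}

My thinking is that batching amortizes the round trip over 1000 rows while keeping commits about as frequent as before, but I don't know whether that attacks the real bottleneck here.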
Any ideas?