A couple of years ago I wrote a small utility to move data from an Oracle database to a Postgres database. I wrote it in Java with JDBC because I wanted Java to handle the data-type conversions when binding values to the prepared statement that performs the inserts. The original version assumed that the table and column names were the same in both databases; later versions accepted a mapping file to handle name differences. The utility was a big hit in my organization, but unfortunately it does not scale: it maxes out at about a million rows moved per hour. We now have tables with 30+ million rows, and nobody is willing to wait 30 hours for their data to transfer.
The method below is the heart of the utility and the reason it does not scale. It is executed once for each column of each row, so it gets called num_rows * num_cols times. Profiling shows this method consuming 58% of the execution time, with the getObject() and findColumn() calls alone accounting for 53%!
public void setPlaceholderValue(int placeHolderNum, ResultSet rs,
                                String oracleColumnName, PreparedStatement stmt)
        throws SQLException {
    int columnIndex = rs.findColumn(oracleColumnName);
    int columnType = rs.getMetaData().getColumnType(columnIndex);
    try {
        if (rs.getObject(oracleColumnName) != null) {
            switch (columnType) {
                case Types.VARCHAR:   stmt.setString(placeHolderNum, rs.getString(columnIndex)); break;
                case Types.INTEGER:   stmt.setInt(placeHolderNum, rs.getInt(columnIndex)); break;
                case Types.DATE:      stmt.setDate(placeHolderNum, rs.getDate(columnIndex)); break;
                case Types.FLOAT:     stmt.setFloat(placeHolderNum, rs.getFloat(columnIndex)); break;
                case Types.NUMERIC:   stmt.setBigDecimal(placeHolderNum, rs.getBigDecimal(columnIndex)); break;
                case Types.TIMESTAMP: stmt.setTimestamp(placeHolderNum, rs.getTimestamp(columnIndex)); break;
                default: throw new SQLException("The result set column type " + columnType
                        + " was not recognized; see java.sql.Types at http://java.sun.com/j2se/1.5.0/docs/api/");
            }
        } else {
            stmt.setNull(placeHolderNum, columnType);
        }
    } catch (SQLException e) {
        System.out.println("SQLException: " + e.getMessage() + " for record id=" + rs.getLong("id"));
        throw e;  // rethrow after logging the offending record
    }
}
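For what it's worth, the obvious refactor is to resolve each column's index and type once per table, before the row loop, instead of once per cell, so findColumn() and getColumnType() drop out of the hot path, and to pass the getObject() lookup an index rather than a name. A rough, untested sketch of what I mean (the columnIndexes/columnTypes arrays are hypothetical and would be built once per table from the metadata):

// Built once per table, before iterating rows:
//   columnIndexes[i] = rs.findColumn(oracleColumnNames[i]);
//   columnTypes[i]   = rs.getMetaData().getColumnType(columnIndexes[i]);
public void setPlaceholderValue(int placeHolderNum, ResultSet rs,
                                int columnIndex, int columnType,
                                PreparedStatement stmt) throws SQLException {
    // Index-based lookup: no per-cell name resolution, no metadata calls
    Object value = rs.getObject(columnIndex);
    if (value == null) {
        stmt.setNull(placeHolderNum, columnType);
        return;
    }
    switch (columnType) {
        case Types.VARCHAR:   stmt.setString(placeHolderNum, rs.getString(columnIndex)); break;
        case Types.INTEGER:   stmt.setInt(placeHolderNum, rs.getInt(columnIndex)); break;
        case Types.DATE:      stmt.setDate(placeHolderNum, rs.getDate(columnIndex)); break;
        case Types.FLOAT:     stmt.setFloat(placeHolderNum, rs.getFloat(columnIndex)); break;
        case Types.NUMERIC:   stmt.setBigDecimal(placeHolderNum, rs.getBigDecimal(columnIndex)); break;
        case Types.TIMESTAMP: stmt.setTimestamp(placeHolderNum, rs.getTimestamp(columnIndex)); break;
        default: throw new SQLException("Unrecognized column type " + columnType + "; see java.sql.Types");
    }
}

The row loop could also call stmt.addBatch() per row and stmt.executeBatch() every few thousand rows rather than executing each insert individually, but I have not measured how much that buys.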
Even with a refactor like the sketch above, I'm not sure the transfer time comes down far enough. I think the column-by-column approach simply does not scale.
Can anyone suggest a better way of doing this? Language is not an issue; I can do it with anything that can handle the job. Ideally, I would like to see a transfer rate of at least 10 million records per hour.
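For example, one direction I have been eyeing but have not tried is bypassing INSERTs entirely and streaming rows into Postgres with COPY via the JDBC driver's CopyManager. A minimal sketch, assuming the PostgreSQL JDBC driver; csvRows is a hypothetical Reader that emits one properly escaped CSV line per source row:

import java.io.Reader;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Stream rows into Postgres with COPY instead of per-row INSERTs.
void bulkLoad(Connection pgConn, String tableName, Reader csvRows) throws Exception {
    CopyManager copyManager = ((PGConnection) pgConn).getCopyAPI();
    long rowsLoaded = copyManager.copyIn(
            "COPY " + tableName + " FROM STDIN WITH CSV", csvRows);
    System.out.println("Loaded " + rowsLoaded + " rows into " + tableName);
}

The catch is that I would then be responsible for the CSV formatting and escaping myself, which is exactly the work I originally wanted JDBC to handle for me.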