Consider the following method, which reads data from a data structure (InteractionNetwork) and writes it to a table in an SQLite database using an SQLite-JDBC driver:
    private void loadAnnotations(InteractionNetwork network) throws SQLException {
        PreparedStatement insertAnnotationsQuery =
            connection.prepareStatement(
                "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
        PreparedStatement getProteinIdQuery =
            connection.prepareStatement(
                "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
        connection.setAutoCommit(false);
        for (common.Protein protein : network.get_protein_vector()) {
            /* Get ProteinId for the current protein from another table and
               insert the value into the prepared statement. */
            getProteinIdQuery.setString(1, protein.get_primary_id());
            ResultSet result = getProteinIdQuery.executeQuery();
            result.next();
            insertAnnotationsQuery.setLong(2, result.getLong(1));
            /* Extract all the other data and add all the tuples to the batch. */
        }
        insertAnnotationsQuery.executeBatch();
        connection.commit();
        connection.setAutoCommit(true);
    }
This code works fine: the program runs in about 30 seconds and uses about 80 MB of heap space on average. Because the code looks ugly, I want to refactor it. The first thing I did was move the declaration of getProteinIdQuery into the loop:
    private void loadAnnotations(InteractionNetwork network) throws SQLException {
        PreparedStatement insertAnnotationsQuery =
            connection.prepareStatement(
                "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
        connection.setAutoCommit(false);
        for (common.Protein protein : network.get_protein_vector()) {
            /* Get ProteinId for the current protein from another table and
               insert the value into the prepared statement. */
            PreparedStatement getProteinIdQuery = // <--- moved declaration of statement here
                connection.prepareStatement(
                    "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
            getProteinIdQuery.setString(1, protein.get_primary_id());
            ResultSet result = getProteinIdQuery.executeQuery();
            result.next();
            insertAnnotationsQuery.setLong(2, result.getLong(1));
            /* Extract all the other data and add all the tuples to the batch. */
        }
        insertAnnotationsQuery.executeBatch();
        connection.commit();
        connection.setAutoCommit(true);
    }
When I run the code now, it uses about 130 MB of heap space and takes an eternity to run. Can anyone explain this strange behavior?
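One thing that may be relevant, offered as an assumption rather than a confirmed diagnosis: the second version prepares a new statement on every iteration and never closes it or its ResultSet, so those objects might pile up until the connection itself is closed. If the statement really has to be declared per iteration, a minimal sketch of a lookup that closes both each time could look like the following. The names `ProteinIdLookup` and `lookupProteinId` are hypothetical, and the sketch takes the connection as a parameter instead of using the field from the code above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of the original code.
final class ProteinIdLookup {

    /**
     * Looks up the Id of one protein. try-with-resources closes the
     * ResultSet first and then the PreparedStatement when the block exits,
     * whether it exits normally or with an exception.
     */
    static long lookupProteinId(Connection connection, String accession)
            throws SQLException {
        try (PreparedStatement query = connection.prepareStatement(
                "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?")) {
            query.setString(1, accession);
            try (ResultSet result = query.executeQuery()) {
                // Fail loudly instead of reading from an empty result set.
                if (!result.next()) {
                    throw new SQLException("No protein with accession " + accession);
                }
                return result.getLong(1);
            }
        }
    }
}
```

Inside the loop this would be called as `insertAnnotationsQuery.setLong(2, ProteinIdLookup.lookupProteinId(connection, protein.get_primary_id()));`. It still re-prepares the same SQL on every iteration, so preparing the statement once outside the loop, as in the first version, is likely to remain the cheaper option.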