Hi,

I am writing a full database extract program in Java. The database is Oracle, and it is huge; some tables have ~260 million records. The program should create one file per table in a specific format, so using Oracle Data Pump etc. is not an option. Also, company security policy does not allow writing a PL/SQL procedure that creates files on the DB server for this requirement. I have to go with Java and JDBC.

The issue I am facing is that since the files for some of the tables are huge (~30GB), I am running out of memory almost every time, even with a 20GB Java heap. Once the file size exceeds the heap size during creation, even with one of the most aggressive GC policies, the process seems to hang. For example, if the file size is > 20GB and the heap size is 20GB, once heap utilization hits the max heap size, writing slows down to ~2MB per minute, and at that speed it will take months to get the full extract.

I am looking for some way to overcome this issue. Any help would be greatly appreciated.

Here are some details of the system configuration I have: Java - JDK1.6.0_14

System config - RH Enterprise Linux (kernel 2.6.18) running on 4 x Intel Xeon E7450 (6 cores) @ 2.39GHz

RAM - 32GB

Database - Oracle 11g

The file writing part of the code is below:

private void runQuery(Connection conn, String query, String filePath,
        String fileName) throws SQLException, Exception {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
        stmt = conn.prepareStatement(query,
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(maxRecBeforWrite);
        rs = stmt.executeQuery();
        // Write query result to file
        writeDataToFile(rs, filePath + "/" + fileName, getRecordCount(
                query, conn));
    } catch (SQLException sqle) {
        // Log and rethrow rather than swallowing the exception.
        sqle.printStackTrace();
        throw sqle;
    } finally {
        try {
            // Guard against NPE if executeQuery failed before rs was assigned.
            if (rs != null) {
                rs.close();
            }
            if (stmt != null) {
                stmt.close();
            }
        } catch (SQLException ex) {
            throw ex;
        }
    }
}

private void writeDataToFile(ResultSet rs, String tempFile, String cnt)
        throws SQLException, Exception {
    FileOutputStream fileOut = null;
    int maxLength = 0;
    try {
        fileOut = new FileOutputStream(tempFile, true);
        FileChannel fcOut = fileOut.getChannel();

        List<TableMetaData> metaList = getMetaData(rs);
        maxLength = getMaxRecordLength(metaList);
        // Write Header
        writeHeaderRec(fileOut, maxLength);
        while (rs.next()) {
            // Now iterate on metaList and fetch all the column values.
            writeData(rs, metaList, fcOut);
        }
        // Write trailer
        writeTrailerRec(fileOut, cnt, maxLength);
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        // Guard against NPE if the FileOutputStream was never opened.
        if (fileOut != null) {
            try {
                fileOut.close();
            } catch (IOException ioe) {
                // Rethrow the original IOException instead of wrapping it
                // and losing the stack trace.
                throw ioe;
            }
        }
    }
}

private void writeData(ResultSet rs, List<TableMetaData> metaList,
        FileChannel fcOut) throws SQLException, IOException {
    StringBuilder rec = new StringBuilder();
    String lf = "\n";
    for (TableMetaData tabMeta : metaList) {
        rec.append(getFormattedString(rs, tabMeta));
    }
    rec.append(lf);
    ByteBuffer byteBuf = ByteBuffer.wrap(rec.toString()
            .getBytes("US-ASCII"));
    fcOut.write(byteBuf);
}

private String getFormattedString(ResultSet rs, TableMetaData tabMeta)
        throws SQLException, IOException {
    String colValue = null;
    // check if it is a CLOB column
    if (tabMeta.isCLOB()) {
        // Column is a CLOB, so fetch it and retrieve first clobLimit chars.
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
                getCLOBString(rs, tabMeta));
    } else {
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s", rs
                .getString(tabMeta.getColumnName()));
    }
    return colValue;
}

+1  A: 

Edit: Map your database tables to classes using JPA.
Then load objects from the DB using Hibernate in batches of some tolerable size and serialize them to the file.
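
Roughly like the following sketch (the Session setup, `MyEntity`, and `formatRecord` are placeholders for illustration, not code from the question):

// Stream entities with Hibernate's ScrollableResults, clearing the session
// periodically so the first-level cache never holds more than one batch.
Session session = sessionFactory.openSession();
BufferedWriter writer = new BufferedWriter(new FileWriter(fileName));
try {
    ScrollableResults results = session.createQuery("from MyEntity")
            .scroll(ScrollMode.FORWARD_ONLY);
    int count = 0;
    while (results.next()) {
        MyEntity row = (MyEntity) results.get(0);
        writer.write(formatRecord(row)); // hypothetical fixed-width formatter
        writer.write('\n');
        if (++count % 100 == 0) {
            session.clear(); // evict processed entities from memory
        }
    }
} finally {
    writer.close();
    session.close();
}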

org.life.java
Can you be a bit clearer on this?
Amit
A: 

Is your algorithm like the following? This is assuming a direct mapping between DB rows and lines in the file:

// open file for writing with buffered writer.
// execute JDBC statement
// iterate through result set
    // convert rs to file format
    // write to file
// close file
// close statement/rs/connection etc
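
In plain JDBC that loop could be sketched like this (the fetch size and the `formatRow` helper are illustrative assumptions):

// Nothing row-related outlives one loop iteration, so heap usage stays flat
// no matter how large the output file grows.
Statement stmt = conn.createStatement(); // forward-only, read-only by default
stmt.setFetchSize(500);                  // rows per DB round trip; tune as needed
ResultSet rs = stmt.executeQuery(query);
BufferedWriter out = new BufferedWriter(new FileWriter(fileName), 64 * 1024);
try {
    while (rs.next()) {
        out.write(formatRow(rs));        // hypothetical row-to-line conversion
        out.write('\n');
    }
} finally {
    out.close();
    rs.close();
    stmt.close();
}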

Try using Spring JDBC Template to simplify the JDBC portion.
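
For example (a sketch; `dataSource`, `fileName`, and `formatRow` are assumed to exist):

// JdbcTemplate pushes each row to the callback as it streams past, so the
// full result set is never materialized in memory.
final BufferedWriter writer = new BufferedWriter(new FileWriter(fileName));
JdbcTemplate jdbc = new JdbcTemplate(dataSource);
jdbc.setFetchSize(500);
jdbc.query(query, new RowCallbackHandler() {
    public void processRow(ResultSet rs) throws SQLException {
        try {
            writer.write(formatRow(rs)); // hypothetical row formatter
            writer.write('\n');
        } catch (IOException ioe) {
            throw new RuntimeException(ioe);
        }
    }
});
writer.close();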

Synesso
How would Spring JDBC help in decreasing the memory usage which is actually being caused by a huge file write process?
Amit
Using `SimpleJdbcTemplate` would fix the problem, because it would give you a correctly instantiated `ResultSet` to iterate over. My personal opinion is that you should *always* use Spring for no other reason than the `DataAccessException` hierarchy
Jon Freedman
@Jon Freedman - I agree that Spring helps but I preferred not to use it here since it was preety small functionality out side the actual application.
Amit
A: 

I believe this must be possible with the default 32 MB Java heap. Just fetch each row, save the data to the file stream, flush, and close once done.

Ashish Patil
Are you sure the file size would not affect the heap usage? I am doing exactly what you are saying; I am writing record by record.
Amit
+3  A: 

It's probably due to the way you call prepareStatement; see this question for a similar problem. With the Oracle driver, a TYPE_SCROLL_INSENSITIVE result set is typically backed by a client-side cache of the fetched rows, which is exactly what you cannot afford here. You don't need scrollability, and a ResultSet is read-only by default, so just call

stmt = conn.prepareStatement(query);
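
In context, keeping your existing fetch-size hint, that part of runQuery becomes:

stmt = conn.prepareStatement(query);  // forward-only, read-only by default
stmt.setFetchSize(maxRecBeforWrite);  // driver streams rows in batches of this size
rs = stmt.executeQuery();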
Jon Freedman
I am testing what you suggested. Will get back once done.
Amit
A: 

What value are you using for maxRecBeforWrite?

Perhaps the max-record-length query is defeating your setFetchSize by forcing JDBC to scan the entire result for the record length? Maybe you could delay writing your header and note the max record size on the fly.
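
A sketch of that idea (the 20-byte header width and `formatRow` helper are invented for illustration): write a fixed-width placeholder header, track the maximum record length while streaming, then patch the header in place.

// Reserve a fixed-width header line, stream the rows while tracking the
// maximum record length, then seek back and overwrite the placeholder.
BufferedWriter out = new BufferedWriter(new FileWriter(tempFile));
out.write(String.format("%-20s%n", "HEADER PENDING")); // fixed-width placeholder
int maxLen = 0;
while (rs.next()) {
    String rec = formatRow(rs); // hypothetical formatter
    maxLen = Math.max(maxLen, rec.length());
    out.write(rec);
    out.write('\n');
}
out.close();

// Overwrite the first 20 bytes; the newline after the header is untouched.
RandomAccessFile raf = new RandomAccessFile(tempFile, "rw");
raf.writeBytes(String.format("%-20s", "HEADER " + maxLen));
raf.close();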

mwhidden
maxRecBeforWrite=100
Amit