views:

92

answers:

3

My program is fast enough, but I'd rather trade some of that speed for lower memory usage, since a single user's peak memory usage reaches 300 MB, meaning just a few concurrent users could crash the application. Most of the answers I found were about speed optimization, and the rest were too general ("if you stream directly from the database to the output there shouldn't be much memory usage"). Well, it seems there is :) I considered not posting code so I wouldn't "lock in" anyone's ideas, but on the other hand I'd be wasting your time if you couldn't see what I've already done, so here it is:

// First I get the data from the database. I don't think this part can be
// optimized much further: from my testing the problem isn't in the ResultSet,
// and setting the fetch size and/or direction does not help.

public static void generateAndWriteXML(String query, String oznaka, BufferedOutputStream bos, Connection conn)
        throws Exception
{
    ResultSet rs = null;
    Statement stmt = null;
    try
    {
        stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        rs = stmt.executeQuery(query);
        writeToZip(rs, oznaka, bos);
    } finally
    {
        ConnectionManager.close(rs, stmt, conn);
    }
}

// Then I open up my streams. In the next method I generate XML from the
// ResultSet and write it out. Since its size reaches 300 MB, I want it
// written into a ZIP. Maybe by writing to a file first and zipping it
// afterwards I could get a slower but more memory-efficient program.

private static void writeToZip(ResultSet rs, String oznaka, BufferedOutputStream bos)
        throws SAXException, SQLException, IOException
{
    ZipEntry ze = new ZipEntry(oznaka + ".xml");
    ZipOutputStream zos = new ZipOutputStream(bos);
    zos.putNextEntry(ze);
    OutputStreamWriter writer = new OutputStreamWriter(zos, "UTF-8");
    writeXMLToWriter(rs, writer);
    try
    {
        // Flush the writer rather than closing it here: closing the writer
        // would also close the underlying ZipOutputStream, making the
        // closeEntry() call below fail. Swallowing that failure in an empty
        // catch block would hide a truncated archive.
        writer.flush();
        zos.closeEntry();
    } finally
    {
        // Closing the ZipOutputStream writes the zip trailer and closes
        // the wrapped BufferedOutputStream as well.
        zos.close();
    }
}

// And finally, the method that does the actual generating and writing.
// This is the second place I think I could optimize memory, since the
// DataWriter is custom: it extends a custom XMLWriter that extends the
// standard org.xml.sax.helpers.XMLFilterImpl. I've tried flushing at
// various points in the program, but the occupied memory stays the same;
// it only takes longer.

public static void writeXMLToWriter(ResultSet rs, Writer writer) throws SAXException, SQLException, IOException
{
    //Set up XML
    DataWriter w = new DataWriter(writer);
    w.startDocument();
    w.setIndentStep(2);
    w.startElement(startingXMLElement);
    // Get the metadata
    ResultSetMetaData meta = rs.getMetaData();
    int count = meta.getColumnCount();
    // Iterate over the set
    while (rs.next())
    {
        w.startElement(rowElement);
        for (int i = 0; i < count; i++)
        {
            Object ob = rs.getObject(i + 1);
            if (rs.wasNull())
            {
                ob = null;
            }
            // XML elements are repeated so they could benefit from caching
            String colName = meta.getColumnLabel(i + 1).intern();
            if (ob != null)
            {
                if (ob instanceof Timestamp)
                {
                    w.dataElement(colName, Util.formatDate((Timestamp) ob, dateFormat));
                }
                else if (ob instanceof BigDecimal)
                {
                    // Possible benefit from writing ints as strings and interning them
                    w.dataElement(colName, Util.transformToHTML(Integer.valueOf(((BigDecimal) ob).intValue())));
                }
                else
                {   // there's enough of data that's repeated to validate the use of interning
                    w.dataElement(colName, ob.toString().intern());
                }

            }
            else
            {
                w.emptyElement(colName);
            }
        }
        w.endElement(rowElement);
    }
    w.endElement(startingXMLElement);
    w.endDocument();
}

EDIT: Here is an example of memory usage (taken with visualVM):

Memory usage screenshot

EDIT2: The database is Oracle 10.2.0.4, and after switching to ResultSet.TYPE_FORWARD_ONLY I got a maximum of 50 MB usage! As I said in the comments, I'll keep an eye on this, but it's really promising.

Memory usage after adding ResultSet.TYPE_FORWARD_ONLY

EDIT3: There seems to be another possible optimization. As I said, I'm generating XML, which means a lot of data is repeated (if nothing else, the tags), so String.intern() could help me here. I'll post back when I've tested it.
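For reference, String.intern() places strings in the JVM's interned-string pool, which on the JVMs of that era lived in PermGen and had its own size limit. A hypothetical alternative, not from the code above, is a plain map-based cache that deduplicates repeated values on the normal heap:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to String.intern(): keeps one instance per
// distinct value on the ordinary heap instead of the interned-string pool.
class StringCache {
    private final Map<String, String> cache = new HashMap<String, String>();

    String dedup(String s) {
        String cached = cache.get(s);
        if (cached != null) {
            return cached; // reuse the first instance seen
        }
        cache.put(s, s);
        return s;
    }
}
```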

A: 

Since it's Java, memory should only spike temporarily, unless you are leaking references, for example by pushing things onto a list held by a singleton that lives for the entire program. In my experience the more likely culprit is resource leaking, which happens when objects that use unmanaged resources, like file handles, never run their cleanup code (I'm thinking of C# here, but I assume this applies to Java too). A common cause is empty exception handlers that do not re-throw to the parent stack frame, which has the net effect of circumventing the finally block.
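To illustrate the cleanup point with a minimal sketch (the names here are hypothetical, not taken from the question's code): putting the close in a finally block guarantees it runs even when the write throws, whereas an empty catch can silently skip cleanup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class CleanupExample {
    // The close runs in finally, so the underlying stream is released
    // even if writing throws partway through.
    static byte[] writeSafely(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        try {
            writer.write(text);
        } finally {
            writer.close(); // flushes and releases the wrapped stream
        }
        return out.toByteArray();
    }
}
```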

Gabriel
It does spike. I'll edit the OP with the screenshot of memory usage profiling.
Andrija
This is a guess, but maybe the Java zipping class needs the whole source in memory to produce its output, which would defeat the savings of the stream writer?
Gabriel
Could be, that's why I asked here :)
Andrija
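For what it's worth, java.util.zip.ZipOutputStream compresses incrementally through its internal Deflater, so it should not need the whole source in memory. A small sketch, writing to an in-memory buffer only so the example is self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipStreamExample {
    // Writes many small chunks through a ZipOutputStream; the deflater
    // compresses as data arrives, so the full uncompressed content is
    // never held by the zip layer.
    static byte[] zipChunks(int chunks) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(out);
        zos.putNextEntry(new ZipEntry("data.xml"));
        Writer writer = new OutputStreamWriter(zos, "UTF-8");
        for (int i = 0; i < chunks; i++) {
            writer.write("<row>value</row>\n");
        }
        writer.flush();
        zos.closeEntry();
        zos.close();
        return out.toByteArray();
    }
}
```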
+3  A: 

Is it possible to use ResultSet.TYPE_FORWARD_ONLY?

You have used ResultSet.TYPE_SCROLL_INSENSITIVE. I believe that for some databases (you didn't say which one you use) this causes the whole result set to be loaded into memory.
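A minimal sketch of the forward-only setup (the fetch size is illustrative; the right value depends on row width):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ForwardOnlyExample {
    // A forward-only, read-only statement lets the driver stream rows
    // instead of buffering the whole result set on the client; with
    // scrollable types some drivers cache every row in the JDBC layer.
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(500); // rows per round trip; the value is illustrative
        return stmt;
    }
}
```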

Thomas Mueller
We're using Oracle 10.2.0.4, and I've tried this and got results! I'm still suspicious of them (meaning I'll do more testing/profiling), but for now it looks really promising. The new memory usage is included in the OP.
Andrija
Yes, look at http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/resltset.htm#CIHCHBJB. In general, using anything other than ResultSet.TYPE_FORWARD_ONLY is a bad idea.
gpeche
A: 

I've run some more tests and the conclusions are:

  1. The biggest gain came from the JVM version (or VisualVM has problems monitoring the Java 5 heap :). When I first reported that ResultSet.TYPE_FORWARD_ONLY gave a significant gain, I was wrong. The biggest gain came from running under Java 5, where the program used up to 50 MB of heap space, as opposed to Java 6, where the same code took up to 150 MB.
  2. The second gain came from ResultSet.TYPE_FORWARD_ONLY, which made the program use as little memory as possible.
  3. The third gain came from String.intern(), which saved a bit more memory by reusing cached strings instead of creating new ones.

This is the usage with optimizations 2 and 3 (without String.intern() the graph would look the same, just shifted about 5 MB higher at every point):

Memory usage screenshot with optimizations

and this is the usage without them (the lower usage at the end is due to the program running out of memory :) )

Memory usage screenshot without optimizations

Thank you everyone for your assistance.

Andrija