First, I had a problem getting the data from the database: it took too much memory and failed. I've set -Xmx1500M and I'm using a scrolling ResultSet, so that was taken care of. Now I need to build an XML document from the data, but I can't put it all in one file. At the moment, I'm doing it like this:

while (rs.next()) {
    i++;
    xmlStringBuilder.append("\n\t<row>");
    xmlStringBuilder.append("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
    xmlStringBuilder.append("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
    xmlStringBuilder.append("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>");
    // etc.
    xmlStringBuilder.append("\n\t</row>");
    if (i % 100000 == 0) {
        // stores the data to a file with the name i.xml
        storeKBR(xmlStringBuilder.toString(), i);
        // start a fresh builder so the previous chunk can be garbage collected
        xmlStringBuilder = new StringBuilder();
    }
}

and it works; I get twelve 100 MB files. Now, what I'd like to do is have all that data in one file (which I then compress), but if I just remove the if part, I run out of memory. I thought about writing to a file, closing it, then opening it again, but that wouldn't gain me much, since I'd have to load the file into memory when I open it.

+3  A: 

Why not write all data to one file and open the file with the "append" option? There is no need to read in all the data in the file if you are just going to write to it.
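As a rough illustration of the "append" idea, the sketch below adds chunks to the end of one file without ever reading the existing content back. It assumes Java 7 or later (for java.nio.file); the class name AppendChunks, the method appendChunk and the sample rows are made up for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendChunks {

    // Appends one chunk of text to the end of the file, creating the file if it
    // does not exist yet. The existing content is never loaded into memory.
    static void appendChunk(Path file, String chunk) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(chunk);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo data; in the question this would be one 100000-row chunk.
        Path file = Files.createTempFile("data", ".xml");
        appendChunk(file, "\n\t<row><ID>1</ID></row>");
        appendChunk(file, "\n\t<row><ID>2</ID></row>");
        System.out.println(Files.size(file) + " bytes written to " + file);
    }
}
```

In the question's loop, the call to storeKBR could be swapped for something like appendChunk, so that every chunk lands in the same file.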

However, this might be a better solution:

PrintWriter writer = new PrintWriter(new BufferedOutputStream(new FileOutputStream("data.xml")));

while(rs.next()){
    i++;
    writer.print("\n\t<row>");
    writer.print("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
    writer.print("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
    writer.print("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>");
    //...

    writer.print("\n\t</row>");
}

writer.close();

The BufferedOutputStream buffers the data before writing it to the file, and you can specify the buffer size in the constructor if the default value does not suit your needs. See the Java API for details: http://java.sun.com/javase/6/docs/api/.
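A possible variation, sketched under a few assumptions: since the file declares UTF-8, it may be safer to pick the charset explicitly, because PrintWriter(OutputStream) encodes with the platform default charset. The sketch also shows where a custom buffer size goes. The class name BufferedXmlOutput, the method open and the 64 KB figure are illustrative choices, not something from the question.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class BufferedXmlOutput {

    // Opens a PrintWriter that encodes text as UTF-8 and buffers up to
    // bufferChars characters before handing them to the file stream.
    static PrintWriter open(String path, int bufferChars) throws IOException {
        return new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8),
                bufferChars));
    }

    public static void main(String[] args) throws IOException {
        // "data.xml" matches the file name used in the answer; 64 * 1024 is an arbitrary size.
        try (PrintWriter writer = open("data.xml", 64 * 1024)) {
            writer.print("<rows>");
            for (int id = 1; id <= 3; id++) {
                writer.print("\n\t<row><ID>" + id + "</ID></row>");
            }
            writer.print("\n</rows>");
            // PrintWriter does not throw on write errors; checkError() reports them.
            if (writer.checkError()) {
                throw new IOException("writing data.xml failed");
            }
        }
    }
}
```

Either way, memory use stays bounded by the buffer size rather than by the number of rows, as long as nothing else accumulates the output.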

Daniel Abrahamsson
This sounds good, but I'm not certain how to do it. This is my current code: fos = new FileOutputStream(new File(zipFolder + i + ".xml")); fos.write(xmlString.getBytes()); fos.flush(); fos.close();
Andrija
It still takes 1.5 GB of RAM but that much I can handle :) Thank you
Andrija
I'm glad you got it working, but in general, there's no reason why this kind of task couldn't be completed in 64 MB of memory: streaming results from the DB is the first step (http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html), and writing them directly to a file is the second.
Tomislav Nakic-Alfirevic
Thing is, I inherited the app on Saturday and it had to be working by Monday, so I don't have much manoeuvring space:) As soon as this is done, I'll get to refactoring this and get back to you. Thank you too for the comments.
Andrija
Andrija, Daniel's solution should not take anywhere near that much memory: probably less than 64 MB, depending on your other code. Your other comment indicates that you are still assembling a large String(Builder) and that the writing is also done the wrong way (using `.getBytes()` is not a good idea); using a `Writer` (such as `PrintWriter`, as Daniel suggests) will be better and easier than a binary stream (i.e. your `FileOutputStream`).
Kevin Brock
As can be seen in the edit, the code is reworked, but I suspect there's still room for improvement. Btw, the zip class used is http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipOutputStream.html
Andrija
+2  A: 

You are assembling the complete file in memory: what you should be doing is writing the data directly to the file.

Additionally, you might consider using a proper XML API rather than assembling XML as a text file. A short tutorial is available here.
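One option along these lines is the StAX XMLStreamWriter that ships with the JDK (javax.xml.stream, Java 6 and later). The sketch below is only an illustration: the element names are taken from the question, while the class name StaxRowWriter, its methods and the sample values are invented here. The writer escapes special characters itself and pushes output to the underlying Writer as it goes, so the document is never assembled in memory.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxRowWriter {

    // Writes one <row> element; writeCharacters escapes &, < and > as needed.
    static void writeRow(XMLStreamWriter xml, int id, String name) throws XMLStreamException {
        xml.writeStartElement("row");
        xml.writeStartElement("ID");
        xml.writeCharacters(Integer.toString(id));
        xml.writeEndElement();
        xml.writeStartElement("IME_PJ");
        xml.writeCharacters(name);
        xml.writeEndElement();
        xml.writeEndElement();
    }

    // Writes a small document to the given Writer. In the real export the two
    // sample rows would be replaced by a loop over the ResultSet.
    static void writeDocument(Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("PROSTORNE_JEDINICE");
        writeRow(xml, 1, "A & B"); // hypothetical sample values
        writeRow(xml, 2, "C < D");
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close(); // flushes the XML writer; it does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        writeDocument(out);
        System.out.println(out);
    }
}
```

For a large export, the StringWriter in main would be replaced by a buffered file writer, so each row goes to disk shortly after it is read.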

Tomislav Nakic-Alfirevic
+1  A: 

I have never encountered this use case, but I am pretty sure vtd-xml supports XML documents larger than 1 GB. It is worth checking out at http://vtd-xml.sourceforge.net

Or you can follow the article series "Output large XML documents" at http://www.ibm.com/developerworks/

Pangea
A: 

Ok, so the code is rewritten and I'll include the whole operation:

// this is the calling/writing function; I have 8 types of "proizvod" (product), which makes
// 8 XML files. After an XML file is created, it needs to be zipped by a custom zip class
generateXML(tmpParam, queryRBR, proizvod.getOznaka());
writeToZip(proizvod.getOznaka());



//inside writeToZip

    ZipEntry ze = new ZipEntry(oznaka + ".xml");
    FileOutputStream fos = new FileOutputStream(new File(zipFolder + oznaka + ".zip"));
    ZipOutputStream zos = new ZipOutputStream(fos);
    zos.putNextEntry(ze);
    FileInputStream fis = new FileInputStream(new File(zipFolder + oznaka + ".xml"));
    final byte[] buffer = new byte[1024];
    int n;
    while ((n = fis.read(buffer)) != -1)
        zos.write(buffer, 0, n);
    zos.closeEntry();
    zos.flush();
    zos.close();
    fis.close();

// inside generateXML
PrintWriter writer = new PrintWriter(new BufferedOutputStream(new FileOutputStream(zipFolder + oznaka + ".xml")));
// the XML declaration must be the very first thing in the file, so no leading newline
writer.print("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
writer.print("\n<PROSTORNE_JEDINICE>");
stmt = cm.getConnection().createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
        ResultSet.CONCUR_READ_ONLY);
String q = "";
rs = stmt.executeQuery(q);
if (rs != null) {
    System.out.println("Početak u : " + Util.nowTime()); // "Start at"
    while (rs.next()) {
        writer.print("\n\t<row>");
        writer.print("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
        writer.print("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
        // etc.
        writer.print("\n\t</row>");
    }
    System.out.println("Kraj u : " + Util.nowTime()); // "End at"
}
writer.print("\n</PROSTORNE_JEDINICE>");
// flush and close the file before writeToZip reads it
writer.close();

But the generateXML part still takes a lot of memory (if I'm guessing correctly, it gradually takes as much as it can), and I don't see how I could optimize it. Is there an alternative way to feed the writer.print calls?

Andrija