views:

619

answers:

7

Hello all,

I have the following code that executes a query and writes it directly to a string buffer, which then dumps it to a CSV file. I need to write a large number of records (up to a million). The code works, but for a million records it takes about half an hour to produce a file of around 200 MB, which seems like a lot of time. I'm not sure this is the best approach, so please recommend better ways, even if they involve other jars or DB connection utilities.

....
eventNamePrepared = con.prepareStatement(gettingStats + 
    filterOptionsRowNum + filterOptions);
ResultSet rs = eventNamePrepared.executeQuery(); 
int i=0;
try{
......
FileWriter fstream = new FileWriter(realPath + 
    "performanceCollectorDumpAll.csv");
BufferedWriter out = new BufferedWriter(fstream);
StringBuffer partialCSV = new StringBuffer();


while (rs.next()) { 
  i++;
  if (current_appl_id_col_display) 
      partialCSV.append(rs.getString("current_appl_id") + ",");
  if (event_name_col_display) 
      partialCSV.append(rs.getString("event_name") + ",");
  if (generic_method_name_col_display) 
      partialCSV.append(rs.getString("generic_method_name") + ",");
  ..... // 23 more columns to be copied same way to buffer
  partialCSV.append(" \r\n");
  // Writing to file after 10000 records to prevent partialCSV 
  // from going too big and consuming lots of memory
  if (i % 10000 == 0){
      out.append(partialCSV);
      partialCSV = new StringBuffer();
  }
}      
con.close();
out.append(partialCSV);
out.close();

Thanks,

Tam

+5  A: 

Profiling is generally the only sure-fire way to know why something's slow. However, in this example I would suggest two things that are low-hanging fruit:

  1. Write directly to the buffered writer instead of creating your own buffering with the StringBuilder.
  2. Refer to the columns in the result-set by integer ordinal. Some drivers can be slow when resolving column names.
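A minimal, self-contained sketch of what the write loop might look like with both changes applied. The row array here is a hypothetical stand-in for the result set, and the loop index stands in for the integer ordinal you'd pass to rs.getString(int); a StringWriter replaces the file so the example runs anywhere:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class DirectWrite {
    // Writes one CSV row straight to the writer -- no intermediate StringBuffer.
    // Indexing by position stands in for rs.getString(1), rs.getString(2), ...
    static void writeRow(BufferedWriter out, String[] row) throws IOException {
        for (int i = 0; i < row.length; i++) {
            if (i > 0) out.write(',');
            out.write(row[i]);
        }
        out.write("\r\n");
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter out = new BufferedWriter(sink);
        writeRow(out, new String[] {"42", "loginEvent", "doLogin"});
        out.flush();
        System.out.print(sink); // one comma-separated row
    }
}
```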
Steve Reed
Ack, I didn't notice (as Jared did below) that you're using a StringBuffer too (I assumed StringBuilder).
Steve Reed
Thanks Steve, good suggestions. I will have to learn about profiling, as I haven't done it before. I tried both your suggestions, but they improved performance only very slightly, maybe 1% or 2%. Thanks.
Tam
+5  A: 

Just write to the BufferedWriter directly instead of constructing the StringBuffer.

Also note that you should likely use StringBuilder instead of StringBuffer... StringBuffer has an internal lock, which is usually not necessary.
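For what it's worth, StringBuilder is a drop-in replacement here, with the same append API but no lock, so the swap is a one-word change (the values below are invented for illustration):

```java
public class BuilderSwap {
    // Same append API as StringBuffer, but StringBuilder takes no lock on
    // each call -- fine here, since the export loop runs on a single thread.
    static String buildRow(String id, String event) {
        StringBuilder partialCSV = new StringBuilder();
        partialCSV.append(id).append(",");
        partialCSV.append(event).append(",");
        return partialCSV.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildRow("42", "loginEvent")); // 42,loginEvent,
    }
}
```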

Jared Oberhaus
That is true, StringBuffer has a lock for synchronizing access to it if it is used by multiple threads. StringBuilder is faster.
Jon
@Steve Reed has some excellent advice: Profile first, then optimize.
Jared Oberhaus
MBCook's suggestion that you might be blocked on the database is a good one to investigate. The code above could be the fastest code known to man, and it will still take forever to write a CSV file if your database is slow in handing over the data.
Steve Reed
A: 

I have two quick thoughts. The first is: are you sure writing to disk is the problem? Could you actually be spending most of your time waiting on data from the DB?

The second is to try removing all the + "," concatenations and using separate append() calls instead. It may help, considering how often you are doing those.
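To see the difference concretely: the concatenation form builds a throwaway String per column before appending, while chaining appends writes straight into the builder's buffer. Both produce the same output (the column value below is made up):

```java
public class AppendStyles {
    // Original style: value + "," allocates a temporary String first.
    static String concatStyle(String value) {
        StringBuilder sb = new StringBuilder();
        sb.append(value + ","); // temporary String created here
        return sb.toString();
    }

    // Suggested style: two appends, no intermediate String.
    static String chainedStyle(String value) {
        StringBuilder sb = new StringBuilder();
        sb.append(value).append(',');
        return sb.toString();
    }

    public static void main(String[] args) {
        String v = "current_appl_id_value";
        System.out.println(concatStyle(v).equals(chainedStyle(v))); // true
    }
}
```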

MBCook
+3  A: 

You could tweak various things, but for a real improvement I would try using the native tool of whatever database you are using to generate the file. If it is SQL Server, this would be bcp which can take a query string and generate the file directly. If you need to call it from Java you can spawn it as a process.

By way of example, I have just run this...

bcp "select * from trading..bar_db" queryout bar_db.txt -c -t, -Uuser -Ppassword -Sserver

...this generated a 170 MB file containing 2 million rows in 10 seconds.
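If you do need to kick bcp off from Java, the "spawn it as a process" route might look roughly like this sketch; the query, server, credentials, and file name are all placeholders copied from the command above, not tested values:

```java
import java.util.Arrays;
import java.util.List;

public class SpawnBcp {
    // Placeholder arguments -- substitute your own query, server, and credentials.
    static List<String> bcpCommand() {
        return Arrays.asList(
            "bcp", "select * from trading..bar_db", "queryout", "bar_db.txt",
            "-c", "-t,", "-Uuser", "-Ppassword", "-Sserver");
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(bcpCommand());
        pb.inheritIO(); // let bcp's progress output reach this console
        // pb.start().waitFor() would actually run it; printed here instead:
        System.out.println(String.join(" ", bcpCommand()));
    }
}
```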

Jon
This sounds like a great idea! I'm using Oracle, so I will research what Oracle has in that space.
Tam
A: 

You mentioned that you are using Oracle. You may want to investigate using the Oracle External Table feature or Oracle Data Pump depending on exactly what you are trying to do.

See http://www.orafaq.com/node/848 (Unloading data into an external file...)

Another option could be connecting with SQL*Plus and running "spool <filename>" prior to the query.
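A rough sketch of what that spool script could look like; the column list is borrowed from the question's code and the table name is a placeholder, so adjust both:

```sql
-- Minimal SQL*Plus spool script (sketch; table and column names assumed)
SET HEADING OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET TRIMSPOOL ON
SPOOL performanceCollectorDumpAll.csv
SELECT current_appl_id || ',' || event_name || ',' || generic_method_name
  FROM your_table;
SPOOL OFF
```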

Plasmer
+1  A: 

I just wanted to add a sample code for the suggestion of Jared Oberhaus:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CSVExport {
    public static void main(String[] args) throws Exception {
        String table = "CUSTOMER";
        int batch = 100;

        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@server:orcl", "user", "pass");
        PreparedStatement pstmt = conn.prepareStatement(
            "SELECT /*+FIRST_ROWS(" + batch + ") */ * FROM " + table);
        ResultSet rs = pstmt.executeQuery();
        rs.setFetchSize(batch);
        ResultSetMetaData rsm = rs.getMetaData();
        File output = new File("result.csv");
        PrintWriter out = new PrintWriter(new BufferedWriter(
            new OutputStreamWriter(
                new FileOutputStream(output), "UTF-8")), false);
        Set<String> columns = new HashSet<String>(
            Arrays.asList("COL1", "COL3", "COL5")
        );
        while (rs.next()) {
            int k = 0;
            for (int i = 1; i <= rsm.getColumnCount(); i++) {
                if (columns.contains(rsm.getColumnName(i).toUpperCase())) {
                    if (k > 0) {
                        out.print(",");
                    }
                    String s = rs.getString(i);
                    out.print("\"");
                    // CSV escapes an embedded double quote by doubling it
                    out.print(s != null ? s.replace("\"", "\"\"") : "");
                    out.print("\"");
                    k++;
                }
            }
            out.println();
        }
        out.flush();
        out.close();
        rs.close();
        pstmt.close();
        conn.close();
    }
}
kd304
A: 

Writing to a buffered writer is normally fast "enough". If it isn't for you, then something else is slowing it down.

The easiest way to profile it is to use jvisualvm, which is available in the latest JDK.

Thorbjørn Ravn Andersen