views: 211
answers: 7

I am trying to retrieve the data from the table and convert each row into CSV format like

s12, james, 24, 1232, Salaried

The code below does the job, but it takes a long time when the table has more than 100,000 rows.

Please advise on optimization techniques:

    while (rset1.next() != false) {
        sr = sr + "\n";
        for (int j = 1; j <= rsMetaData.getColumnCount(); j++) {
            if (j < 5) {
                sr = sr + rset1.getString(j).toString() + ",";
            } else {
                sr = sr + rset1.getString(j).toString();
            }
        }
    }

/SR

+5  A: 

I'm no Java expert, but I think it's bad practice to call something like getColumnCount() in a loop condition. The condition is re-evaluated on every iteration, so the function runs each time through the loop instead of the code just referencing a fixed number. Instead, assign the count to a variable once and compare j against the variable.
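For what it's worth, a tiny self-contained sketch of the hoisting pattern. There's no database here, so String.length() stands in for rsMetaData.getColumnCount(); the JDBC version would hoist the column count into a local the same way:

```java
public class HoistDemo {
    // Hoist the repeatedly-needed count into a local once, then loop
    // against the local (here length() stands in for getColumnCount()).
    static int countCommas(String row) {
        int commas = 0;
        int length = row.length();          // called once, outside the loop
        for (int i = 0; i < length; i++) {  // compares against the local
            if (row.charAt(i) == ',') commas++;
        }
        return commas;
    }

    public static void main(String[] args) {
        System.out.println(countCommas("s12, james, 24, 1232, Salaried")); // prints 4
    }
}
```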

treeface
ok.. i will try and let you know.
@user414977: A decent compiler should be able to inline .getColumnCount() and reference the variable behind the function call directly. Additionally, the time needed to call it once per loop is far outweighed by the time wasted copying the string over and over instead of using a StringBuilder.
Lie Ryan
What he said. If I were you, I'd accept the answer from Steven Schlansker. I'm not even a Java programmer.
treeface
+1  A: 

You might want to use a StringBuilder to build the string; that's much more efficient when you're doing a lot of concatenation. Also, if you have that much data, consider writing it directly to wherever it's going (a file or a socket, for example) instead of building the whole thing in memory first.

Matti Virkkunen
Will try your option. The basic requirement is that it should not be written to a file.
+15  A: 

Two approaches, in order of preference:

Stream the output

PrintWriter csvOut = ... // Construct a writer from an OutputStream, say to a file
while (rs.next())
    csvOut.println(...); // Write a single line

(note that you should ensure that your Writer / OutputStream is buffered, although many are by default)
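To make the streaming idea concrete, here's a minimal self-contained sketch. The String[][] rows and the StringWriter sink are stand-ins for the ResultSet and the real destination; in practice you'd wrap a FileWriter or a socket's OutputStreamWriter in the BufferedWriter:

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

public class StreamCsvDemo {
    // Writes one CSV line per row straight to the writer, so no giant
    // intermediate String is ever built in memory.
    static void writeCsv(String[][] rows, PrintWriter out) {
        for (String[] row : rows) {
            out.println(String.join(",", row));
        }
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"s12", "james", "24", "1232", "Salaried"},
            {"s13", "anna", "31", "4821", "Hourly"},
        };
        StringWriter sink = new StringWriter(); // stand-in for a FileWriter
        try (PrintWriter out = new PrintWriter(new BufferedWriter(sink))) {
            writeCsv(rows, out);
        }
        System.out.print(sink);
    }
}
```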

Use a StringBuilder

StringBuilder sb = new StringBuilder();
while (rs.next())
    sb.append(...); // Write a single line

The idea here is that appending Strings in a loop is a bad idea. In Java, Strings are immutable, so appending to one means copying the entire string and writing the new data at the end. Since you are appending a little bit at a time, you end up creating many, many intermediate copies of the string which aren't really useful.

If you're writing to a File, it's most efficient just to write directly out with a stream or a Writer. Otherwise you can use the StringBuilder which is tuned to be much more efficient for appending many small strings together.
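As a concrete sketch of the StringBuilder approach, assuming a String[][] as a stand-in for the ResultSet rows (no database here), with a comma between columns and none after the last:

```java
public class CsvBuildDemo {
    // Builds the whole CSV in one StringBuilder instead of repeatedly
    // copying an immutable String.
    static String toCsv(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                sb.append(row[j]);
                if (j < row.length - 1) {
                    sb.append(','); // separator between columns only
                }
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] rows = {{"s12", "james", "24", "1232", "Salaried"}};
        System.out.print(toCsv(rows)); // prints s12,james,24,1232,Salaried
    }
}
```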

Steven Schlansker
No, file IO is SLOW, buffering the writes would perform much faster.
arthurprs
@arthurprs: unless you are using a raw OutputStream, nearly all IO operations in Java are already buffered. If you fear this, you can always use a BufferedOutputStream or a BufferedWriter. It's generally considered much faster to stream out data than build a huge intermediate value and then copy it out.
Steven Schlansker
It's possible the result String/StringBuilder is getting so large that the JVM is running out of heap space, leading to overactive garbage collection. Maybe try java -Xms512m -Xmx512m
gregcase
@gregcase: if you're building a half-gig String object you **really** should try for a streaming approach instead.
Steven Schlansker
@Steven Schlansker: In this case you are right, sorry, I don't know much java. But you may want to edit your answer.
arthurprs
@arthurprs: I already did a few minutes ago :)
Steven Schlansker
@Steven Schlansker I agree, just providing some feedback that this might be a situation where the problem isn't unoptimized code, but rather the limitations of the approach
gregcase
3rd approach, which can be combined with the others: instruct JDBC to return result rows incrementally. By default it hauls the entire resultset into Java's memory before giving anything to `next()`. How to do that depends on the JDBC driver used.
BalusC
@BalusC: that should really be a different answer. Agreed, though not row-by-row but in row batches! :)
Steven Schlansker
+1 cuz this should have been the accepted answer.
Sayed Ibrahim Hashimi
+1  A: 
StringBuilder sr = new StringBuilder();
int columnCount = rsMetaData.getColumnCount();
while (rset1.next()) {
    sr.append('\n');
    for (int j = 1; j <= columnCount; j++) {
        sr.append(rset1.getString(j));
        if (j < 5) {
            sr.append(',');
        }
    }
}
Skip Head
I think this is as good a solution as you will get, although I'm not sure the changes will actually show a noticeable performance improvement. The only change I would make is changing the while to while(rset1.next())
gregcase
@gregcase: Thanks! I didn't notice the '!= false' part. I have edited and fixed it.
Skip Head
Also, I imagine the 'j < 5' should be 'j < columnCount', but it's impossible to tell for sure.
Skip Head
+1  A: 

I don't believe minor code changes are going to make a substantive difference. I'd certainly use a StringBuilder, however.

He's going to be reading a million rows over a wire, assuming his database is on a separate machine. First, if performance is unacceptable, I'd run that code on the database server and clip the network out of the equation. If it's the sort of code that gets run once a week as a batch job that may be ok.

Now, what are you going to do with the StringBuilder or String once it is fully loaded from the database? We're looking at a String that could be 50 MB long.


This should be one iota faster since it removes the unneeded (j < 5) check.

StringBuilder sr = new StringBuilder();
int columnCount = rsMetaData.getColumnCount();
while (rset1.next()) {
    for (int j = 1; j < columnCount; j++) {
        sr.append(rset1.getString(j)).append(",");
    }
    // I suspect the 'if (j < 5)' really meant "if we aren't on the last
    // column then tack on a comma", so we always tack it on above and
    // write the last column and a newline now.
    sr.append(rset1.getString(columnCount)).append("\n");
}

Another approach is to change the select so it returns a comma-separated string. Then we read the single-column result and append it to the StringBuilder.

I forget the syntax now, but something like:

select column1 || "," || column2 || "," ... from table;

Now we don't need the loop and comma-concatenation business.

StringBuilder sr = new StringBuilder();
while (rset1.next()) {
    sr.append(rset1.getString(1)).append("\n");
}
Tony Ennis
I think you mean `select column1 || ',' || column2 || ',' ...`
Steven Schlansker
Yes, thanks for the syntax assist. I don't have a DB running here at home to try it.
Tony Ennis
+1  A: 

As a completely different, but undoubtedly the most optimal, alternative: use the DB-provided export facilities. It's unclear which DB you're using, but per your question history you seem to be doing a lot with Oracle. In that case, you can export a table into a CSV file using UTL_FILE.

BalusC
I agree using the data export module is the best way, because 1. it will handle any weird escaping 2. it will certainly execute quicker 3. you can dump the work on the DBAs :-D and move on to something else.
Tony Ennis
+1  A: 

As the other answers say, stop appending to a String. In Java, String objects are immutable, so each append must do a full copy of the string, turning this into an O(n^2) operation.

The other big slowdown is fetch size. By default, the driver is likely to fetch one row at a time. Even if this takes 1 ms, that limits you to a thousand rows per second. A remote database, even on the same network, will be much worse. Try calling setFetchSize(1000) on the Statement. Beware that setting the fetch size too big can cause out-of-memory errors with some database drivers.
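The arithmetic behind this can be sketched with a toy model (this is not real JDBC; the actual change is just stmt.setFetchSize(1000) as described above). If the driver makes one round trip per batch, round trips = ceil(totalRows / fetchSize):

```java
public class FetchSizeDemo {
    // Toy model: one network round trip per batch of fetchSize rows.
    static int roundTrips(int totalRows, int fetchSize) {
        int trips = 0;
        for (int fetched = 0; fetched < totalRows; fetched += fetchSize) {
            trips++; // one round trip fetches up to fetchSize rows
        }
        return trips;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(100_000, 1));    // prints 100000
        System.out.println(roundTrips(100_000, 1000)); // prints 100
    }
}
```

So for the 100,000-row table in the question, a fetch size of 1000 cuts the round trips by a factor of 1000.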

David Phillips
With some databases (at least the Postgres JDBC driver) it defaults to retrieving the entire result set, not one row at a time. That said, your suggestion is still very useful, since with a lot of rows that can easily OOM.
Steven Schlansker