I have a web application in Spring that has a functional requirement for generating a CSV/Excel spreadsheet from a result set coming from a large Oracle database. The expected row counts are in the 300,000 - 1,000,000 range. Processing time is less of a concern than keeping the application stable -- and right now, very large result sets cause it to run out of memory and crash.

In a normal situation like this, I would use pagination and have the UI display a limited number of results at a time. However, in this case I need to be able to produce the entire set in a single file, no matter how big it might be, for offline use.

I have isolated the issue to the ParameterizedRowMapper being used to convert the result set into objects, which is where I'm stuck.

What techniques might I be able to use to get this operation under control? Is pagination still an option?

+1  A: 

A simple answer:

Use a JDBC result set (or something similar, with an appropriate array/fetch size) and write the data back to a LOB, either a temporary one or back into the database.
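
For example, a rough sketch in plain JDBC (the connection details, table, and columns here are made up, and the fetch size is just a starting point), using a temporary CLOB so no table rights are needed:

    import java.io.Writer;
    import java.sql.Clob;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class LobExport {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password")) {

                // Temporary LOB lives on the database server, not in the JVM heap.
                Clob csv = conn.createClob();
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT id, name, amount FROM big_table")) {
                    // Fetch rows in batches instead of materializing
                    // the whole result set in memory.
                    ps.setFetchSize(500);
                    try (ResultSet rs = ps.executeQuery();
                         Writer out = csv.setCharacterStream(1)) {
                        out.write("id,name,amount\n");
                        while (rs.next()) {
                            out.write(rs.getLong(1) + "," + rs.getString(2)
                                    + "," + rs.getBigDecimal(3) + "\n");
                        }
                    }
                }
                // csv can now be stored or streamed to the client.
                csv.free();
            }
        }
    }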

Another choice:

Use PL/SQL in the database to write your recordset to a file in CSV format using UTL_FILE. Since the file will be on the database server, not the client, use UTL_SMTP or JavaMail (via Java Stored Procedures) to mail the file. After all, I'd be surprised if anyone wanted to watch the hourglass spin while a million-row recordset is generated.
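
Kicking that off from the application is then trivial. The sketch below assumes a hypothetical EXPORT_CSV(directory, filename) procedure has already been created on the server:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ServerSideExport {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
                 CallableStatement cs = conn.prepareCall("{call EXPORT_CSV(?, ?)}")) {
                cs.setString(1, "EXPORT_DIR"); // an Oracle directory object
                cs.setString(2, "report.csv");
                cs.execute(); // all the heavy lifting happens inside the database
            }
        }
    }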

Adam Musch
Just to get clarification on your suggestion -- I control the application server (to a degree), but I have no access to the database server's filesystem and no rights to create new tables. Is this still possible? JavaMail is an interesting idea, but I'd have to go through an administrative process to get that set up. It's something I'll keep in mind.
Jon
Bulk fetch N thousand rows, write them out, dispose of the set, bulk fetch the next N thousand, and so on. You should be able to do that at the JDBC level.
JulesLt
+1  A: 

Instead of loading the entire file into memory, you can process each row individually and use an output stream to send the output directly to the web browser. E.g., with the Servlet API, you can get the output stream from ServletResponse.getOutputStream() and then simply write the resulting CSV lines to that stream.
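
A minimal sketch with the raw Servlet API (the DataSource wiring and the big_table query are placeholders):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class CsvExportServlet extends HttpServlet {

        private DataSource dataSource; // assume wired up via JNDI or the container

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/csv");
            resp.setHeader("Content-Disposition", "attachment; filename=\"export.csv\"");
            PrintWriter out = resp.getWriter();
            out.println("id,name,amount");
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name, amount FROM big_table")) {
                ps.setFetchSize(500); // pull rows from Oracle in batches
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Each row goes straight to the client; nothing accumulates.
                        out.println(rs.getLong(1) + "," + rs.getString(2)
                                + "," + rs.getBigDecimal(3));
                    }
                }
            } catch (SQLException e) {
                throw new IOException(e);
            }
            out.flush();
        }
    }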

Eugene Kuleshov
Bypassing the Spring View functionality, right? This is also an option, if needed. For now, I'm making sure the results are small enough by having a stricter query, but I can't guarantee that they won't grow over time.
Jon
Exactly. And this is actually well supported in Spring MVC; you just need to declare an OutputStream or Writer as one of your handler method parameters. For example:
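
A sketch (big_table and the injected JdbcTemplate are placeholders) combining a Writer parameter with Spring's RowCallbackHandler, so rows are written as they are fetched instead of being mapped into a list first:

    import java.io.IOException;
    import java.io.Writer;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowCallbackHandler;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.RequestMapping;

    @Controller
    public class CsvExportController {

        private JdbcTemplate jdbcTemplate; // assume injected by Spring

        @RequestMapping("/export.csv")
        public void export(HttpServletResponse response, final Writer out)
                throws IOException {
            response.setContentType("text/csv");
            out.write("id,name,amount\n");
            jdbcTemplate.setFetchSize(500); // fetch from Oracle in batches
            // RowCallbackHandler sees one row at a time, so no list of mapped
            // objects (the ParameterizedRowMapper problem) builds up in memory.
            jdbcTemplate.query("SELECT id, name, amount FROM big_table",
                new RowCallbackHandler() {
                    public void processRow(ResultSet rs) throws SQLException {
                        try {
                            out.write(rs.getLong(1) + "," + rs.getString(2)
                                    + "," + rs.getBigDecimal(3) + "\n");
                        } catch (IOException e) {
                            throw new SQLException(e);
                        }
                    }
                });
        }
    }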
Eugene Kuleshov
A: 

I would push back on those requirements; they sound pretty artificial. What happens if your application fails, or the power goes out, before the user looks at that data?

From your comment above, it sounds like you already know the answer: you need filesystem or Oracle access in order to do your job.

You are being asked to generate data that is not reproducible by SQL? If it were repeatable, you would just send pages of data back to the user at a time.

Since this report, I'm guessing, reflects the current state of your data, you need to store the result somewhere if you can't stream it out to the user. I'd write a stored procedure in Oracle; it's much faster not to send data back and forth across the network. If you have special tools, or it's just easier, there's nothing wrong with doing it on the Java side instead.

Can you schedule this report to run once a week?

Brian Maltzan
I think, unfortunately, this is the answer. I'll have to push back, and possibly build in a check that refuses to process a result set that is too big.
Jon
A: 

Have you considered the performance of an Excel spreadsheet with 1,000,000 rows? Bear in mind that Excel 2003 and earlier cap out at 65,536 rows per worksheet, and even Excel 2007's limit is 1,048,576.