2869 views · 8 answers

I am looking to write to an Excel (.xls, MS Excel 2003 format) file programmatically using Java. The output files may contain ~200,000 rows, which I plan to split over a number of sheets (64k rows per sheet, due to the Excel limit).

I have tried using the Apache POI APIs, but they seem to be a memory hog due to the API object model. I am forced to add cells/sheets to the workbook object in memory, and only once all the data is added can I write the workbook to a file! Here is a sample of how Apache recommends I write Excel files using their API:

import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.*;

Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("new sheet");

// Create a row and put some cells in it
Row row = sheet.createRow(0);

// Create a cell and put a value in it.
Cell cell = row.createCell(0);
cell.setCellValue(1);

// Write the output to a file
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();

Clearly, writing ~20k rows (with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space".

I have tried increasing the JVM's initial and maximum heap size using the -Xms and -Xmx parameters, as -Xms512m and -Xmx1024m. I still can't write more than 150k rows to the file.

I am looking for a way to stream to an Excel file instead of building the entire file in memory before writing it to disk, which will hopefully save a lot of memory. Any alternative API or solution would be appreciated, but I am restricted to Java. Thanks! :)

A: 

Hi,

Does this memory issue happen when you insert data into cells, or when you perform data computation/generation?

If you are going to load data into an Excel file with a predefined, static template format, it is better to save a template and reuse it multiple times. Template cases typically come up when you generate something like a daily sales report.

Otherwise, every time you need to create each new row, border, column, etc. from scratch.

So far, Apache POI is the only choice I have found.

"Clearly, writing ~20k rows (with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space"."

What you can do is perform batch data insertion. Create a queue-task table; every time after you generate one page, rest for a few seconds, then continue with the next portion. If you are worried about the dynamic data changing during your queued task, you can first write the primary key into the Excel file (hiding and locking the column from user view). The first run inserts the primary keys; from the second queued run onwards, it reads them back out and does the task portion by portion.
Why are we talking about task queues? :-S I don't really understand what you are trying to say. When you say use templates, do you want me to use the jXLS API or something of that sort?
Jass
+2  A: 

There is also JExcelApi, but it uses more memory. I think you should create a .csv file and open it in Excel. That allows you to pass a lot of data, but you won't be able to do any "excel magic".
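The CSV route is easy to stream with constant memory, since each row is written to disk and forgotten. A minimal sketch (the file name and row contents are invented for illustration; the quoting helper follows the usual CSV convention):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class CsvExport {
    // Quote a field per the usual CSV rules: wrap in quotes if it contains
    // a comma, quote, or newline; double any embedded quotes.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter("report.csv"))) {
            out.write("id,name,amount");
            out.newLine();
            // Each row is written and discarded -- nothing accumulates on the heap.
            for (int i = 1; i <= 200_000; i++) {
                out.write(i + "," + quote("item " + i) + "," + (i * 1.5));
                out.newLine();
            }
        }
    }
}
```

Excel opens such a file directly, but as noted, none of the formatting "magic" survives.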

01
Delimited files are of course awesomely lightweight! But unfortunately this is not an option. No Excel magic and no data formatting etc. I need to write .xls files. :(
Jass
I think you should try to convince your boss to use .csv, because later you will have a lot of problems with Excel working slowly, and the whole webapp could stop working because 10 people are generating Excel reports.
01
+2  A: 

All existing Java APIs try to build the whole document in RAM at once. Try writing an XML file which conforms to the new xlsx file format instead. To get you started, I suggest building a small file in the desired form in Excel and saving it. Then open it, examine the structure, and replace the parts you want.

Wikipedia has a good article about the overall format.
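As a sketch of the streaming idea: instead of the xlsx container, the example below emits the older "XML Spreadsheet 2003" (SpreadsheetML) dialect, which Excel 2003 can open directly, using nothing but a buffered writer so rows never accumulate in memory. The element names follow that dialect; the file name, row count, and cell values are made up:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class XmlSpreadsheetExport {
    public static void main(String[] args) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter("report.xml"))) {
            out.write("<?xml version=\"1.0\"?>\n");
            out.write("<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n");
            out.write("          xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n");
            out.write("<Worksheet ss:Name=\"Sheet1\"><Table>\n");
            // Rows are streamed straight to disk; heap usage stays flat.
            for (int i = 1; i <= 60_000; i++) {
                out.write("<Row><Cell><Data ss:Type=\"Number\">" + i + "</Data></Cell>"
                        + "<Cell><Data ss:Type=\"String\">row " + i + "</Data></Cell></Row>\n");
            }
            out.write("</Table></Worksheet></Workbook>\n");
        }
    }
}
```

The "save a small file from Excel and inspect it" advice above is the safest way to learn which extra attributes (styles, column widths) your real files need.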

Aaron Digulla
Thanks! I did think of xlsx, but the clients use Office 2003 and xlsx becomes problematic. Also, they can't install that MS plugin to convert xlsx to xls. Two words - "Enterprise IT" :|
Jass
Create the file, open it in Excel and save it with the old format.
Aaron Digulla
Reports would be generated on a daily/weekly basis; opening them in Excel and saving in the old format is not really feasible.
Jass
Ask an Excel buff to write you a macro for this. Put the macro into an Excel sheet and just double click the "convert sheet" to do the conversion. When that works, you can write a batch file to do it automatically.
Aaron Digulla
@Jass: You use Office 2003 but need to write 150k rows? I thought only Office versions from 2007 on are capable of using more than 65,536 rows... http://en.wikipedia.org/wiki/Excel_2003
HerdplattenToni
@Toni I had stated that "output files may contain ~200,000 rows which I plan to split over number of sheets (64k rows per sheet, due to the excel limit)."
Jass
A: 

We did something quite similar, with the same amount of data, and we had to switch to JExcelApi because POI is so heavy on resources. Try JExcelApi; you won't regret it when you have to manipulate big Excel files!

fvu
Thanks, I'll do a proof of concept with JExcelApi and see how it fares against POI. But the structure seems similar, so I don't really see how much of a difference it will make. Can you give me some ratios or numbers, if possible?
Jass
As we ditched the POI-based code quite some time ago for exactly the same problems as yours, sorry, I can't. However, I just had a look at the Glassfish server that, among other apps, hosts the application that generates the .xls files. It has -Xmx768m, and we never had an Excel generation that caused an OOM error. I just generated 10 xls files of 45,000 lines x 8 cols and there are 574,423,040 bytes on the heap. Hope this helps.
fvu
Multiple xls files wouldn't be a problem, because once they are flushed to file, the objects can be garbage collected. So yeah, one Excel file of 45k rows x 8 cols would run under 768 MB, but 150,000 x 20 cols would require more than 1 GB, and I can't keep allocating ever more memory; that would spell bad design. :D Anyway, thanks for all that! :)
Jass
+1  A: 

Have a look at the HSSF serializer from the Cocoon project.

The HSSF serializer catches SAX events and creates a spreadsheet in the XLS format used by Microsoft Excel

pgras
A: 

Have you tried SmartXLS? It is a commercial component. To create a huge Excel data file, you must set your Java -Xmx parameter to a larger memory size.

liya
+1  A: 

Consider using CSV format. This way you aren't limited by memory anymore -- well, maybe only while prepopulating the data for the CSV, but this can be done efficiently as well: for example, query subsets of rows from the DB using LIMIT/OFFSET and immediately write them to the file, instead of hauling the entire DB table contents into Java's memory before writing any line. The Excel limitation on the number of rows in one sheet also increases to about one million.

That said, if the data actually comes from a DB, then I would seriously reconsider whether Java is the right tool for this. Most decent DBs have an export-to-CSV function which can undoubtedly do this task much more efficiently. In the case of MySQL, for example, you can use the SELECT ... INTO OUTFILE statement for this.
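The LIMIT/OFFSET pagination described above can be sketched as follows. The fetchPage method here is a stand-in for a real JDBC query (the table, column names, and sizes are invented), so the chunked-write loop runs as-is; only one page of rows is ever held in memory:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PagedCsvExport {
    static final int TOTAL_ROWS = 10_000;   // stand-in for the DB table size

    // Stand-in for "SELECT id, name FROM orders LIMIT ? OFFSET ?".
    // A real version would run that query over JDBC and map the ResultSet.
    static List<String[]> fetchPage(int offset, int limit) {
        List<String[]> page = new ArrayList<>();
        for (int i = offset; i < Math.min(offset + limit, TOTAL_ROWS); i++) {
            page.add(new String[] { String.valueOf(i), "name" + i });
        }
        return page;
    }

    public static void main(String[] args) throws IOException {
        int pageSize = 1_000;
        try (BufferedWriter out = new BufferedWriter(new FileWriter("orders.csv"))) {
            out.write("id,name");
            out.newLine();
            for (int offset = 0; ; offset += pageSize) {
                List<String[]> page = fetchPage(offset, pageSize);
                if (page.isEmpty()) break;      // no more rows in the table
                for (String[] row : page) {     // write the page, then drop it
                    out.write(row[0] + "," + row[1]);
                    out.newLine();
                }
            }
        }
    }
}
```

Memory use is bounded by the page size rather than the table size, which is the whole point of the suggestion.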

BalusC
+1  A: 

I had to split my output into several Excel files in order to overcome the heap space exception. I figured that around 5k rows with 22 columns was about the limit, so I made my logic end the file every 5k rows, start a new one, and number the files accordingly.

In cases where I had 20k+ rows to write, I would end up with 4+ different files representing the data.
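The every-N-rows file rotation described above can be sketched like this (plain text files are used here for brevity; with POI each chunk would be its own HSSFWorkbook, written out and released the same way; the file names, row counts, and contents are invented):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ChunkedExport {
    public static void main(String[] args) throws IOException {
        int rowsPerFile = 5_000;
        int totalRows = 12_345;
        int fileIndex = 0;
        BufferedWriter out = null;
        try {
            for (int row = 0; row < totalRows; row++) {
                if (row % rowsPerFile == 0) {       // time to rotate to a new file
                    if (out != null) out.close();   // finish (and free) the previous one
                    fileIndex++;
                    out = new BufferedWriter(new FileWriter("report-" + fileIndex + ".csv"));
                }
                out.write("row " + row);
                out.newLine();
            }
        } finally {
            if (out != null) out.close();
        }
    }
}
```

With 12,345 rows and 5k rows per file, this produces report-1.csv and report-2.csv with 5,000 rows each and report-3.csv with the remaining 2,345.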

ChrisAD
Yeah, I had that idea too: multiple Excel files, and then maybe zip them into one archive. But it is not good enough. Anyway, thanks for the suggestion.
Jass
If you ever come up with a solution, I'd appreciate it if you updated your question =) This is a very annoying problem.
ChrisAD