I have a design question about parsing a large Excel file, say 1000 rows x 100 columns, with about 10 tabs.

Each tab holds a set of records with the primary key in the first column, but keys can be repeated in different tabs, and the tabs do not all have the same set of primary keys.

  1. Read a single primary key and build a Java object with all of its attributes, which are spread across multiple tabs.

  2. Read tab by tab, and process each record.

Which approach is better in terms of how the HSSF Java API handles memory?

Thanks in advance.

A: 

I have an app that does almost exactly your option 1. You'll need about 500M of RAM for the VM for it to run at all well. And it's not super fast, but it works.

I'd therefore go for option 2. (Refactoring to cache each tab's parse results has improved performance.)

I'd recommend that you stop using HSSF objects as soon as you can, so they can be garbage collected.
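To illustrate, here is a minimal sketch of option 2 with cached per-tab results. It assumes each tab has already been copied out of the HSSF objects into plain strings, so the workbook itself can be garbage collected. `TabMerger`, `TabData` and the `tabName.columnName` attribute naming are made-up names for this example, not part of HSSF or of my app.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TabMerger {

    // Hypothetical plain-data copy of one tab: no HSSF references are kept.
    // Assumption: the first column of every row is the primary key.
    public static final class TabData {
        final String name;
        final String[] header;
        final List<String[]> rows;

        public TabData(String name, String[] header, List<String[]> rows) {
            this.name = name;
            this.header = header;
            this.rows = rows;
        }
    }

    // Walks the tabs one at a time and folds each row into the record
    // for its primary key. Keys missing from a tab simply get no
    // attributes from that tab.
    public static Map<String, Map<String, String>> merge(List<TabData> tabs) {
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        for (TabData tab : tabs) {
            for (String[] row : tab.rows) {
                if (row.length == 0 || row[0] == null || row[0].isEmpty()) {
                    continue; // skip rows without a primary key
                }
                Map<String, String> attrs =
                        records.computeIfAbsent(row[0], k -> new LinkedHashMap<>());
                int columns = Math.min(row.length, tab.header.length);
                for (int col = 1; col < columns; col++) {
                    // Prefix with the tab name so equal column names
                    // in different tabs do not overwrite each other.
                    attrs.put(tab.name + "." + tab.header[col], row[col]);
                }
            }
        }
        return records;
    }
}
```

Each tab is read once, and the only thing that stays in memory is the merged map of plain strings, keyed by primary key.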

Tim Williscroft