I've got a Ruby + Merb webapp that needs to quickly estimate (or count exactly) how many rows a spreadsheet has. It accepts every format that the roo library supports, including .xls, .xlsx, .ods, and text-based formats like CSV and TSV.

CSV/TSV is easy and fast:

`cat #{filepath} | dos2unix | wc -l`.to_i

However, using the roo library can be very slow for large files:

e = Excel.new(filepath)
e.last_row

My experience with Excel file formats is nil, so I'm appealing to the S.O. masses: how might I estimate the number of rows an XLS, XLSX, or ODS file contains using only Ruby and/or standard UNIX programs? My goal is to handle 5 MB files in under 1.5 seconds (give or take, depending on hardware).

+1  A: 

Is this helpful? http://www.weheartcode.com/2007/10/05/reading-an-excel-file-with-ruby/

Jacob
If I understand correctly, roo actually uses Parseexcel under the hood. In either case, the problem is that reading in large Excel files (greater than a few hundred KB) takes way too much time, at least in Ruby. I imagine the solution will be to estimate the row count without properly reading and parsing the file -- perhaps by grepping for a row separator or identifier in the binary file format.
Tyson
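
For XLSX specifically, the idea in the comment above might look something like the sketch below. An .xlsx file is a zip archive and each worksheet is stored as XML whose rows are `<row ...>` elements, so the row tags can be counted in one streaming pass without parsing the workbook. This is a rough sketch under assumptions, not a tested solution: `count_row_tags` and `estimate_xlsx_rows` are hypothetical names, the entry path `xl/worksheets/sheet1.xml` is only the usual name for the first sheet, the XML is assumed to use unprefixed `<row>` tags, and the `unzip` command-line tool must be installed.

```ruby
require 'stringio'

# Matches an opening <row> tag but not </row> or other tags such as <rowBreaks>.
ROW_TAG = /<row[\s>\/]/

# Counts opening <row> tags in worksheet XML read from any IO-like object.
# Reads fixed-size chunks and carries over the last 4 bytes, so a tag that is
# split across a chunk boundary is still counted exactly once.
def count_row_tags(io, chunk_size = 64 * 1024)
  count = 0
  tail = ''.b
  while (chunk = io.read(chunk_size))
    buffer = tail + chunk
    count += buffer.scan(ROW_TAG).length
    # A full match needs 5 bytes, so a 4-byte tail can never hold one by itself.
    tail = buffer[-4, 4] || buffer
  end
  count
end

# Hypothetical wrapper: streams one worksheet out of the archive via `unzip -p`.
# The default entry name is an assumption and will not hold for every workbook.
def estimate_xlsx_rows(path, entry = 'xl/worksheets/sheet1.xml')
  IO.popen(['unzip', '-p', path, entry]) { |io| count_row_tags(io) }
end

# Small in-memory check that does not need a real file:
sample = '<sheetData><row r="1"><c r="A1"/></row><row r="2"/></sheetData>'
puts count_row_tags(StringIO.new(sample)) # prints 2
```

Rows with no cells are usually not written to the file at all, so this counts stored rows and can come in below what `last_row` reports. The legacy binary .xls format and .ods would need different handling.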
Sorry, then. Apart from this, the only thing I can think of is checking out the XLS file format at http://www.wotsit.org/
Jacob
+1  A: 

I'm working with the spreadsheet gem; give it a shot.

khelll
roo uses spreadsheet internally.
Vijay Dev