I've got a Ruby + Merb webapp that needs to quickly estimate (or count exactly) how many rows a spreadsheet has. It accepts every format that the roo library supports, including .xls, .xlsx, .ods, and text-based formats like CSV and TSV.
CSV/TSV is easy and fast:
`cat #{filepath} | dos2unix | wc -l`.to_i
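The same count can also be done without shelling out, which avoids trouble when the path contains spaces or shell metacharacters. This is only a minimal sketch: `count_lines` and its `chunk_size` parameter are hypothetical names, not part of roo or Merb. It counts newline bytes the way `wc -l` does, so CRLF files need no conversion first.

```ruby
# Hypothetical helper (not from roo): count "\n" bytes in a file, like `wc -l`.
# Reading in binary chunks keeps memory use flat for large files.
def count_lines(filepath, chunk_size = 1 << 16)
  count = 0
  File.open(filepath, "rb") do |io|
    # IO#read with a length returns nil at end of file, which ends the loop.
    while (chunk = io.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end
```

Like `wc -l`, this does not count a final line that lacks a trailing newline.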
However, using the roo library can be very slow for large files:
e = Excel.new(filepath)
e.last_row
My experience with Excel file formats is nil, so I'm appealing to the S.O. masses: how might I estimate the number of rows an XLS, XLSX, or ODS file contains using only Ruby and/or standard UNIX programs? My goal is to handle 5 MB files in under 1.5 seconds (give or take, depending on hardware).
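To compare candidate approaches against that 1.5 second budget, I'd time them with something like the sketch below. It uses Ruby's standard `benchmark` library. `timed` is a hypothetical helper name, and the commented usage assumes a `filepath` variable like the one in the snippets above.

```ruby
require "benchmark"

# Hypothetical helper: run the block and return [result, elapsed_seconds].
def timed
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  [result, elapsed]
end

# Usage sketch (assumes `filepath` is defined as in the snippets above):
#   rows, secs = timed { `cat #{filepath} | dos2unix | wc -l`.to_i }
#   warn "too slow: #{secs.round(2)}s" if secs > 1.5
```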