I generate files, let's call them .dwrf files, which contain a significant amount of data. Currently we export those to CSV and the resulting files are large (2GB+). I would like to cut out the export process and make the contents of a .dwrf file queryable directly from Excel or other applications.

What I would like to do is write a utility/service, let's call it dwrfMiner, that extracts data from the file and exposes it as a data source, and then link dwrfMiner to .dwrf files in some way so that Excel recognises it as an external data source.

Any ideas?

+1  A: 

Excel can query external data sources, but beware that Excel (all versions) has hard limits on the number of rows it can display, per workbook. I think in Excel 2003 the limit is ~65k. It's higher in later versions.

See my question: http://stackoverflow.com/questions/2775876/reporting-tool-viewer-for-large-datasets (and I had much less than 2GB).

FrustratedWithFormsDesigner
Excel 2007 row limit is 1,048,576 rows: in prior versions, the limit was 65,536 rows... per worksheet
Mark Baker
A: 

Use SQLite.

mcandre
-1: This answer doesn't cover dwarFish's question at all. It *might* be one aspect of a solution but you don't even explain why it should, given that the binary file has its own format.
chiccodoro
A: 

I have used PHP FlatFile DB to query flat files in the past.

anijhaw
A: 

Use MS Access and MS SQL tables.

mcandre
Use it how? He said he wants to make the files directly queryable because they're large, so he probably doesn't want to import the data into a different database.
Rup
-1: Same reason as for your other answer
chiccodoro
A: 

Maybe you can implement a function in VBA that you can call from Excel, much like the built-in functions.

Jahangir Zinedine
A: 

I'd get out gcc and write yourself a full ODBC driver for it. Then you can sit back and use SQL.

You know, if you're bored. ;)

LoveMeSomeCode
A: 

Use an ODBC driver with multithreading.

Vikash
+2  A: 

While writing an ODBC driver for this is probably overkill, if the format of the files you are working with is known in advance and isn't too hard to translate (it sounds like it isn't, considering you are already creating CSVs), then using an ODBC DSN sounds like your best bet.

There is a nice selection of ODBC drivers already built in to Windows (.txt, .csv, .mdb, .xl*, .dbf, Paradox .db, etc.) and you can obtain other drivers from the web for a lot of common formats.

If the size of the format you currently export to (CSV) is too onerous, then the logical starting point is to transform your data into something more space-conscious that has ODBC support.

Failing that, your last option is the overkill option (writing an ODBC driver).
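
As a rough illustration of the transformation route, here is a minimal Python sketch that streams records into a SQLite database in batches, so the whole file never has to sit in memory. SQLite is only one possible target (third-party ODBC drivers exist for it). The .dwrf layout is not described in the question, so `read_dwrf_records`, the `samples` table and its three columns are hypothetical placeholders, not the real format.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (timestamp, channel, value).
# The real .dwrf structure is unknown; adjust to match it.
Record = Tuple[str, int, float]


def read_dwrf_records(path: str) -> Iterator[Record]:
    """Placeholder for a real .dwrf parser (assumption, not the actual format).

    A real implementation would open `path` and decode records lazily.
    The rows below are illustrative only.
    """
    yield ("2010-05-06T12:00:00", 1, 20.5)
    yield ("2010-05-06T12:00:01", 1, 20.7)
    yield ("2010-05-06T12:00:01", 2, 13.2)


def load_into_sqlite(records: Iterable[Record], db_path: str,
                     batch_size: int = 10000) -> int:
    """Insert records into a SQLite table in batches; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples "
            "(ts TEXT, channel INTEGER, value REAL)"
        )
        total = 0
        batch: List[Record] = []
        for record in records:
            batch.append(record)
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", batch)
                total += len(batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", batch)
            total += len(batch)
        conn.commit()
        return total
    finally:
        conn.close()


if __name__ == "__main__":
    count = load_into_sqlite(read_dwrf_records("example.dwrf"), "example.db")
    print(f"loaded {count} rows")
```

Assuming a SQLite ODBC driver is installed, Excel could then point a DSN at the resulting database file and pull only the rows it needs with a SQL query, which also helps keep the result under the worksheet row limit.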

Kilanash
(+1); To take this even further, for future software design you (dwarFish) might want to take such considerations into account before defining an output format.
chiccodoro