I have a data processing system that generates very large reports on the data it processes. By "large" I mean that a "small" execution of this system produces about 30 MB of reporting data when dumped into a CSV file and a large dataset is about 130-150 MB (I'm sure someone out there has a bigger idea of "large" but that's not the point... ;)

Excel has the ideal interface for the report consumers in the form of its Data Lists: users can filter and segment the data on the fly to see the specific details they are interested in (they're not really interested in the many thousands of rows; they know how to apply multiple filters to get the data they want). They can also add notes and markup to the reports, create charts, graphs, etc. They know how to do all this, and it's much easier to let them do it if we just give them the data.

Excel was great for the small test datasets, but it cannot handle these large ones. Does anyone know of a tool that provides an interface similar to Excel Data Lists - the ability to dynamically create and change filters on multiple fields - but that can handle much larger files?

The next tool I tried was MS Access, but I found that the Access file bloats hugely (a 30 MB input file leads to about a 70 MB Access file, and after I open the file, run a report, and close it, the file is at 120-150 MB!), and the import process is slow and very manual (currently, the CSV files are created by the same PL/SQL script that runs the main process, so there's next to no intervention on my part). I also tried an Access database with linked tables pointing to the database tables that store the report data, and that was many times slower (for some reason, SQL*Plus could query the data and generate the report file in a minute or so, while Access would take anywhere from 2-5 minutes for the same data).

(If it helps, the data processing system is written in PL/SQL and runs on Oracle 10g.)

A: 

What version of Excel? It can handle pretty large volumes of data now. In Excel 2007, the worksheet size is 16,384 columns by 1,048,576 rows. Are you really sending more than a million records in a report? Why? Who would look at data like that?

HLGEM
It's Excel 2003. The raw data for the reports certainly could contain more than 1,048,576 rows. Already, we break the data down into several CSV reports based on the type of data being reported on. This works to a degree, but there is a detailed report which still contains many, many rows. We haven't yet hit a dataset as big as the REAL one and we're already hitting limits. You're right, no one will actually sit and read 1,000,000 rows, but the users like Excel Data Lists because they can apply several filters at a time and narrow down from many, many rows to a couple hundred that...
FrustratedWithFormsDesigner
...they are concerned with. The flexibility of the Data Lists is what I'm looking for because I just can't anticipate exactly how the testers and business users will want their data segmented. Even they don't always know, which is why everyone likes Data Lists.
FrustratedWithFormsDesigner
+1  A: 

Access would be a good tool to use in this case, as it has no practical row limit, unlike Excel. The hard part is weaning people off Excel when they are used to the power of custom filters. It is very possible in Access to get something that approximates this, but it's never going to be exactly the same unless you embed an Excel control into your forms.

As for the manual part, you can script the database to import files using VBA. For example, let's say the main task of yours dumps the files into a folder overnight, with a new file each night. You could make a "watchdog" Access database that has a form open with an "OnTimer" event that checks that folder every few minutes; when it finds a new file, it starts the import. When your users get to work in the morning, the data is already loaded.
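A rough sketch of what that timer event might look like, assuming the form's TimerInterval is set (say, 300000 ms for five minutes); the folder path, table name, and import specification name are all placeholders you'd swap for your own:

    Private Sub Form_Timer()
        Const WatchFolder As String = "C:\Reports\Incoming\"   ' assumed drop folder
        Dim Files As New Collection
        Dim FileName As Variant

        ' Gather the file names first so moving files later
        ' doesn't disturb Dir's internal enumeration state.
        FileName = Dir(WatchFolder & "*.csv")
        Do While FileName <> ""
            Files.Add FileName
            FileName = Dir
        Loop

        For Each FileName In Files
            ' Import into a staging table; "ReportImportSpec" and
            ' "ReportData" are placeholder names for your own objects.
            DoCmd.TransferText acImportDelim, "ReportImportSpec", _
                "ReportData", WatchFolder & FileName, True

            ' Move the processed file aside so it isn't imported twice
            ' (assumes a done\ subfolder already exists).
            Name WatchFolder & FileName As WatchFolder & "done\" & FileName
        Next FileName
    End Sub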

As for the bloating: yes, it can be a problem, but all you need to do is a quick compact and repair on the file and it will shrink back down.

EDIT:

You can set an Access DB to be compacted on close through the options. I can't remember exactly where it is, and at work we only have Access 97 (but, oddly enough, Office 2003). The other option is to compact through code. Here is a link that explains how:

http://forums.devarticles.com/microsoft-access-development-49/compact-database-via-vba-24958.html
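For reference, a sketch of both approaches, assuming Access 2000 or later ("Auto Compact" is the option behind the compact-on-close checkbox; the file paths are placeholders):

    ' Option 1: turn on "compact on close" from code.
    Public Sub EnableCompactOnClose()
        Application.SetOption "Auto Compact", True
    End Sub

    ' Option 2: compact a *closed* database file from another database
    ' via DAO. CompactDatabase can't target the file that's currently
    ' open, hence the copy-and-swap.
    Public Sub CompactReportDb()
        Const Src As String = "C:\Reports\ReportData.mdb"       ' assumed path
        Const Tmp As String = "C:\Reports\ReportData_tmp.mdb"

        DBEngine.CompactDatabase Src, Tmp
        Kill Src            ' delete the bloated original
        Name Tmp As Src     ' move the compacted copy into its place
    End Sub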

Kevin Ross
The compact/repair trick works, but I don't want to have to do this every time I am done with the file. Can it be automated to run when the file is closed? I've used Access before, but for some reason this Access database seems to bloat much more easily than others.
FrustratedWithFormsDesigner
See my edit for some ideas on compacting the DB
Kevin Ross
@Kevin: Thanks, I didn't know about that!
FrustratedWithFormsDesigner
After meeting with some other members of the team, Access has come out as the winner.
FrustratedWithFormsDesigner
A: 

I would suggest you use an Excel front-end to a shared RDB backend.

Build a custom filtering system for Excel (I would use a VBA & ADO XLA parked on a server, but there are several technologies available) that the end users drive, and which generates the SQL to bring back to Excel just the subset of data they want to play with (chart, calculate, print, etc.).
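A minimal sketch of the ADO round-trip, assuming the Oracle OLE DB provider is installed; the connection string, credentials, table name, and sheet are all placeholders, and the WHERE clause would come from whatever filter UI you build:

    Public Sub PullFilteredData(ByVal WhereClause As String)
        Dim cn As Object, rs As Object

        ' Late binding avoids needing an ADO reference in the project.
        Set cn = CreateObject("ADODB.Connection")
        cn.Open "Provider=OraOLEDB.Oracle;Data Source=ORCL;" & _
                "User Id=report_user;Password=secret"   ' assumed credentials

        ' WhereClause comes from the front-end form; in real code you'd
        ' validate it (or build it from parameters) before executing.
        Set rs = cn.Execute("SELECT * FROM report_data WHERE " & WhereClause)

        ' Dump the already-filtered rows straight onto the worksheet.
        Sheet1.Cells.ClearContents
        Sheet1.Range("A1").CopyFromRecordset rs

        rs.Close
        cn.Close
    End Sub

Called with, say, PullFilteredData "region = 'WEST'", that brings back a few hundred rows instead of a million, and the users still get to chart and annotate in Excel exactly as they do now.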

Charles Williams
+1  A: 

Interesting one; there's not a lot in the mid-range for this kind of problem. Access should be it but, as you've found out, it's pretty terrible in several ways, and is probably a bit too advanced for many end users.

On the other hand, if you've got a database server there, it seems a shame not to use its power. There are several tools of varying cost and complexity that will allow you to set up fairly user-friendly, server-based reports, where you give users the ability to set parameters for their own reports; the filtering then happens on the server, and the results can be exported to Excel. Examples are Oracle Discoverer and Microsoft Reporting Services (which can be set up to report directly on Oracle databases, even though it's SQL Server-based).

We use Microsoft Reporting Services; I build reports in Developer Studio that allow the users to go to a web page, filter by any of several pre-defined criteria, run the report (with the hard work being done on the server) and export the results to Excel for further processing.

A halfway house where you set up reports that can filter the initial data down to fewer than a million rows and then export to Excel might be the way to go...

Matt Gibson
It sounds like you've dealt with this before ;) We don't have Discoverer and I don't think I'll be able to get approval for something like that - and if I somehow do, it would not be in time for when I need it. Is there anything in the open source world that you'd recommend?
FrustratedWithFormsDesigner
Sorry, no; I've only used commercial offerings for this kind of reporting. I've never had to investigate the open source world (currently I work for a non-profit/charitable organisation who get such good discounts on the Microsoft licensing that it doesn't make much sense to look at anything other than Reporting Services!)
Matt Gibson
A: 

I had the same problem about 5 years ago and tested about 6 different reporting tools. The most compact to work with was FastReport. It was for Delphi at the time; now I also use FastReport for my .NET applications, and it really is still the smallest and fastest.

Merl