views:

1452

answers:

8

I have an MS-Access application that formats the data in two large spreadsheets (20,000+ rows), imports this data into tables, runs a few queries comparing the data and outputs the results as excel files.

The problem is that as the application (and VBA code) grows it's becoming more of a pain using Access and I'm wondering if there is a better approach?

What would be the advantages/disadvantages of a .NET(C#)solution say, compared to MS-Access, and what would be the best libraries etc. to use?

Cheers,

Breandán

Cheers for the responses so far, I forgot to mention though that this application needs to be stand alone, I need to be able to package up the app and and send it to the end user to install on their computer. This only has (of note) MS-Office and .Net Framework installed, so I'm not sure how feasible MySQL etc. would be with no where to host it.

+1  A: 

The Apache POI library may be of use to you. It's java based and can work on excel files.

Another choice may be to use a direct ODBC driver to work on the XLS file. Might be a lot of records though..

http://poi.apache.org/

The POI project consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format, and Office OpenXML format, using pure Java. In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java. POI is your Java Excel solution (for Excel 97-2007). However, we have a complete API for porting other OLE 2 Compound Document formats and welcome others to participate.

OLE 2 Compound Document Format based files include most Microsoft Office files such as XLS and DOC as well as MFC serialization API based file formats.

Office OpenXML Format based files include the new (2007+) xml based file formats, including Microsoft office files such as XLSX, DOCX and PPTX.

Edit: Check out a tool like Monarch Pro, it is superb at data mining, etc.

Jas Panesar
A: 

I wonder if you've outgrown Access. This is what more robust databases, like SQL Server, are born for. 20,000 rows isn't a large number for SQL Server, but I'll bet it's getting there for Access. How much disk space is Access taking up when you run it? How fast is the data growing, if at all?

How well do you know C# versus VB? If you don't know C# well, it could be a good opportunity to learn. It's a problem that you already know well.

duffymo
Can't people read? The original question says it's the CODE that's become unwieldy. 20K rows is a completely paltry amount of data for Jet to handle.
David-W-Fenton
Beside, 20000 row is nothing to Access.You can query 100k rows with ease.If always baffles me how little people understand the real limitations of Access vs the imagined ones.
Renaud Bompuis
If Access were really an enterprise database, there would have been no reason for Microsoft to "collaborate" with Sybase so they could come up with SQL Server. I'm not convinced that Access is the best solution for multiple simultaneous users, by any stretch.
duffymo
This is like saying that if SQLite was a database there would be no need for MySQL.Different things for different purposes.We're talking about a small dataset here, any small database systems is OK with a few 100k rows.Beside, nobody said Access was an Enterprise DB, it's just a workgroup one.
Renaud Bompuis
Unless SQLLite and MySQL are developed by the same people, I don't believe the analogy holds. Access and SQL Server are both from the same vendor. Microsoft doesn't put them in the same category. That's my point. If you need enterprise, it's SQL Server.
duffymo
A: 

I would consider a system that used SQL Server Compact to start with. You can use it with no install if you wish to (though I suggest you do, it's much easier to just assume the requisite dll is in the GAC). If subsequently you find you need more functionality (say stored procedures or TEXT/BLOB columns, that sort of thing) then moving to SQL Server will be trivial.

If your database is regenerated every time then trying different things out is a simple as pointing to a different file.

If you have considerable code in VBA I would suggest first moving it to VB.Net (making this, first and most error prone transition easier) then migrating that to c# if that is your desire.

As to generating/altering the excel sheet you have three options.

One is to use VSTO though Wikipedia is a much better description. This will be closest to your old VBA development though it has a whole different set of complexities.

Another is to use one of the excel output libraries, there are a few plus many more commercial ones. I couldn't recommend any personally though other's here have.

The last is just dump out csv and let excel deal with it when you open it.

ShuggyCoUk
Yet another person with no reading comprehension. The problem is CODE, not data management.
David-W-Fenton
FFS - most of my post was devoted to code, or did you not bother to read past SQL server compact which is to make the code easier (and given his edit would make his deployment easier too!!)
ShuggyCoUk
Ah, I see. You're pissed off in since your job is access solutions and you dislike people knocking it. Read MY post. do you see me knocking access? He is interested in moving to c#, is there a linq to Jet provider (barring this guys work http://www.codeproject.com/KB/linq/linqToSql_7.aspx)?
ShuggyCoUk
+2  A: 

I'd say for the data volumes of 20,000 rows you're working with, a SQL server database isn't really going to gain you much except for moving to stored procedures for data manipulation. In this respect, it's arguably better than VBA, so you will probably get a code base that's more maintainable. However, the data volumes you describe are tiny by database standards. I wouldn't expect performance to be an issue until you have a one or two orders of magnitude more data than that.

If you want to do a data munging job, you might be better off with a scripting language like Perl or Python. These languages are much better for data manipulation tasks than C# or VB.Net. Good, free windows distributions of both Perl and Python can be found at www.activestate.com.

Excel can be scripted with Python through the python-com interface using the same API as VBA, but gaining a much better language with a huge variety of libraies available. Similarly, this can also be done with Perl through Win32::OLE. There are also some utility libraries such as pyexcelerator, xlrd, and xlwt, for Python and Spreadsheet::WriteExcel/Spreadsheet::ParseExcel for Perl. There are also modules available for building installable windows applications such as Py2EXE or Perl Dev Kit

This Stackoverflow posting discusses using Excel from Python, including generating a wrapper with MakePy, in a bit more depth.

It you're working with .Net, you could also try IronPython - it's a native .Net implementation of Python, which will run anywhere with a suitable .Net runtime installed. You can get also a free Visual Studio plugin called IronPythonStudio).

Another alternative is R. R is primarily a statistical package, but the core language has strong data manipulation capabilities and a variety of interface libraries (and others such as graphics, various statistical computations and an Excel interface.. It's actually quite a powerful general purpose data manipulation and reporting tool.

ConcernedOfTunbridgeWells
+4  A: 

Moving to .Net would allow you to have better tools at your disposal to manipulate the data.

You have to be careful though about what exactly you are doing at the moment with your Access solution: if you're doing a lot of special case handling for processing the data from and to Excel, then chances are you'll still have to do those in whatever language or framework you chose.

If you have a lot of code invested into pulling the Excel Data into Access then you could still keep Access for that part and use .Net to help you in doing the comparisons and creation of the resulting Excel report.

It's a bit hard to really make a recommendation without knowing more about your project.

If you just want to use automation to pull-in data and create your Excel file, then .Net may not offer you a lot as you'll still have to do the exact same things you've already done in Access.

Instead, you may consider using commercial Excel component that use different paradigm to open/create Excel spreadsheet in a nicer way.
There are a few component vendors that have these.

One solution is also to use reporting tools to directly pull the data from Excel and produce a report that you can simply save back to Excel.

My advice would be:

  • If your Access solution is stable and it's doing its job, then you may consider keeping it.
    Moving to a new system will cost you time and money and you have to check if the outcome is worth the investment.
  • If you feel too constrained by the capabilities of Access, then spend some time experimenting with various solutions and components that allow you to manipulate Excel, for instance using a LINQ Excel provider (1 or 2) may provide a nice abstraction, or try various commercial components until you find one that matches your needs.

If you're going the .Net route, you may end-up not even needing a database for processing the data.
If you do though, you can always use Jet -or its new version, ACE- as a back-end that will create MSAccess databases.
It's already installed on most machines and well supported by MS tools.
Other good options are SQL Server Compact and SQLite as none of these require complex setup, just a DLL to ship with your project.

Renaud Bompuis
A: 

It sounds to me that Access isn't helping you as an intermediary. 20K rows doesn't sound like much to me. You said you go from Excel to Access and then back to Excel. Do you need to keep the Access data around?

If not, I'd suggest going with .Net and a third-party Excel library like FlexCel and go straight from Excel to Excel in memory. Packaging .Net applications is easy with either an installer or ClickOnce.

matt eisenberg
+2  A: 

At SpreadsheetGear, our specialty is high performance Excel compatible spreadsheet components. We hear all the time (see a few customer quotes on this page) that SpreadsheetGear for .NET is faster and easier to use than other options.

There are live ASP.NET samples with C# and VB.NET source and a free trial if you want to try it for yourself.

Disclaimer: I own SpreadsheetGear LLC

Joe Erickson
Joe - you rock!!
David Robbins
+1  A: 

SpreadsheetGear totally rocks! You get the best of both worlds, where all the logic in spreadsheet is available for you to execute from your code. You can really shorten your development, as you can have the business units help shape the logic for their business processes.

David Robbins