views:

585

answers:

6

I have a VB.NET windows application that pulls information from an MS Access database. The primary role of the application is to extract information from Excel files in various formats, standarize the file layout and write that out to csv files. The application uses MS Access as the source for the keys and cross reference files.

The windows app uses typed datasets for much of the user interaction between the database. The standardization is done on the on each clients machine. The application is not... how can I say this...FAST :-).

Question: What is the best way to migrate the DB and application to SQL Server 2005. I am thinking it might be a good idea to write the code for the standarization in and SSIS packages.

What is the appropriate way to go about this migration?

+2  A: 

Microsoft provide a free tool to migrate an Access Database to SQL Server. Once you've upgraded you should be able to change your connection string to point at SQL Server.

Mitch Wheat
You mean "to migrate a JET database to SQL Server." If your MDB has Access objects in it (forms/reports/etc.) then those will *not* be migrated to SQL Server, for obvious reasons.
David-W-Fenton
+1  A: 

You might want to run your app through a profiler to ensure that the Access DB is really what's slowing down your app, and not something else. It would be a shame to go through all the work to convert it over to SQL server, and have nothing to show for it.

Kibbee
+1  A: 

The Access upsizing wizard can be used as a starting point.

You may be able to change the backend to be SQL Server with linked tables in Access without changing your front end. Then, you can modify the front end to go directly to SQL Server at will.

Unless you are hitting Access very heavily, I doubt that it is your bottleneck.

As far as reading the Excel files, SSIS can do it, but it might not be as reliable as the mechanism you are using in VB.NET right now, if your VB.NET code has a lot of smart logic to deal with a degree of variation in the input files

As far as writing data out to CSV, SSIS is fine, and I've found SSIS to be a pretty good performer.

If you could give more details about the workflow and how much the user interacts with the database versus the program pulling configuration, it might be easier to help with your architecture.

SSIS is very configurable on the fly (package configuring itself somewhat while it is running), and in many cases it could be programmed to read a variety of Excel files and convert them to CSV, but it's not as configurable on the fly as a hand-coded system. It is also possible to use the SSIS object model to generate packages programmatically and then execute them - this does not have some of the limitations of a package configuring itself, but the object model is pretty complex.

Cade Roux
A: 

Making sure the scope is clear:

  1. Use a .NET program to
  2. drive an Access database front-end which enables you to
  3. Extract data from a number of Excel spreadsheets,
  4. Massaging the data appropriately, and
  5. Save the result in a CSV file.

What sorts of volumes are we talking about? How many clients, how many spreadsheets per client, how many rows per spreadsheet (I think it would be 32767 max for a single spreadsheet, right? And how much time are we talking about?

Seems like a lot of moving parts. And Access usually is a pretty good tool (with VBA) to do this sort of thing by itself.

It doesn't seem like enough volume to provide a major time sink for a well-designed Access database front-ending Excel to accomplish the whole process using VBA. If your alternative involves installing and operating SQL Server (in place of Access) on each client, I would be surprised if the admin and operational overhead doesn't increase.

le dorfier
Access is also an excellent tool to use to move data between disparate data sources, such as SQL Server and Excel.
David-W-Fenton
A: 

The application pulls data from 250 excel files each week and approximatley 800 files each month with an average of about 5000 rows per file. There are 13 different file formats that are standarized and out put into 3 different standard formats. The application takes between 25 min. and 40 min to run depending on which data run we are taling about. 95% of the appliction is the standarization process. All the user does is pick a few parameters then start the run.

Kalel
A: 

So Weekly, per client: 250 files at 25 minutes = 10 files / minute or 6 seconds per file.

Monthly, per client: 800 files at 40 minutes = 20 files/minute or 3 seconds per file.

My expectation would be less than 1 sec. per file (5000 rows) round trip including:
a. Import or attach xls to mdb,
b. Transform via Access SQL
c. Export to csv

The only explanation that comes to mind is that perhaps the .NET app is reading, transforming, and saving a row at a time. Is that possibly the case?

If you convert to SSIS, then that probably obsoletes the .NET app, because SSIS will want to handle the ETL (and save) itself. So you will basically be rewriting the software. But you may have better resources for SSIS than for Access. But it seems to me like overkill. BUt then .NET rather than VBA also is maybe overkill; and rewriting in VBA is work, too. The least effort would I think be to see if you can do the entire ETL (and save) using Access SQL for most of it, and using VBA just for scripting, to iterate through input files in a directory or some such.

I think you could at least prototype the basic use cases and find out if you can find out pretty quickly where the time is being spent now (as suggested by earlier responses.) But that would be worth finding out before committing redevelopment resources aimed at the wrong part of the problem. If you can expand a bit in those areas, I could probably direct you further. But Access is pretty well suited for this sort of thing, at (IMHO) a lower TCO than SQL Server + SSIS + .NET.

Not to mention that I'd be surprised if the csv files are the true end point, which may play a role in the decision. Isn't the Excel data really ending up further down the path?

Finally, how objectionable is a 25-40 minute process that presumably is unattended, can run over lunch break, and maybe basically works ok?


Notes:

Per week    

Excel Files 250
Minutes 25
Minutes/File    0.1
Sec/File    6


Per month   

Excel files 800
Minutes 40
Minutes/File    0.05
Sec/File    3
le dorfier