tags:

views: 142

answers: 2

As part of an ecommerce system I need to design and implement a black box of sorts. It needs to accept customers and orders from various internal and external data sources (e.g., web site, extranet, Yahoo Store, Amazon XML feeds) and insert/update a backend system. The APIs to insert/update customer and order data are in place for the web site and working well. Now we need to add the ability to process orders from the other data sources.

I've been leaning towards the provider model (one provider for each data source), using it to standardize data into SQL Server tables before calling the APIs to actually add customers and place orders. Are there other models or patterns I should consider? Have you dealt with this issue before, and how did you solve it? Are there any resources (articles, books, projects, etc.) I should look at?
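To make the idea concrete, here is a minimal sketch of the provider model I have in mind; the class and field names are only illustrative, not our real API:

```python
from abc import ABC, abstractmethod


class OrderSourceProvider(ABC):
    """One provider per data source; each one turns its raw input into a
    common customer/order shape before staging and the existing API calls."""

    @abstractmethod
    def fetch_raw(self) -> list:
        """Pull raw records from the source (XML feed, extranet, file, ...)."""

    @abstractmethod
    def normalize(self, raw_record: dict) -> dict:
        """Map one source-specific record onto the common order shape."""


class AmazonFeedProvider(OrderSourceProvider):
    def fetch_raw(self) -> list:
        # parse the Amazon order XML feed here
        return []

    def normalize(self, raw_record: dict) -> dict:
        return {
            "source": "amazon",
            "external_order_id": raw_record.get("AmazonOrderID"),
            "customer_email": raw_record.get("BuyerEmail"),
            "lines": raw_record.get("Items", []),
        }
```

Each source would get its own provider, and the common normalized shape would then feed the SQL Server tables and the existing APIs.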

+1  A: 

You may find that an ETL (Extract-Transform-Load) tool will make your life easier than trying to solve this in code.

Such tools are designed specifically for the type of data-loading work you describe.

EDIT

While I still maintain that an ETL tool will best serve your needs, if you insist on doing it in code, you should think about implementing ETL as a pattern. The reason is that ETL is a well-established best practice for loading data from various sources, so it is worth taking some time to study how it is typically implemented.

At a basic level, you should have three layers: an extraction layer, a transformation layer, and a loading layer.

The extraction layer should be responsible for retrieving the data from the source. It should not worry about the shape of the data at this point. To keep the layer clean, you should only implement code that "gets" the data here; worry about shaping it in the transformation layer.

The transformation layer should be responsible for taking the data extracted from the various sources and transforming it into the destination's shape. ETL tools do this very efficiently by treating the data as pipelines that can be split and parallelized. You probably won't have the time or resources to do that, so an alternative is to load the data into staging tables (a less normalized representation of the data).

The load layer takes the transformed data (in the above case, from the staging tables) and loads it into the final destination.
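If you do write this yourself, a rough sketch of those three layers in code might look like the following; the function names, source registry, and in-memory "staging" list are only placeholders for your real feeds and staging tables:

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Each source registers an extract function (just get the data) and a
# transform function (reshape one record to the destination's layout).
Extractor = Callable[[], Iterable[dict]]
Transformer = Callable[[dict], dict]


def run_etl(sources: Dict[str, Tuple[Extractor, Transformer]],
            staging: List[dict]) -> List[dict]:
    """Run extract -> transform -> load for every registered source,
    landing normalized rows in a staging list (stand-in for staging tables)."""
    for name, (extract, transform) in sources.items():
        for raw in extract():          # extraction layer: only "gets" data
            row = transform(raw)       # transformation layer: reshape it
            row["source"] = name
            staging.append(row)        # load layer: write to staging
    return staging


# Illustrative source: an Amazon-style feed already parsed into dicts.
def extract_amazon() -> Iterable[dict]:
    return [{"AmazonOrderID": "123-4567", "BuyerEmail": "buyer@example.com"}]


def transform_amazon(raw: dict) -> dict:
    return {"external_order_id": raw["AmazonOrderID"],
            "customer_email": raw["BuyerEmail"]}


staged = run_etl({"amazon": (extract_amazon, transform_amazon)}, [])
print(staged)  # one normalized row, tagged with its source
```

The point is that each new source only contributes an extract step and a transform step, while the load step stays the same for everyone.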

This separates your layers cleanly enough to protect you from future change. Keep in mind, however, that you're really just doing what an ETL tool would do for you out of the box.

Michael Meadows
I've looked at SSIS as I'm in the MS SQL Server environment but found problems with that approach (see http://ayende.com/Blog/archive/2007/07/15/SSIS-15-Faults.aspx and http://ayende.com/Blog/archive/2008/01/15/SSIS-vs.-CSV.aspx). I'm really looking for a code solution, although I will consider alternatives.
Geri Langlois
SSIS and ODI are technology specific. I agree that SSIS suffers from Microsoft's pre-version-3 problem; although it's imperfect, it is useful. Informatica is not technology specific, and it is very good, although expensive. Updated answer to include a starting point for a code solution.
Michael Meadows
A: 

As mentioned, ETL is probably the way to go, unless you can standardize the way orders come in up front, e.g. via EDI or web services.

Use the ETL tools to transform from the sources (email, FTP files, etc.) into a common pipeline in your SQL Server, and then have a separate process that works through the pending batches in the pipeline.
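That separate pipeline-processing step could be as simple as the sketch below; sqlite3 stands in for SQL Server and place_order for the existing customer/order API, so treat the table layout and names as assumptions:

```python
import sqlite3

# Minimal stand-in for the SQL Server pipeline table: normalized orders
# land here as 'pending', and a separate process works through them in batches.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_pipeline (
        id INTEGER PRIMARY KEY,
        source TEXT,
        payload TEXT,
        status TEXT DEFAULT 'pending'
    )""")
conn.execute("INSERT INTO order_pipeline (source, payload) VALUES ('amazon', '<order xml>')")


def place_order(source: str, payload: str) -> bool:
    """Stand-in for the existing customer/order API call."""
    return True


def process_pending(batch_size: int = 100) -> int:
    """Pick up a batch of pending rows, push each through the API,
    and record the outcome so failed rows can be retried or inspected."""
    rows = conn.execute(
        "SELECT id, source, payload FROM order_pipeline "
        "WHERE status = 'pending' LIMIT ?", (batch_size,)).fetchall()
    for row_id, source, payload in rows:
        ok = place_order(source, payload)
        conn.execute("UPDATE order_pipeline SET status = ? WHERE id = ?",
                     ("done" if ok else "error", row_id))
    conn.commit()
    return len(rows)


print(process_pending())  # processes the one pending row -> prints 1
```

Keeping a status column on the pipeline rows also gives you a natural place to park and retry orders that fail validation or the API call.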

Cade Roux