Hi,

My company is considering using a web service as the means of running an ETL process. However, I don't think a web service fits this purpose, for several reasons:

  1. A web service could consume a lot of memory when generating large XML.
  2. XML is a bloated format.
  3. Requests could time out if the server takes a long time to generate the data.
  4. A file size limit? (On Windows it's 2 GB, if my memory serves me right.)

I am not a web service expert, so I need your opinions. :)

Thanks.

+1  A: 

I would not use a web service for an ETL task. There are specialized tools for that task (e.g., Ab Initio, Informatica, etc.) that are better suited.

If you have a large amount of data, I'd say that the price of the extra latency that the network would introduce would be prohibitive.

duffymo
+1 use the right tool for the right job
Pascal Thivent
A: 

It really does depend on what you are doing and how you are trying to accomplish it. In general, web services require more care and feeding than you would normally put into an ETL process, but they can be surprisingly effective at the task as well. You have not given enough specifics about your scenario for me to say whether it would work.

I have worked on web services that transmit and receive 100+ MB documents, some encoded in XML and some not, and do it in seconds (on a closed local network). These services required a good deal of tuning and planning, but they worked well for our scenario, and they allowed a wide variety of clients to connect and transmit differing amounts of data through a fairly standard interface. This differed from some of our other ETL jobs, where the job was specific to each client and had to be set up and maintained for each one.

It all depends on what you are doing and what your constraints are.

If you are going to pursue this route, sit down and draft out the process from beginning to end, including how you want clients to connect, how they verify that the data was received, and how they verify that the job is finished. Consider the scenarios, the clients, and the types of data being transmitted, and then work out what would be needed. Contrast that with what is already available in other tools, and with how much time you have to get it done.
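To make the "verify that the data was received" and "verify that the job is finished" steps concrete, here is a minimal Python sketch of one way it could work. Everything in it (`Batch`, `make_batches`, `Receiver`, the SHA-256 digest per batch) is a made-up illustration, not something from an existing tool, and the transport itself is left out on purpose.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Batch:
    """Hypothetical unit of transfer; the names are illustrative only."""
    seq: int        # position of this batch within the job
    total: int      # how many batches the job consists of
    payload: bytes
    digest: str     # SHA-256 hex digest computed by the sender


def make_batches(data: bytes, size: int) -> list[Batch]:
    """Split data into fixed-size batches, each carrying its own digest."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Batch(n, len(chunks), c, hashlib.sha256(c).hexdigest())
            for n, c in enumerate(chunks)]


@dataclass
class Receiver:
    """Server side of a single job (assumption: one Receiver per job)."""
    total: int | None = None
    received: dict[int, bytes] = field(default_factory=dict)

    def accept(self, batch: Batch) -> bool:
        """Return True (ack) if the payload matches its digest, else False."""
        if hashlib.sha256(batch.payload).hexdigest() != batch.digest:
            return False  # the client should resend this batch
        self.total = batch.total
        self.received[batch.seq] = batch.payload  # a resend just overwrites
        return True

    def finished(self) -> bool:
        """The job is done once every sequence number has been acked."""
        return self.total is not None and len(self.received) == self.total

    def assemble(self) -> bytes:
        """Join the payloads in sequence order; call only when finished."""
        return b"".join(self.received[n] for n in range(self.total))


if __name__ == "__main__":
    receiver = Receiver()
    for batch in make_batches(b"x" * 10_000, size=4096):
        assert receiver.accept(batch)
    print("job finished:", receiver.finished())
```

Small batches also keep each request short, which helps with the time-out worry in the question. A real service would additionally need authentication, persistence of partially received jobs, and a retry policy.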

GrayWizardx
+1  A: 

There are plenty of technologies in the web services tool shed that circumvent all the problems you list. There is stream-oriented XML shredding, there are XML compression formats for delivery, there are protocols that deal with fragmentation and fairness, and there are many storage systems that can hold terabytes upon terabytes of data.
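As a rough illustration of the streaming and compression points, here is a small Python sketch using only the standard library. It writes rows one element at a time through a SAX-style generator into a gzip stream, then reads them back with an incremental parser, so neither side builds the whole document in memory. The `rows`/`row` element names and the `write_rows`/`read_rows` helpers are assumptions made up for the example, and plain gzip stands in here for a dedicated XML compression format.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator


def write_rows(text_stream, rows):
    """Emit <rows><row .../></rows>, one row at a time, to a text stream."""
    gen = XMLGenerator(text_stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("rows", {})
    for row in rows:
        # Each record becomes one element with its fields as attributes.
        gen.startElement("row", {k: str(v) for k, v in row.items()})
        gen.endElement("row")
    gen.endElement("rows")
    gen.endDocument()


def read_rows(binary_stream):
    """Yield each row's attributes as soon as its element is complete."""
    for _event, elem in ET.iterparse(binary_stream, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # drop the row's data once it has been handed over


if __name__ == "__main__":
    source = ({"id": i, "name": f"customer-{i}"} for i in range(100_000))

    buf = io.BytesIO()  # stands in for a file or an HTTP body
    with gzip.open(buf, "wt", encoding="utf-8") as out:
        write_rows(out, source)
    print("compressed bytes:", buf.tell())

    buf.seek(0)
    with gzip.open(buf, "rb") as src:
        count = sum(1 for _ in read_rows(src))
    print("rows read back:", count)
```

Note that `iterparse` still keeps an emptied stub for each row under the root element, so for very long streams you would also want to detach processed rows from the root.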

If by web service you imagine some college freshman's homework concoction of an interface that accepts a single glop argument with a 2 GB serialized table in it, then all your arguments are valid. But if you give your requirements to an experienced team with knowledge of the concepts involved in WS-ReliableMessaging and WS-Transaction, then there is no reason not to build an ETL process around web services. Note that I do not advocate the SOAP protocols per se, but I do advocate knowledge and understanding of the concepts involved.

That being said, whether a web-service-oriented ETL process makes sense for you depends on a whole set of other factors. However, your rebuttal of web service technologies does not hold water.

Remus Rusanu
A: 

Look up MTOM, to start with, which allows arbitrary non-XML data to be streamed in a web service.

bmargulies
A: 

Web services are just fine for ETL tasks. Remember that each task is going to get handled in its own thread for free, and you're guaranteed proper cleanup between requests. Using web services inside something like Tomcat wouldn't be nearly as heavy as you think.

If you're concerned about the bloat of XML, consider JSON instead.
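If you want to measure the difference rather than guess, a quick Python sketch like the following compares the same records serialized both ways, raw and gzipped. The record layout and the helper names (`to_xml`, `to_json`, `sizes`) are invented for the example, and the XML uses one child element per field; an attribute-based layout would come out smaller.

```python
import gzip
import json
import xml.etree.ElementTree as ET


def to_xml(records):
    """Serialize records as <rows><row><field>value</field>...</row></rows>."""
    root = ET.Element("rows")
    for rec in records:
        row = ET.SubElement(root, "row")
        for key, value in rec.items():
            ET.SubElement(row, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def to_json(records):
    """Serialize records as a compact JSON array (no extra whitespace)."""
    return json.dumps(records, separators=(",", ":")).encode("utf-8")


def sizes(records):
    """Return byte counts for each format, raw and gzip-compressed."""
    out = {}
    for name, blob in (("xml", to_xml(records)), ("json", to_json(records))):
        out[name] = len(blob)
        out[name + "+gzip"] = len(gzip.compress(blob))
    return out


if __name__ == "__main__":
    sample = [{"id": i, "name": f"customer-{i}", "balance": i * 1.5}
              for i in range(10_000)]
    for label, size in sizes(sample).items():
        print(f"{label:10s} {size:>10,d} bytes")
```

Run it on a sample of your own data before deciding; once compression is switched on, the gap between the two formats tends to matter much less than the raw sizes suggest.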

dj_segfault
A: 

I'm really wondering why your company is not considering a real ETL tool, like those mentioned by duffymo in his answer, or Talend or CloverETL if open source is an option.

  1. They are, in general, good for ETL purposes :)
  2. Building your own solution sounds like reinventing the wheel.
  3. Many of them have web services oriented features (see Export a job as webservice in Talend's wiki or CloverETL Server HTTP Launch Services for example).

I'm not an ETL product expert and I didn't check them all but I'm pretty sure this is something to consider.

Pascal Thivent