views:

161

answers:

1

Good afternoon everybody,

I am in a bit over my head and facing some tight deadlines, so hopefully someone can offer some advice.

My starting point will be a table in a SQL Server database, two of whose fields are x,y coordinates obtained from a gps unit. I will be responsible for geocoding (getting physical street address) these locations, and writing the result to a success table or a failure to an errors table. I am wrapping up the class library (C#) that actually does the geocoding process, but am very confused on the best way to tie everything together.

Ideally, I would like to send a dataset/datatable from SQL Server to some type of service, where the all records would be geocoded, and the result/failure for each record would be written back into the appropriate table. Being that the database and dll will reside on the same server, I don't see why I would use a web service. So, the rest of my investigation points to either a windows service or SQL CLR Integration. My main questions are:

  1. Is one of these methods more appropriate then the others?
  2. Is this approach acceptable(in terms of best practice)?

Any tips,comments,suggestions would be greatly appreciated.

On a side note, if it is not obvious already, I am a relatively novice programmer. This site has been an invaluable tool in my growth as a programmer, and I really appreciate the individuals who take the time to read posts and offer their expert advice.

+2  A: 

Your geo-coding code will likely connect to a geo-coding service, and this rules out SQL CLR integration. Whicle technically possible, accesing external resources, specially web services, from inside SQL CLR is a the biggest mistake you can make. Over a period of few days, your server will be frozen in worker starvation because of CLR thread hijack, guaranteed.

The best approach is to use an ETL strategy. The coordinates are dumped into a staging table, you start your ETL process and transform all coordinates into geocodes, and write the results into the success and error tables. An ETL approach usually accounts for batching, intreruption and resuming and so on. The ETL can be a SQL job launching an application, a SSIS package, there are many ways to do it, the important bit is the separation of work into staging tables and the provisioning of suspend/resume semantics. The dll code you have right now (which I assume is a geocoding web service client proxy code) is probably 0.001% of your project.

Whether the process needs to be exposed as a web-service, is completely orthogonal to the ETL process and it should be driven by your client connecting requierements and technology. In other words if the client code preffer to upload the coordinates using WS, do it so, as long as you account for large datasets and as long as you provide decoupled, asynchronous and queued, semantics for results.

Remus Rusanu