What are good problems to solve using CLR stored procs?

views:

275

answers:

+6 Q:

What are good problems to solve using CLR stored procs?

I have used CLR stores procedures in SQL server for awhile now, but I'm still wondering what the best situations to use them are.

MSDN provides some guidelines for use like heavy string manipulation (regex), or replacing T-SQL that declares lots of table variables and cursors. I'm curious to know what problems SO users are solving with CLR stored procs, and examples / benchmarks as well.

For example, I have found CLR stored procs + SSRS to be a great way to get data manipulation logic out of SSRS and out of T-SQL, and into managed code that is easier to read and manipulate.

+4 A:

String manipulation - regular expression search is a classic. Very easy to expose in CLR, very difficult to do in straight T-SQL.

See this link for details of implementation and a micro-benchmark (SQLCLR is only 47 milliseconds compared to 6.187 seconds for the T-SQL UDF).

Oded 2010-01-26 17:03:28

+4 A:

String manipulation (regexes) was already mentioned, but also DateTime arithmetic, and of course another biggie - calling external web services.

marc_s 2010-01-26 17:06:07

Hey Marc, can you give an example of a problem you would solve by access to external webservices in a CLR proc? +1 for usefulness for sure.

Tj Kellie 2010-01-26 17:59:22

Lots of things - e.g. getting the up-to-date currency exchange rates, getting up-to-date stock information and lots more

marc_s 2010-01-26 18:09:02

+1 A:

Custom aggregates
String manipulation
Custom data types

To be honest, I only see string handling which includes splitting CSVs into rows.

I'd consider anything that needs more then the default trust level to be out of bounds, unless I was a DBA doing DBA type stuff.

From MSDN with RegEx and RSS feed examples: Using CLR Integration in SQL Server 2005

gbn 2010-01-26 18:32:22

+10 A:

Many problems requiring denormalization and/or sequential operations can be handled exceptionally well by the CLR and can be used to dramatically improve performance without sacrificing usability on the SQL end (much). Instead of relying entirely on either set-based or iterative operations, you can take a hybrid approach, use a set-based solution for the big hauls and switch to an iterative model for the tight loops.

The built-in hierarchyid and geospatial (i.e. geography) types in SQL Server 2008 are good examples of the denormalization problem. Both contain an (almost) arbitrarily large amount of data that are difficult to normalize without hurting performance - you would need to use recursion or cursors to do any meaningful work with them otherwise, or use a rat's nest of triggers and/or scheduled tasks to maintain a denormalization table.

Another problem I've solved with CLR types is inline compression. This might sound like a pointless or academic exercise, but when your fully-normalized data is pushing into the terabytes, an 80-90% reduction in size means a lot. SQL has its own built-in compression now and SQL 2005 had vardecimal, and those are good tools as well, but a domain-aware "minimization" algorithm can be several times more efficient in terms of both CPU load and compression rate. Obviously this doesn't apply to every problem, but it applies to some.

Yet another very common problem often found on this site is generating a sequence on the fly - for example a sequence of consecutive dates. Common solutions are recursive CTEs, static sequence tables, and the little-known spt_values tables, but a simple CLR UDF performs better than any of them and offers a lot more flexibility.

Last on my list: User-defined streaming aggregates are also very useful, especially for anything statistics-related. There are some things you simply cannot compose out of the built-in SQL aggregates, such as medians, weighted moving averages, etc. UDAs can also take multiple arguments so you can parameterize them; technically an aggregate isn't guaranteed to receive data in any particular order in the current version of SQL Server, but you can get around that limitation by feeding it a ROW_NUMBER as an additional argument and use this to implement just about any windowing function (have the aggregate spit out a UDT which can then be transformed to a table).

It's actually very frustrating how few examples there are of truly useful SQL-CLR applications; search on Google and you'll get 10 million results, every single one of them for some silly string-concatenation or regex. These are useful, but take a few minutes to learn about SQL UDTs and UDAs in particular and you'll start seeing plenty of uses for them in your own applications. Don't go nuts, of course - think carefully about whether or not there's a better solution in pure SQL - but don't discount them either.

Aaronaught 2010-01-26 19:50:19

This is one of the most informative posts I have ever read. Thank you.

Chris Shouts 2010-01-27 19:56:58

+1 very nicely put

Remus Rusanu 2010-01-28 08:52:44

+2 A:

Here is an example of something I used CLR procs for that I thought was neat:

Timed data updates from external webservices using CLR stored procs and SQL jobs.

We have an application that syncs some of the data it tracks with outside industry data feeds. The sync runs weekly for everything and on-demand for single updates too so I had an existing webservice API to access it. Things are already scheduled by a windows service but I thought why not be able schedule them like our other SQL jobs??

I created a CLR stored procedure that refrences the application's webservice API. Then I added a few parms for @RecordID to support single sync, and scheduled it up in Enterprise manager SQL jobs.

Now I can use the Job to run dB syncs or use the proc within other SQL procs or Triggers to update data from the external feed.

It may be cleaner to take the application webservice API out in the future and just use the external webservice directly. For now though, this was very fast to implement and a cool way to extend functionality to the SQL group.

Tj Kellie 2010-01-27 19:44:02

ansaurus

tags:

views:

answers:

What are good problems to solve using CLR stored procs?

related questions