This is hopefully just a simple question about performance optimization of queries in SQL Server 2008.

I've worked for companies that use stored procs heavily for their ETL processes as well as some of their websites. A common scenario is needing to retrieve specific records based on a finite set of key values, and I've seen it handled in three different ways, illustrated via pseudo-code below.

Dynamic SQL that concatenates a string and executes it:

EXEC('SELECT * FROM TableX WHERE xId IN (' + @Parameter + ')')

Using a user-defined function to split a delimited string into a table (a sketch of one such function appears after these snippets):

SELECT * FROM TableY INNER JOIN dbo.Split(@Parameter) AS s ON TableY.yId = s.splitId

Using XML as the parameter instead of a delimited varchar value:

SELECT * FROM TableZ JOIN @Parameter.nodes('xpath') AS x(y) ON ...
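
For concreteness, here is a minimal sketch of what the split UDF in the second snippet might look like, assuming integer keys and a comma delimiter (the name dbo.Split and the splitId column are illustrative, not from any real codebase):

CREATE FUNCTION dbo.Split (@list VARCHAR(MAX))
RETURNS @result TABLE (splitId INT)
AS
BEGIN
    -- Peel off one comma-delimited value at a time
    DECLARE @pos INT = CHARINDEX(',', @list);
    WHILE @pos > 0
    BEGIN
        INSERT INTO @result (splitId)
        VALUES (CAST(LEFT(@list, @pos - 1) AS INT));
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));
        SET @pos = CHARINDEX(',', @list);
    END;
    -- Whatever follows the last comma is the final value
    IF LEN(@list) > 0
        INSERT INTO @result (splitId) VALUES (CAST(@list AS INT));
    RETURN;
END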
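
And a fleshed-out version of the third snippet, assuming the parameter arrives shaped like <ids><id>1</id><id>2</id></ids> (the element names and the zId column are likewise illustrative):

DECLARE @Parameter XML = '<ids><id>1</id><id>2</id><id>3</id></ids>';

SELECT z.*
FROM TableZ AS z
INNER JOIN @Parameter.nodes('/ids/id') AS x(y)
    ON z.zId = x.y.value('.', 'INT');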

While I know the dynamic SQL in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last two examples. Is it more efficient to do the due diligence in my code to pass such lists via XML, as in snippet 3, or is it better to just delimit the values and use a UDF to take care of it?

+5  A: 

There is now a 4th option - table-valued parameters (TVPs), whereby you can actually pass a table of values into a sproc as a parameter and then use it as you would a normal table variable. I'd prefer this approach over the XML (or CSV) parsing approaches.

I can't quote performance figures comparing all the different approaches, but that's the one I'd try - I'd recommend doing some real performance tests on them.

Edit:
A little more on TVPs. In order to pass the values into your sproc, you just define a SqlParameter with SqlDbType.Structured - its value can be set to a DataTable, a DbDataReader, or an IEnumerable<SqlDataRecord>. So presumably, since you already have the list of values in a list/array of some sort, you don't need to do anything to transform it into XML or CSV.
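
On the database side it looks something like this (a minimal sketch - dbo.IntList and dbo.GetRecordsByIds are illustrative names, with TableX/xId borrowed from the question's first snippet):

-- The table type must be created once, up front
CREATE TYPE dbo.IntList AS TABLE (Id INT PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetRecordsByIds
    @Ids dbo.IntList READONLY  -- TVP arguments must be declared READONLY
AS
BEGIN
    SELECT t.*
    FROM TableX AS t
    INNER JOIN @Ids AS i ON t.xId = i.Id;
END;
GO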

I think this also makes the sproc clearer, simpler, and more maintainable, and it provides a more natural way to achieve the end result. One of the main points is that SQL Server performs best at set-based work, not looping or string manipulation.

That's not to say it will perform great with a large set of values passed in. But with smaller sets (up to ~1000) it should be fine.

AdaTheDev
+1. Nice catch.
Quassnoi
Wouldn't this, often enough, just push the need to parse-and-load the data into tabular format upstream to the application? At which point, which is faster there - XML or looping parsing code?
Philip Kelley
+1 Excellent explanation on how to use Table Parameters from the code side beyond DataReaders and DataTables.
JamesEggers
+1, and TVPs have the best performance of all of these approaches as long as the TVP has a primary key. For large `INSERT` operations I still use `BULK INSERT`, but for `JOIN` or `IN` queries I've thrown in anywhere up to 5000 items and not seen any really noticeable perf hit.
Aaronaught
+2  A: 

Invoking a UDF is a little more costly than shredding the XML with the built-in nodes() method.

However, this only needs to be done once per query, so the performance difference will be negligible.

Quassnoi
True for small sets. What about big ones, for various values of "big"?
Philip Kelley
@Philip: It's even truer for big sets. `TSQL` lacks pointers, so to parse long strings it calls on our good old friend Schlemiel the Painter: each value peeled off rescans or copies the remainder of the string, making naive splitting quadratic in the string's length. `XML` parsing is done internally and optimized quite well.
Quassnoi