ansaurus

Question

Answer 1

A:

Could you elaborate on the question, Jeff? And maybe show an example of your problem versus what you expect / want.

Using IN is in general a bad solution or a bad design IMHO, I know that in SQL the IN statement is very buggy and slow.

---- Edit -----

The above post gave me a clearer view of what you want to do. I'd still like to see more code and/or know more about the specific problem. Do you want to dynamically change the number of parameters? Or do you just want to create a strongly typed select?

Filip Ekberg 2008-12-03 16:21:23

I've not found any particular performance problems with IN, or any bugs for that matter. Care to elaborate?

Andrew Rollings 2008-12-03 16:22:43

NOT IN (and other negation operators) can lead to performance problems, but doesn't necessarily cause them. Perhaps that is what he is talking about.

StingyJack 2008-12-03 16:42:51

Filip Ekberg 2008-12-03 17:50:05

Using MySQL as any sort of 'base case' for performance, optimization, or featureset is not a good idea :D

Matt Rogish 2008-12-03 18:38:40

I know, that's why everything now is re-written to MS SQL ;)

Filip Ekberg 2008-12-04 13:35:18

@Filip, Generally speaking you will be looking for a small set of rows in a larger table. If you use a negation value (NOT), there is more of a 'list' for the server to keep for matching rows. I can see where this can happen in an IN list if the condition is too broadly reaching.

StingyJack 2008-12-08 12:52:27

Answer 2

+31 A:

You can pass the parameter as a string

So you have the string

DECLARE @tags VARCHAR(8000)

SET @tags = 'ruby|rails|scruffy|rubyonrails'

select * from Tags
where Name in (SELECT item from fnSplit(@tags, '|'))
order by Count desc

Then all you have to do is pass the string as 1 parameter.

Here is the split function I use.

CREATE FUNCTION [dbo].[fnSplit](
    @sInputList VARCHAR(8000) -- List of delimited items
  , @sDelimiter VARCHAR(8000) = ',' -- delimiter that separates items
) RETURNS @List TABLE (item VARCHAR(8000))

BEGIN
DECLARE @sItem VARCHAR(8000)
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
 BEGIN
 SELECT
  @sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
  @sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))

 IF LEN(@sItem) > 0
  INSERT INTO @List SELECT @sItem
 END

IF LEN(@sInputList) > 0
 INSERT INTO @List SELECT @sInputList -- Put the last item in
RETURN
END
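
To make the loop in fnSplit easier to follow, here is a minimal Python sketch of the same logic: cut at each delimiter, trim spaces, skip empty items, and keep whatever is left at the end. The name fn_split is hypothetical and only mirrors the T-SQL above; it is an illustration, not a replacement for the database function.

```python
def fn_split(input_list, delimiter=","):
    """Mirror of the T-SQL fnSplit loop (hypothetical helper, illustration only)."""
    items = []
    pos = input_list.find(delimiter)  # like CHARINDEX, but -1 means "not found"
    while pos != -1:
        # Text before the delimiter is the next item; trim spaces like LTRIM/RTRIM.
        item = input_list[:pos].strip(" ")
        # Drop the item and the delimiter from the front of the remaining input.
        input_list = input_list[pos + len(delimiter):].strip(" ")
        if item:
            items.append(item)
        pos = input_list.find(delimiter)
    if input_list:
        items.append(input_list)  # put the last item in
    return items


tags = fn_split("ruby|rails|scruffy|rubyonrails", "|")
messy = fn_split(" a ,, b ,")
```

Empty items (two delimiters in a row, or a trailing delimiter) are dropped, which matches the LEN(@sItem) > 0 checks in the function.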

David Basarab 2008-12-03 16:27:11

You can also join to the table-function with this approach.

Michael Haren 2008-12-04 03:06:26

I use a solution similar to this in Oracle. It doesn't have to be re-parsed as some of the other solutions do.

Leigh Riffel 2008-12-18 18:12:00

This is a pure database approach; the others require work in code outside of the database.

David Basarab 2008-12-18 18:31:59

Does this do a table scan, or can it take advantage of indexes, etc.?

Pure.Krome 2009-01-31 01:36:26

better would be to use CROSS APPLY against the SQL table function (at least in 2005 onwards), which essentially joins against the table that is returned

adolf garlic 2009-04-01 09:55:42

This worked like a charm for me. The pure database approach is what I was looking for.

EnocNRoll 2009-09-09 20:51:48

@adolf garlic there's no need for cross apply because there's no outer reference. Just join to the fnsplit function. `select T.* from Tags T INNER JOIN fnSplit(@tags, '|') X ON T.Name = X.item`

Emtucifor 2010-09-08 22:51:29

But it [fnSplit] returns a table...I wasn't aware you could join directly to a table function without using APPLY

adolf garlic 2010-09-09 12:05:33

Answer 3

+3 A:

I would pass a table type parameter (since it's 2008), and do a where exists, or inner join. You may also use XML, using sp_xml_preparedocument, and then even index that temp table.

eulerfx 2008-12-03 16:30:13

do you have examples of this?

Jeff Atwood 2008-12-03 16:33:20

Answer 4

+1 A:

For a variable number of arguments like this the only way I'm aware of is to either generate the SQL explicitly or do something that involves populating a temporary table with the items you want and joining against the temp table.

ConcernedOfTunbridgeWells 2008-12-03 16:31:13

Answer 5

+4 A:

This is gross, but if you are guaranteed to have at least one, you could do:

SELECT ...
       ...
 WHERE tag IN( @tag1, ISNULL( @tag2, @tag1 ), ISNULL( @tag3, @tag1 ), etc. )

Having IN( 'tag1', 'tag2', 'tag1', 'tag1', 'tag1' ) will be easily optimized away by SQL Server. Plus, you get direct index seeks
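
A minimal sketch of the padding idea, assuming SQLite (through Python's sqlite3) as a stand-in for SQL Server and ? placeholders in place of @tag1..@tagN. The names MAX_TAGS and pad_tags are hypothetical. The padding is done on the client here rather than with ISNULL, but the effect is the same: the SQL text never changes and unused slots repeat the first tag.

```python
import sqlite3

MAX_TAGS = 5  # assumed upper bound on the number of tags
SQL = "SELECT Name FROM Tags WHERE Name IN (?, ?, ?, ?, ?) ORDER BY Name"


def pad_tags(tags, size=MAX_TAGS):
    """Repeat the first tag so the list always has exactly `size` entries."""
    if not 1 <= len(tags) <= size:
        raise ValueError(f"expected between 1 and {size} tags")
    return list(tags) + [tags[0]] * (size - len(tags))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (Name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Tags VALUES (?)",
                 [("ruby",), ("rails",), ("python",)])

padded = pad_tags(["ruby", "rails"])
rows = [r[0] for r in conn.execute(SQL, padded)]
```

Duplicates inside an IN list do not change the result, so the padding is harmless apart from being ugly.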

Matt Rogish 2008-12-03 16:31:50

please fix @id1 with @tag1

Christian 2010-09-22 19:54:19

Answer 6

+141 A:

You can parameterize each value, so something like:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
string cmdText = "SELECT * FROM Tags WHERE Name IN ({0})";

string[] paramNames = tags.Select(
    (s, i) => "@tag" + i.ToString()
).ToArray();

string inClause = string.Join(",", paramNames);
using (SqlCommand cmd = new SqlCommand(string.Format(cmdText, inClause))) {
    for(int i = 0; i < paramNames.Length; i++) {
       cmd.Parameters.AddWithValue(paramNames[i], tags[i]);
    }
}

Which will give you:

cmd.CommandText = "SELECT * FROM Tags WHERE Name IN (@tag0,@tag1,@tag2,@tag3)"
cmd.Parameters["@tag0"] = "ruby"
cmd.Parameters["@tag1"] = "rails"
cmd.Parameters["@tag2"] = "scruffy"
cmd.Parameters["@tag3"] = "rubyonrails"
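
For readers outside .NET, the same pattern can be sketched in Python. This is an illustration under stated assumptions: SQLite (via the standard sqlite3 module) stands in for SQL Server, positional ? placeholders stand in for @tag0..@tagN, and find_tags is a hypothetical helper name. Only the placeholder list is spliced into the SQL text; the tag values always travel as parameters.

```python
import sqlite3


def find_tags(conn, tags):
    """Return the tag names that match, using one placeholder per value."""
    if not tags:
        return []  # "IN ()" is not valid SQL, so short-circuit on an empty list
    placeholders = ",".join("?" for _ in tags)
    sql = f"SELECT Name FROM Tags WHERE Name IN ({placeholders}) ORDER BY Name"
    return [row[0] for row in conn.execute(sql, list(tags))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (Name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Tags VALUES (?)",
                 [("ruby",), ("rails",), ("python",)])

# The third value looks like an injection attempt; it is bound as plain data.
found = find_tags(conn, ["ruby", "rails", "x') OR 1=1 --"])
```

The hostile-looking value simply fails to match any tag, because it never becomes part of the command text.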

Edit: Pre-emptive Sql Injection defense

No, this is not open to Sql Injection. The only injected text into CommandText is not based on user input. It's solely based on the hardcoded "@tag" prefix, and the index of an array. The index will always be an integer, is not user generated, and is safe.

The user inputted values are still stuffed into parameters, so there is no vulnerability there.

Edit:

Injection concerns aside, take care to note that constructing the command text to accommodate a variable number of parameters (as above) impedes SQL Server's ability to take advantage of cached queries. The net result is that you almost certainly lose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself).

Not that cached query plans aren't valuable, but IMO this query isn't nearly complicated enough to see much benefit from it. While the compilation costs may approach (or even exceed) the execution costs, you're still talking milliseconds.

If you have enough RAM, I'd expect MSSQL would probably cache a plan for the common counts of parameters as well. I suppose you could always add 5 parameters, and let the unspecified tags be NULL - the query plan should be the same, but it seems pretty ugly to me and I'm not sure that it'd be worth the micro-optimization (although, on SO - it may very well be worth it).

Also, MSSQL 7+ will auto-parameterize queries, so using parameters isn't really necessary from a performance standpoint - it is, however, critical from a security standpoint - esp. with user inputted data like this.

Mark Brackett 2008-12-03 16:35:54

I've used this before and it works well.

StingyJack 2008-12-03 16:46:25

Basically the same as my answer to the "related" question and obviously the best solution since it is constructive and efficient rather than interpretive (much harder).

tvanfosson 2008-12-03 16:53:02

Doesn't this limit the number of items in your IN clause?

Jon Limjap 2008-12-18 08:25:30

@Jon, no, it doesn't. You can add as many items as you like to the tags array.

nickd 2008-12-18 10:29:04

This is how LINQ to SQL does it, BTW

Mark Cidade 2008-12-18 18:55:35

Isn't there a max number of Parameters? so if the user doesn't know how many tags, it might go over the max_number (around 200 or 255 params?). Secondly, why is using params better than just a dynamic sql with the values constructed on the fly (replace @Tag1 with the value, in the above example)?

Pure.Krome 2009-01-02 02:15:45

@Pure: The whole point of this is to avoid SQL Injection, which you would be vulnerable to if you used dynamic SQL.

Ray 2009-02-04 23:27:47

Injection concerns aside, take care to note that constructing the command text to accommodate a variable number of parameters (as above) impedes SQL Server's ability to take advantage of cached queries. The net result is that you almost certainly lose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself).

Mark 2009-08-19 19:01:55

Not a fan of this since it limits you to a fixed number of values. David Basarab's is more flexible and only limits you based on the length of the parameter data type.

Registered User 2010-02-11 01:58:43

@God of Data - Yes, I suppose if you need more than 2100 tags you'll need a different solution. But Basarab's could only reach 2100 if the average tag length was < 3 chars (since you need a delimiter as well). http://msdn.microsoft.com/en-us/library/ms143432.aspx

Mark Brackett 2010-02-11 12:17:14

Answer 7

+1 A:

I think the solution of Mark Brackett is the way to go, but keep in mind that there's a limit on the number of command parameters. For SQL Server it's 2100.

Petar Petrov 2008-12-03 16:40:52

This doesn't strike me as an answer to the question.

postfuturist 2008-12-19 23:39:57

Answer 8

+38 A:

select * from Tags
where '|ruby|rails|scruffy|rubyonrails|'
like '%|' + Name + '|%'

EDIT: One caveat to Joel's solution. This is clever, but it forces an Index Scan instead of an Index Seek (because LIKE %x% cannot be indexed, whereas LIKE x% can), so it will be at least 10x slower. This may or may not matter depending on the size of your table.

EDIT: C# code to parameterize:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
const string cmdText = "select * from tags where '|' + @tags + '|' like '%|' + Name + '|%'";

using (SqlCommand cmd = new SqlCommand(cmdText)) {
   cmd.Parameters.AddWithValue("@tags", string.Join("|", tags));
}
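
A runnable sketch of the same single-parameter trick, assuming SQLite through Python's sqlite3 in place of SQL Server. SQLite concatenates with || rather than +, and find_tags_like is a hypothetical helper name. The whole tag list is bound as one parameter and each row's Name is searched for inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (Name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Tags VALUES (?)",
                 [("ruby",), ("rails",), ("rubyonrails",), ("python",)])


def find_tags_like(conn, tags):
    """Match rows whose Name appears, pipe-delimited, inside the joined list."""
    joined = "|".join(tags)
    sql = ("SELECT Name FROM Tags "
           "WHERE '|' || ? || '|' LIKE '%|' || Name || '|%' "
           "ORDER BY Name")
    return [r[0] for r in conn.execute(sql, (joined,))]


matches = find_tags_like(conn, ["ruby", "rails"])
```

The surrounding pipes are what keep rubyonrails from matching a search for ruby. The pattern is built from the column, so every row has to be examined.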

Joel Spolsky 2008-12-03 16:41:17

That will be hella slow

Matt Rogish 2008-12-03 16:43:31

(although I must admit it is clever)

Matt Rogish 2008-12-03 16:44:05

yeah, it is 10x slower, but it's very easily parameterized, heh. Not sure how much faster it would be to call fnSplit() as proposed by Longhorn213's answer

Jeff Atwood 2008-12-03 16:48:04

Yes, this is a table scan. Great for 10 rows, lousy for 100,000.

Will Hartung 2008-12-03 16:48:30

@Matt, I agree. The method from Mark Brackett will likely scale better.

StingyJack 2008-12-03 16:50:38

I guess I should have closed this as "Not a Real Question" since you've accepted "Not a Real Answer"

tvanfosson 2008-12-03 16:50:39

I agree...this is a good solution for a small table. Doesn't require any temp tables or a bunch of parameters.

Mike Shepard 2008-12-03 16:51:36

Longhorn213's fnSplit function would be called once, taking a little time, but is then able to take advantage of an index on Tags.Name. Joel's solution probably requires a full scan of Tags, which may be slow for a big table. Having said that, I do use Joel's method myself for small tables.

Tony Andrews 2008-12-03 16:51:47

Make sure you test on tags that have pipes in them.

Joel Coehoorn 2008-12-03 17:16:29

This doesn't even answer the question. Granted, it's easy to see where to add the parameters, but how can you accept this as a solution if it doesn't even bother to parameterize the query? It only looks simpler than @Mark Brackett's because it isn't parameterized.

tvanfosson 2008-12-03 20:14:44

tvanfosson: Good point. You're not using parameters, but actually still just strings...

Matt Rogish 2008-12-03 20:37:22

"Granted, it's easy to see where to add the parameters" it's like the np-complete thing.. we've reduced the query to a typical form which is trivial to parameterize. The problem with IN is the inherent variability, how many INs can we have? 50? 1000? 10000?

Jeff Atwood 2008-12-03 21:10:22

Apparently in MS-SQL the number is so large that they don't say what it is. If you're getting upwards of 10K, then the table join solution is probably better. This particular query is just going to keep getting worse and worse as the number increases. Imagine scanning a 50K char string each time.

tvanfosson 2008-12-05 17:40:06

In this case, we're obviously talking about tags, and the SO system limits you to 5 total, so it probably won't be that bad.

Joel Coehoorn 2008-12-11 14:26:30

@Joel - there's actually 2 inefficiencies in this solution. The parsing of the char string (the '|' + @tags + '|'), and the size of the table - since this needs a table scan. The former shouldn't be an issue with SO's tag system, but the latter certainly could be (there's about 16500 tags now)

Mark Brackett 2008-12-18 14:05:17

I've used this method with success in the past. I've also tested it. On a "typical" table of 500k rows, this method takes about four seconds. You can optimize it by pre-creating the piped parameter and storing that as a field. Doing so reduces the query time by about half.

Robert C. Barth 2008-12-18 17:29:48

@Joel: Clever, and it works. So what if it's going to do an index scan, performance only has to be "good enough". Not knowing the constraints on the Name column, I'm going to consider the edge cases (null, empty string, contains pipe character), as well as the obscure corner case, a Name value containing a wildcard e.g. 'pe%ter' is going to match '|peanut|butter|' but not '|butter|peanut|'. (Yes, it's an obscure case, one that isn't going to be tested in QA, but will get exercised in production.) It's a fairly easy workaround (in some DBMS) to escape the wildcards.

spencer7593 2009-05-29 21:27:06

AlexKuznetsov 2009-08-19 22:21:57

Answer 9

+73 A:

For SQL Server 2008, you can use a table valued parameter. It's a bit of work, but it is arguably cleaner than my other method.

First, you have to create a type

CREATE TYPE dbo.TagNamesTableType AS TABLE ( Name nvarchar(50) )

Then, your ADO.NET code looks like this:

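string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
cmd.CommandText = "SELECT Tags.* FROM Tags JOIN @tagNames as P ON Tags.Name = P.Name";

// value must be a DataTable, a DbDataReader, or an IEnumerable<SqlDataRecord>;
// a plain string[] is not accepted, so copy the tags into a one-column DataTable
DataTable tagTable = new DataTable();
tagTable.Columns.Add("Name", typeof(string));
foreach (string tag in tags) tagTable.Rows.Add(tag);

cmd.Parameters.AddWithValue("@tagNames", tagTable).SqlDbType = SqlDbType.Structured;
cmd.Parameters["@tagNames"].TypeName = "dbo.TagNamesTableType";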

Mark Brackett 2008-12-03 16:53:19

I dig. +1 for SQL Server 2008! (maybe one of these days I'll upgrade from 2k5)

Matt Rogish 2008-12-03 16:56:54

+1, Has anybody tried this with Linq-2-Sql?

TT 2008-12-03 17:29:28

You can't [easily] use TVPs with Linq To Sql, so you need to fall back onto the good old SqlCommand object. I'm having to do exactly this right now to get around Linq-To-Sql's lousy round-trip update/insert habit.

Mark 2009-08-19 19:03:33

Answer 10

+1 A:

I think this post answers essentially the same question.

JasonS 2008-12-03 16:54:26

definitely related but I specifically said I wanted to avoid procs in my Q

Jeff Atwood 2008-12-03 17:12:58

whoops, missed that last line in your Q.

JasonS 2008-12-03 19:14:26

Answer 11

+4 A:

This is possibly a half nasty way of doing it, I used it once, was rather effective.

Depending on your goals it might be of use.

Create a temp table with one column.
INSERT each lookup value into that column.
Instead of using an IN you can then just use your standard JOIN rules. ( Flexibility++ )

This has a bit of added flexibility in what you can do, but it's more suited for situations where you have a large table to query, with good indexing, and you want to use the parameterised list more than once. Saves having to execute it twice and do all the sanitization manually.

I never got around to profiling exactly how fast it was, but in my situation it was needed.
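
The three steps can be sketched like this, assuming SQLite through Python's sqlite3 as a stand-in for whatever database is actually in use. The table name WantedTags and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (Name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Tags VALUES (?)",
                 [("ruby",), ("rails",), ("python",)])

wanted = ["ruby", "rails", "ruby"]  # duplicates are tolerated

# 1. Create a temp table with one column.
conn.execute("CREATE TEMP TABLE WantedTags (Name TEXT PRIMARY KEY)")
# 2. INSERT each lookup value, bound as a parameter.
conn.executemany("INSERT OR IGNORE INTO WantedTags VALUES (?)",
                 [(t,) for t in wanted])
# 3. JOIN instead of IN; the temp table can be reused by later queries.
rows = [r[0] for r in conn.execute(
    "SELECT t.Name FROM Tags AS t "
    "JOIN WantedTags AS w ON w.Name = t.Name "
    "ORDER BY t.Name")]
```

Because the values go in through bound parameters, no manual quoting is needed, and the list is only sent once however many queries join against it.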

Kent Fredric 2008-12-03 17:04:00

Answer 12

+6 A:

We have a function that creates a table variable that you can join to:

ALTER     FUNCTION [dbo].[fn_sqllist_to_table](@list as varchar(8000), @delim as varchar(10))
RETURNS @listTable table(
  Position int,
  Value varchar(8000)
  )
AS
BEGIN
    declare @myPos int
  set @myPos = 1

  while charindex(@delim, @list) > 0
  begin
    insert into @listTable(Position,Value)
    values(@myPos, left(@list, charindex(@delim, @list) - 1))

    set @myPos = @myPos + 1
    if charindex(@delim, @list) = len(@list)
      insert into @listTable(Position, Value)
      values(@myPos, '')
    set @list = right(@list, len(@list) - charindex(@delim, @list))
  end

  if len(@list) > 0
    insert into @listTable(Position, Value)
    values(@myPos, @list)

  return
END

So:

@Name varchar(8000) = null -- parameter for search values

select * from Tags
where Name in (SELECT Value From fn_sqllist_to_table(@Name,','))
order by Count desc

David Robbins 2008-12-03 17:11:52

Answer 13

+2 A:

In ColdFusion we just do:

<cfset myvalues = "ruby|rails|scruffy|rubyonrails">
    <cfquery name="q">
        select * from sometable where values in (<cfqueryparam value="#myvalues#" list="true" separator="|">)
    </cfquery>

rip747 2008-12-10 21:54:18

Answer 14

+25 A:

This is a late answer, but I heard Jeff/Joel talk about this on the podcast today, so I checked out the question. I see that there's no LINQ-to-SQL answer given, so I'll supply one. I thought I recalled SO was using LINQ-to-SQL, but maybe it was ditched -- who knows. Anyway, here's the same thing in LINQ-to-SQL.

var inValues = new [] { "ruby","rails","scruffy","rubyonrails" };

var results = from tag in Tags
              where inValues.Contains(tag.Name)
              select tag;

That's it. And, yes, LINQ already looks backwards enough, but the Contains clause seems extra backwards to me. When I had to do a similar query for a project at work, I naturally tried to do this the wrong way by doing a join between the local array and the SQL Server table, figuring the LINQ-to-SQL translator would be smart enough to handle the translation somehow. It didn't, but it did provide an error message that was descriptive and pointed me towards using Contains.

Anyway, if you run this in the highly recommended LINQPad, and run this query, you can view the actual SQL that the SQL LINQ provider generated. It'll show you each of the values getting parameterized into an IN clause.

Peter Meyer 2008-12-19 05:40:15

you have to love the cleanness of LINQ

cgreeno 2008-12-20 19:47:49

No doubt! I know it boils down to Mark Brackett's answer that's the top vote getter on this page, but in this situation, it's so nice and tidy and still type safe, etc.

Peter Meyer 2008-12-21 02:47:17

Answer 15

+4 A:

The proper way IMHO is to store the list in a character string (limited in length by what the DBMS supports); the only trick is that (in order to simplify processing) I have a separator (a comma in my example) at the beginning and at the end of the string. The idea is to "normalize on the fly", turning the list into a one-column table that contains one row per value. This allows you to turn

in (ct1,ct2, ct3 ... ctn)

into an

in (select ...)

or (the solution I'd probably prefer) a regular join, if you just add a "distinct" to avoid problems with duplicate values in the list.

Unfortunately, the techniques to slice a string are fairly product-specific. Here is the SQL Server version:

 with qry(n, names) as
       (select len(list.names) - len(replace(list.names, ',', '')) - 1 as n,
               substring(list.names, 2, len(list.names)) as names
        from (select ',Doc,Grumpy,Happy,Sneezy,Bashful,Sleepy,Dopey,' names) as list
        union all
        select (n - 1) as n,
               substring(names, 1 + charindex(',', names), len(names)) as names
        from qry
        where n > 1)
 select n, substring(names, 1, charindex(',', names) - 1) dwarf
 from qry;

The Oracle version:

 select n, substr(name, 1, instr(name, ',') - 1) dwarf
 from (select n,
             substr(val, 1 + instr(val, ',', 1, n)) name
      from (select rownum as n,
                   list.val
            from  (select ',Doc,Grumpy,Happy,Sneezy,Bashful,Sleepy,Dopey,' val
                   from dual) list
            connect by level < length(list.val) -
                               length(replace(list.val, ',', ''))));

and the MySQL version:

 select pivot.n,
        substring_index(substring_index(list.val, ',', 1 + pivot.n), ',', -1)
 from (select 1 as n
       union all
       select 2 as n
       union all
       select 3 as n
       union all
       select 4 as n
       union all
       select 5 as n
       union all
       select 6 as n
       union all
       select 7 as n
       union all
       select 8 as n
       union all
       select 9 as n
       union all
       select 10 as n) pivot,
      (select ',Doc,Grumpy,Happy,Sneezy,Bashful,Sleepy,Dopey,' val) as list
 where pivot.n < length(list.val) -
                 length(replace(list.val, ',', ''));

(Of course, "pivot" must return as many rows as the maximum number of items we can find in the list)
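
The recursive version ports fairly directly to other engines that support recursive common table expressions. Below is a sketch for SQLite, driven from Python's sqlite3 so it can be run as-is. It is an adaptation, not one of the versions above: substr and instr replace substring and charindex, the list is bound as a named parameter :list, and the Tags table and its contents are hypothetical.

```python
import sqlite3

# Same idea as the SQL Server query: n counts the items still in the string,
# and each recursive step chops the first item off the front of the list.
SPLIT_SQL = """
WITH RECURSIVE qry(n, names) AS (
    SELECT length(:list) - length(replace(:list, ',', '')) - 1,
           substr(:list, 2)
    UNION ALL
    SELECT n - 1,
           substr(names, 1 + instr(names, ','))
    FROM qry
    WHERE n > 1
)
SELECT substr(names, 1, instr(names, ',') - 1) AS item
FROM qry
ORDER BY n DESC
"""

conn = sqlite3.connect(":memory:")
items = [r[0] for r in conn.execute(SPLIT_SQL, {"list": ",Doc,Grumpy,Happy,"})]

# Using the split as the right-hand side of IN (hypothetical Tags table).
conn.execute("CREATE TABLE Tags (Name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Tags VALUES (?)",
                 [("Doc",), ("Happy",), ("Sneezy",)])
in_sql = ("SELECT Name FROM Tags WHERE Name IN (SELECT item FROM ("
          + SPLIT_SQL + ")) ORDER BY Name")
hits = [r[0] for r in conn.execute(in_sql, {"list": ",Doc,Grumpy,Happy,"})]
```

As in the original, the list must start and end with the separator, otherwise the item count is off by one.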

2009-02-04 18:51:49

Answer 16

+26 A:

The original question was "How do I parameterize a query ..."

Let me state right here, that this is not an answer to the original question. There are already some demonstrations of that in other good answers.

With that said, go ahead and flag this answer, downvote it, mark it as not an answer... do whatever you believe is right.

Selected answer

What I want to address here is the approach given in Joel Spolsky's answer, the answer "selected" as the right answer.

Joel Spolsky's approach is clever. And it works reasonably, it's going to exhibit predictable behavior and predictable performance, given "normal" values, and with the normative edge cases, such as NULL and the empty string. And it may be sufficient for a particular application.

But in terms of generalizing this approach, let's also consider the more obscure corner cases, like when the Name column contains a wildcard character (as recognized by the LIKE predicate). The wildcard character I see most commonly used is % (a percent sign). So let's deal with that here now, and later go on to other cases.

Some problems with % character

Consider a Name value of 'pe%ter'. (For the examples here, I use a literal string value in place of the column name.) A row with a Name value of 'pe%ter' would be returned by a query of the form:

select ...
 where '|peanut|butter|' like '%|' + 'pe%ter' + '|%'

But that same row will not be returned if the order of the search terms is reversed:

select ...
 where '|butter|peanut|' like '%|' + 'pe%ter' + '|%'

The behavior we observe is kind of odd. Changing the order of the search terms in the list changes the result set.
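
The order dependence is easy to reproduce. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in, since its LIKE treats % the same way; list_matches is a hypothetical helper that evaluates the same predicate with the list and the Name value bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def list_matches(search_list, name):
    """Evaluate: search_list LIKE '%|' + name + '|%' (SQLite uses || to concatenate)."""
    row = conn.execute("SELECT ? LIKE '%|' || ? || '|%'",
                       (search_list, name)).fetchone()
    return bool(row[0])


peanut_first = list_matches("|peanut|butter|", "pe%ter")
butter_first = list_matches("|butter|peanut|", "pe%ter")
```

With peanut first, the % in the stored value spans "anut|but" and the pattern matches. With butter first, nothing after "|pe" ends in "ter|", so the same row is not returned.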

It almost goes without saying that we might not want pe%ter to match peanut butter, no matter how much he likes it.

Obscure corner case

(Yes, I will agree that this is an obscure case. Probably one that is not likely to be tested. We wouldn't expect a wildcard in a column value. We may assume that the application prevents such a value from being stored. But in my experience, I've rarely seen a database constraint that specifically disallowed characters or patterns that would be considered wildcards on the right side of a LIKE comparison operator.)

Patching a hole

One approach to patching this hole is to escape the % wildcard character. (For anyone not familiar with the escape clause on the operator, here's a link to the SQL Server documentation.)

select ...
 where '|peanut|butter|'
  like '%|' + 'pe\%ter' + '|%' escape '\'

Now we can match the literal %. Of course, when we have a column name, we're going to need to dynamically escape the wildcard. We can use the REPLACE function to find occurrences of the % character and insert a backslash character in front of each one, like this:

select ...
 where '|pe%ter|'
  like '%|' + REPLACE( 'pe%ter' ,'%','\%') + '|%' escape '\'

So that solves the problem with the % wildcard. Almost.

Escape the escape

We recognize that our solution has introduced another problem. The escape character. We see that we're also going to need to escape any occurrences of escape character itself. This time, we use the ! as the escape character:

select ...
 where '|pe%t!r|'
  like '%|' + REPLACE(REPLACE( 'pe%t!r' ,'!','!!'),'%','!%') + '|%' escape '!'

The underscore too

Now that we're on a roll, we can add another REPLACE to handle the underscore wildcard. And just for fun, this time, we'll use $ as the escape character.

select ...
 where '|p_%t!r|'
  like '%|' + REPLACE(REPLACE(REPLACE( 'p_%t!r' ,'$','$$'),'%','$%'),'_','$_') + '|%' escape '$'

I prefer this approach to escaping because it works in Oracle and MySQL as well as SQL Server. (I usually use the \ backslash as the escape character, since that's the character we use in regular expressions. But why be constrained by convention!)
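
Here is a small sketch of the three-step escaping, run against SQLite through Python's sqlite3 as a stand-in engine. The helper names escape_like and list_contains are hypothetical, and the REPLACE chain is done in Python on a sample value rather than in SQL on the column, purely to keep the example short. The order matters: the escape character has to be doubled first, before it is inserted in front of % and _.

```python
import sqlite3

ESC = "$"  # escape character, as in the last example above


def escape_like(value, esc=ESC):
    """Escape the escape character first, then the % and _ wildcards."""
    return (value.replace(esc, esc + esc)
                 .replace("%", esc + "%")
                 .replace("_", esc + "_"))


conn = sqlite3.connect(":memory:")


def list_contains(search_list, name):
    """True only if `name` appears literally, pipe-delimited, in `search_list`."""
    pattern = "%|" + escape_like(name) + "|%"
    row = conn.execute("SELECT ? LIKE ? ESCAPE '$'",
                       (search_list, pattern)).fetchone()
    return bool(row[0])


no_false_match = list_contains("|peanut|butter|", "pe%ter")
literal_match = list_contains("|p_%t!r|", "p_%t!r")
```

With the wildcards escaped, 'pe%ter' no longer matches peanut butter, while a list that really contains the literal value still matches. Bracket handling for SQL Server is not covered by this sketch.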

Those pesky brackets

SQL Server also allows for wildcard characters to be treated as literals by enclosing them in brackets []. So we're not done fixing yet, at least for SQL Server. Since pairs of brackets have special meaning, we'll need to escape those as well. If we manage to properly escape the brackets, then at least we won't have to bother with the hyphen - and the caret ^ within the brackets. And we can leave any % and _ characters inside the brackets escaped, since we'll have basically disabled the special meaning of the brackets.

Finding matching pairs of brackets shouldn't be that hard. It's a little more difficult than handling the occurrences of singleton % and _. (Note that it's not sufficient to just escape all occurrences of brackets, because a singleton bracket is considered to be a literal, and doesn't need to be escaped. The logic is getting a little fuzzier than I can handle without running more test cases.)

Inline expression gets messy

That inline expression in the SQL is getting longer and uglier. We can probably make it work, but heaven help the poor soul that comes behind and has to decipher it. As much of a fan as I am of inline expressions, I'm inclined not to use one here, mainly because I don't want to have to leave a comment explaining the reason for the mess, and apologizing for it.

A function where ?

Okay, so, if we don't handle that as an inline expression in the SQL, the closest alternative we have is a user defined function. And we know that won't speed things up any (unless we can define an index on it, like we could with Oracle.) If we've got to create a function, we might better do that in the code that calls the SQL statement.

And that function may have some differences in behavior, dependent on the DBMS and version. (A shout out to all you Java developers so keen on being able to use any database engine interchangeably.)

Domain knowledge

We may have specialized knowledge of the domain for the column (that is, the set of allowable values enforced for the column). We may know a priori that the values stored in the column will never contain a percent sign, an underscore, or bracket pairs. In that case, we just include a quick comment that those cases are covered.

The values stored in the column may allow for % or _ characters, but a constraint may require those values to be escaped, perhaps using a defined character, such that the values are LIKE comparison "safe". Again, a quick comment about the allowed set of values, and in particular which character is used as an escape character, and go with Joel Spolsky's approach.

But, absent the specialized knowledge and a guarantee, it's important for us to at least consider handling those obscure corner cases, and consider whether the behavior is reasonable and "per the specification".

Other issues recapitulated

I believe others have already sufficiently pointed out some of the other commonly considered areas of concern:

SQL injection (taking what would appear to be user supplied information, and including that in the SQL text rather than supplying it through bind variables). Using bind variables isn't required; it's just one convenient approach to thwart SQL injection. There are other ways to deal with it.
optimizer plan using index scan rather than index seeks, possible need for an expression or function for escaping wildcards (possible index on expression or function)
using literal values in place of bind variables impacts scalability

Conclusion

I like Joel Spolsky's approach. It's clever. And it works.

But as soon as I saw it, I immediately saw a potential problem with it, and it's not my nature to let it slide. I don't mean to be critical of the efforts of others. I know many developers take their work very personally, because they invest so much into it and they care so much about it. So please understand, this is not a personal attack. What I'm identifying here is the type of problem that crops up in production rather than testing.

Yes, I've gone far afield from the original question. But where else to leave this note concerning what I consider to be an important issue with the "selected" answer for a question?

My hope is that someone will find this post to be of some use.

Apology

Again, I do apologize for my failure to abide by the rules and conventions of Stack Overflow, posting here what is clearly not an answer to the OP's question.

spencer7593 2009-05-29 23:18:15

+1 for being complete and helping to educate the programming population.

DrFloyd5 2009-05-29 23:38:16

+1 for taking the time to make such a wonderfully formatted post.

Aren 2010-07-21 00:27:09

Answer 17

+2 A:

Kind of late in the game, but this still might help someone. Another possible solution is instead of passing a variable number of arguments to a stored procedure, pass a single string containing the names you're after, but make them unique by surrounding them with '<>'. Then use PATINDEX to find the names.

SELECT * FROM Tags WHERE PATINDEX('%<' + Name + '>%','<ruby>,<rails>,<scruffy>,<rubyonrails>') > 0

ArtOfCoding 2010-02-12 19:22:03

Answer 18

+2 A:

I think this is a case when a static query is just not the way to go. Dynamically build the list for your in clause, escape your single quotes, and dynamically build SQL. In this case you probably won't see much of a difference with any method due to the small list, but the most efficient method really is to send the SQL exactly as it is written in your post. I think it is a good habit to write it the most efficient way, rather than to do what makes the prettiest code, or consider it bad practice to dynamically build SQL.

I have seen the split functions take longer to execute than the queries themselves in many cases where the parameters get large. A stored procedure with table valued parameters in SQL 2008 is the only other option I would consider, although this will probably be slower in your case. TVP will probably only be faster for large lists if you are searching on the primary key of the TVP, because SQL will build a temporary table for the list anyway (if the list is large). You won't know for sure unless you test it.

I have also seen stored procedures that had 500 parameters with default values of null, and having WHERE Column1 IN (@Param1, @Param2, @Param3, ..., @Param500). This caused SQL to build a temp table, do a sort/distinct, and then do a table scan instead of an index seek. That is essentially what you would be doing by parameterizing that query, although on a small enough scale that it won't make a noticeable difference. I highly recommend against having NULL in your IN lists, as if that gets changed to a NOT IN it will not act as intended. You could dynamically build the parameter list, but the only obvious thing that you would gain is that the objects would escape the single quotes for you. That approach is also slightly slower on the application end since the objects have to parse the query to find the parameters. It may or may not be faster on SQL, as parameterized queries call sp_prepare, sp_execute for as many times you execute the query, followed by sp_unprepare.

The reuse of execution plans for stored procedures or parameterized queries may give you a performance gain, but it will lock you in to one execution plan determined by the first query that is executed. That may be less than ideal for subsequent queries in many cases. In your case, reuse of execution plans will probably be a plus, but it might not make any difference at all as the example is a really simple query.

Cliffs notes:

For your case anything you do, be it parameterization with a fixed number of items in the list (null if not used), dynamically building the query with or without parameters, or using stored procedures with table valued parameters will not make much of a difference. However, my general recommendations are as follows:

Your case/simple queries with few parameters:

Dynamic SQL, maybe with parameters if testing shows better performance.

Queries with reusable execution plans, called multiple times by simply changing the parameters or if the query is complicated:

SQL with dynamic parameters.

Queries with large lists:

Stored procedure with table valued parameters. If the list can vary by a large amount use WITH RECOMPILE on the stored procedure, or simply use dynamic SQL without parameters to generate a new execution plan for each query.

Scott 2010-06-09 20:28:50

Answer 19

+1 A:

In my opinion, the best source to solve this problem, is what has been posted on this site:

Syscomments. Dinakar Nethi

CREATE FUNCTION dbo.fnParseArray (@Array VARCHAR(1000),@separator CHAR(1))
RETURNS @T Table (col1 varchar(50))
AS 
BEGIN
 --DECLARE @T Table (col1 varchar(50))  
 -- @Array is the array we wish to parse
 -- @Separator is the separator character such as a comma
 DECLARE @separator_position INT -- This is used to locate each separator character
 DECLARE @array_value VARCHAR(1000) -- this holds each array value as it is returned
 -- For my loop to work I need an extra separator at the end. I always look to the
 -- left of the separator character for each array value

 SET @array = @array + @separator

 -- Loop through the string searching for separator characters
 WHILE PATINDEX('%' + @separator + '%', @array) <> 0 
 BEGIN
    -- patindex matches a pattern against a string
    SELECT @separator_position = PATINDEX('%' + @separator + '%',@array)
    SELECT @array_value = LEFT(@array, @separator_position - 1)
    -- This is where you process the values passed.
    INSERT into @T VALUES (@array_value)    
    -- Replace this select statement with your processing
    -- @array_value holds the value of this element of the array
    -- This replaces what we just processed with an empty string
    SELECT @array = STUFF(@array, 1, @separator_position, '')
 END
 RETURN 
END

Use:

SELECT * FROM dbo.fnParseArray('a,b,c,d,e,f', ',')

CREDITS FOR: Dinakar Nethi

Ph.E 2010-07-22 14:47:17

Parameterizing a SQL IN clause?