Hi

I have seen several patterns used to 'overcome' the lack of constants in SQL Server, but none of them seem to satisfy both performance and readability / maintainability concerns.

In the below example, assuming that we have an integral 'status' classification on our table, the options seem to be:

1) Just hard-code it, and possibly document the status in a comment

-- StatusId 87 = Loaded
SELECT ... FROM [Table] WHERE StatusId = 87

2) Using a lookup table for states, and then joining to this table so that the WHERE clause references the friendly name.

SELECT ... FROM [Table]
WHERE StatusId = (SELECT StatusId FROM TableStatus WHERE StatusName = 'Loaded')

OR

SELECT ... FROM [Table] t
INNER JOIN TableStatus ts ON t.StatusId = ts.StatusId
WHERE ts.StatusName = 'Loaded'

3) And most recently I have seen a 'fleet' of scalar UDFs defined which return constants, viz.

CREATE FUNCTION dbo.LoadedStatus()
RETURNS INT
AS
BEGIN
    RETURN 87
END

and then

SELECT ... FROM [Table] WHERE StatusId = dbo.LoadedStatus()

How have other SO users solved this common issue?

Edit: Bounty - Does anyone have a best practice method for maintaining $(variables) in DBProj DDL / schema scripts, as per Remus' answer and comments?

+2  A: 

I have been using the scalar function option in our DB and it works fine; in my view it is the best solution.

If many values relate to one item, make a lookup table instead: for example, if you load a combo box or any other control with static values, a lookup table is the best way to do it.

KuldipMCA
+5  A: 

Hard coded. With SQL performance trumps maintainability.

The consequences in the execution plan between using a constant that the optimizer can inspect at plan generation time vs. using any form of indirection (UDF, JOIN, sub-query) are often dramatic. SQL 'compilation' is an extraordinary process (in the sense that it is not 'ordinary' like, say, IL code generation) in that the result is determined not only by the language construct being compiled (i.e. the actual text of the query) but also by the data schema (existing indexes) and the actual data in those indexes (statistics). When a hard-coded value is used, the optimizer can produce a better plan because it can actually check the value against the index statistics and get an estimate of the result size.

Another consideration is that a SQL application is not code only, but by a large margin is code and data. 'Refactoring' a SQL program is ... different. Where in a C# program one can change a constant or enum, recompile and happily run the application, in SQL one cannot do so because the value is likely present in millions of records in the database and changing the constant value implies also changing GBs of data, often online while new operations occur.

Just because the value is hard-coded in the queries and procedures seen by the server does not necessarily mean the value has to be hard coded in the original project source code. There are various code generation tools that can take care of this. Consider something as trivial as leveraging the sqlcmd scripting variables:

defines.sql:

:setvar STATUS_LOADED 87

somesource.sql:

:r defines.sql
SELECT ... FROM [Table] WHERE StatusId = $(STATUS_LOADED);

someothersource.sql:

:r defines.sql
UPDATE [Table] SET StatusId = $(STATUS_LOADED) WHERE ...;
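For completeness, scripts like these are run through sqlcmd (or a build step that invokes it), which performs the $(variable) substitution; the -v switch can also set or override a scripting variable from the command line. Server and database names below are placeholders, not from the answer:

sqlcmd -S myserver -d mydb -i somesource.sql
sqlcmd -S myserver -d mydb -i someothersource.sql -v STATUS_LOADED=87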
Remus Rusanu
Thanks - we are using DBPRO and this led me to http://blogs.msdn.com/b/gertd/archive/2007/01/08/variables-to-the-rescue.aspx
nonnb
Unfortunately DBPRO does not accept the $(variable) substitution in *every* source .sql file, only in the script .sql files (the script run after the schema is deployed). It would be great if you could use variables in procedures definition...
Remus Rusanu
Thanks Remus - if we can workaround this limitation it would be a perfect solution IMHO. I've added a bounty ;)
nonnb
+1, `Hard coded. With SQL performance trumps maintainability.`
KM
The only workaround I know of for DBPRO variables in non pre/post deploy scripts is to use explicit build steps to transform template files into source files, and only edit/modify the templates. See http://blogs.msdn.com/b/psirr/archive/2009/07/31/template-driven-sql-generation.aspx , specifically the template VS add-in they link there. Personally, I think this approach introduces more trouble than it solves, but I have to mention it nonetheless.
Remus Rusanu
Thanks Remus - the bounty is yours - holding thumbs that a future release of DataDude will address this!
nonnb
+1  A: 

You can also add more fields to your status table that act as unique markers or groupers for status values. For example, if you add an isLoaded field to your status table, record 87 could be the only one with the field's value set, and you can test for the value of the isLoaded field instead of the hard-coded 87 or status description.
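A minimal sketch of this idea (the marker column name IsLoaded follows the comment below; table and column names are the ones from the question, the rest is illustrative):

-- Add a marker column to the status lookup table and flag the one 'Loaded' row
ALTER TABLE TableStatus ADD IsLoaded bit NOT NULL DEFAULT 0;
UPDATE TableStatus SET IsLoaded = 1 WHERE StatusId = 87;

-- The query now tests the marker instead of the magic number or the status text
SELECT ...
FROM [Table] t
INNER JOIN TableStatus ts ON t.StatusId = ts.StatusId
WHERE ts.IsLoaded = 1;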

Beth
Hi Beth, thanks for the response - interesting, definitely a valid readability pattern. So I would add bit fields for each state (IsLoaded, IsCancelled, IsSomeOtherStatus) and then each state would only have its applicable bit set. I guess the downside is still that the extra join to Status is needed in the pursuit of readability.
nonnb
It's really more of a maintenance thing. You don't want status text in your queries as criteria because the text can change, and sometimes the IDs end up changing, too. To avoid a join, you'd have to test the value in the main table, as in your first example.
Beth
+2  A: 

While I agree with Remus Rusanu on the performance facts, IMO maintainability of the code (and thus readability, least astonishment, etc.) trumps other concerns unless the performance difference is significant enough to warrant doing otherwise. Thus, the following query loses on readability:

Select ..
From Table
Where StatusId = 87

In general, when I have system dependent values which will be referenced in code (perhaps mimicked in an enumeration by name), I use string primary keys for the tables in which they are kept. Contrast this to user-changeable data in which I generally use surrogate keys. The use of a primary key that requires entry helps (albeit not perfectly) to indicate to other developers that this value is not meant to be arbitrary.

Thus, my "Status" table would look like:

Create Table Status
(
    Code varchar(6) Not Null Primary Key
    , ...
)

Select ...
From Table
Where StatusCode = 'Loaded'

This makes the query more readable, it does not require a join to the Status table, and it does not require the use of a magic number (or GUID). Using user-defined functions, IMO, is a bad practice. Beyond the performance implications, no developer would ever expect UDFs to be used in this manner, and thus it violates the least astonishment criterion. You would almost be compelled to have a UDF for each constant value; otherwise, what are you passing into the function: a name? a magic value? If a name, you might as well keep the name in a table and use it directly in the query. If a magic value, you are back to the original problem.

Thomas
Yup, makes sense; however, the 'smallest int' PK is a DB design standard at our company (the standard surrogate vs. natural keys argument has already been played out in favour of surrogates). Our DBAs like narrow indexes, it seems, and a column like status is heavily filtered in queries.
nonnb
@nonnb - Having a company standard of natural keys vs surrogate keys falsely assumes that one and only one solution makes sense for all tables. It is akin to mandating all doors have a single knob even double doors or doors that need no knob. Both types of keys have their place. Using surrogate keys for currencies for example would be a poor design decision IMO. It will add unnecessary joins and encourage hard to read code. In short, it will cost the company money. A single character status code would still be better than a magic number and take up the same space as a tinyint.
Thomas
@Thomas - you are preaching to the converted. I spent a year maintaining a Pick / Universe DB not so long ago, and the benefits of codifying "foreign keys" (not that U2 had any) and thus saving on the join altogether were crystal clear :)
nonnb