Regarding the use of computed columns mentioned in the answer, I wanted to check the claim that a computed column would produce better performance (it didn't make sense to me, but I'm no SQL Server guru). The results I got indicate that the computed column is indeed slower - much slower than a simple GROUP BY or subquery. I ran a test on a SQL Server instance on my own PC - here are the methodology and results:
CREATE TABLE smb_header (keycol INTEGER NOT NULL
, name1 VARCHAR(255)
, name2 VARCHAR(255));
INSERT INTO smb_header
VALUES (1
, 'This is column 1'
, 'This is column 2'
);
INSERT INTO smb_header
SELECT (SELECT MAX(keycol)
FROM smb_header
) + keycol
, name1
, name2
FROM smb_header;
-- (repeat the INSERT above 20 times to generate ~1 million rows)
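One convenient way to do the repetition (a tooling detail, not part of the SQL itself) is to put the doubling INSERT in its own batch and use the count form of the GO batch separator in Management Studio or sqlcmd:
-- GO <n> runs the preceding batch n times; GO 5 can be used the same way
-- for the detail INSERT further down.
INSERT INTO smb_header
SELECT (SELECT MAX(keycol)
FROM smb_header
) + keycol
, name1
, name2
FROM smb_header;
GO 20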
ALTER TABLE smb_header ADD PRIMARY KEY (keycol);
CREATE TABLE smb_detail (keycol INTEGER
, commentno INTEGER
, commenttext VARCHAR(255));
INSERT INTO smb_detail
SELECT keycol
, 1
, 'A comment that describes this issue'
FROM smb_header;
ALTER TABLE smb_detail ADD PRIMARY KEY (keycol, commentno);
ALTER TABLE smb_detail ADD FOREIGN KEY (keycol)
REFERENCES smb_header (keycol);
INSERT INTO smb_detail
SELECT keycol
, (SELECT MAX(commentno)
FROM smb_detail sd2
WHERE sd2.keycol = sd1.keycol
) + commentno
, 'A comment that follows comment number '
+ CAST(sd1.commentno AS VARCHAR(32))
FROM smb_detail sd1
WHERE keycol % 31 = 0;
-- repeat 5 times, to create some records that have 64 comments
-- where others have one.
At this point there will be around 1 million rows in the header table, and each header row will have either 1 or 64 comments in the detail table.
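A couple of quick queries (not part of the test itself, just a sanity check on the generated data) will confirm those volumes:
-- total header rows (should be roughly 1 million after 20 doublings)
SELECT COUNT(*) AS header_rows
FROM smb_header;
-- smallest and largest number of comments attached to any one header row
SELECT MIN(comment_count) AS min_comments
, MAX(comment_count) AS max_comments
FROM (SELECT keycol
, COUNT(*) AS comment_count
FROM smb_detail
GROUP BY keycol) AS per_key;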
Now I create the function (the same as yours above, only with my column & table names), and the computed column:
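The function body is not repeated here; it is essentially a scalar UDF along these lines (a sketch with my table names - it just counts the matching detail rows):
CREATE FUNCTION dbo.CountComments (@keycol INT)
RETURNS INT
AS
BEGIN
-- count the detail rows belonging to this header row
RETURN (SELECT COUNT(*)
FROM dbo.smb_detail
WHERE keycol = @keycol);
END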
alter table dbo.smb_header add CommentCount as dbo.CountComments(keycol)
By the way, PERSISTED will not work for this column, as I suspected in my comments above - it is not possible, or too difficult, for SQL Server to keep track of which rows need updating if you refer to other tables in your function. Trying the same thing with the PERSISTED keyword (on a column named CommentCountPersist) produces the error:
Msg 4934, Level 16, State 3, Line 1
Computed column 'CommentCountPersist' in table 'smb_header' cannot be
persisted because the column does user or system data access.
This makes sense to me - I don't see how SQL Server could determine which rows need recalculating when rows in another table change, for an arbitrary function, without the update process being horribly inefficient.
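For reference, the statement in question (reconstructed here from the error message above) is simply the same ADD with the PERSISTED keyword appended:
ALTER TABLE dbo.smb_header
ADD CommentCountPersist AS dbo.CountComments(keycol) PERSISTED;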
Now, for the tests. I create a temp table #holder to insert the rows into - I want to make sure that when my queries run, I process the entire result set, not just the first few rows that would appear in the Management Studio grid control.
SELECT h.keycol
, h.name1
, CommentCount
INTO #holder
FROM smb_header h
WHERE h.keycol < 0
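The timing and I/O figures below are SQL Server's own statistics output; to reproduce them, turn the reporting on for the session before running the queries:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;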
Here are the results of my queries. First, the computed column:
INSERT
INTO #holder
SELECT h.keycol
, h.name1
, CommentCount
FROM smb_header h
WHERE h.keycol between 5000 and 10000
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 1, logical reads 10160, physical reads 0,
read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'smb_header'. Scan count 1, logical reads 44, physical reads 0,
read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 458 ms.
(5001 row(s) affected)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Now the GROUP BY version, joining directly to the detail table:
INSERT
INTO #holder
SELECT h.keycol
, h.name1
, COUNT(*)
FROM smb_header h
, smb_detail d
WHERE h.keycol between 5000 and 10000
AND h.keycol = d.keycol
GROUP BY h.keycol, h.name1
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'smb_header'. Scan count 1, logical reads 44, physical reads 0,
read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'smb_detail'. Scan count 1, logical reads 366, physical reads 0,
read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 13 ms.
(5001 row(s) affected)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Writing the query with the subquery in the SELECT clause as Remus did above yields the same plan & performance as the GROUP BY (which would be expected).
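For reference, that version of the query looks something like this (adapted to my table and column names):
INSERT
INTO #holder
SELECT h.keycol
, h.name1
, (SELECT COUNT(*)
FROM smb_detail d
WHERE d.keycol = h.keycol)
FROM smb_header h
WHERE h.keycol BETWEEN 5000 AND 10000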
As you can see, the computed column version performs significantly worse. This makes sense to me, as the optimizer is forced to call the function and perform the COUNT(*) separately for every row in the header, instead of using more sophisticated methods of combining the two sets of data.
It is possible that I'm doing something wrong here. I'd be interested in marc_s contributing his findings.