views:

122

answers:

3

I've never used TSQL before, but I decided I wanted to poke around in the SO data dump anyways. So this is probably a rookie question. I did try to search for an answer, but I don't know the technical term for what I'm trying to do, and search is only so good if you don't have the right keywords.

My goal is to find out how many questions have a score of 0, how many have a score of 1, how many have a score of 2, &c. I can write a query to find out those individual counts, no problem. I'd like to be able to run a query containing a loop, though. Here's what I'm using so far:

DECLARE @counter int
SET @counter = 0
WHILE @counter < 3
BEGIN
    SELECT
        COUNT(*)
    FROM
        Posts
    WHERE
        PostTypeId = 1
    AND
        Score = @counter

    SET @counter = @counter + 1
END

(EDIT: the eventual goal is to expand from [0, 3] to [-100, 1000] or whatever is necessary to hit all existing questions) Obviously, my problem is that I'm grabbing the count in each cycle of the loop and then not doing anything with it. Currently, the result I get is just the count of whatever @counter is set to initially; the number of iterations is irrelevant. Is it possible to store the results somewhere and then display as something like:

+-------+--------------------------------+
| Score | NumberOfQuestionsWithThatScore |
+-------+--------------------------------+
|   -10 |                           111  |
+-------+--------------------------------+
|    -9 |                             0  |
+-------+--------------------------------+
|    -8 |                           248  |
+-------+--------------------------------+
|   ... |                           ...  |
+-------+--------------------------------+

(EDIT: any reasonably clear display is fine, the above table is just a sample; EDIT 2: modified the design of the table to clear up continuing confusion)

If so, what is this called, and how is it done?

+6  A: 

Actually you can do this in a single pass...

SELECT COUNT(*) AS Total, MAX(Score) AS Score
FROM Posts
WHERE PostTypeId = 1 And Score <= 3
Group By Score

That should give you a nice table like:

Score   Total
0       2490
1       2904
2       2110

Sore thats off the cuff, not in front of a DB right now to verify the syntax. Look up "Group By" to get a better idea.

AS @OMG Ponies points out this is not pivot'ed as you originally had. If you want to have a single column, you would need to use SELECT and SUM to do that.

GrayWizardx
The `MAX(Score)` isn't needed as that's the `group by` field.
Martin Smith
@Gray - For this question you can run queries interactively here by the way http://odata.stackexchange.com/stackoverflow/query/new
Martin Smith
@OMG I forgot to ammend my own post about that. I havent done PIVOT in TSQL (Azure or otherwise). Good point.Your Answer is closer to the desired output. I +1'ed your answer
GrayWizardx
Thx, but the OP clarified that format doesn't have to match.
OMG Ponies
+6  A: 

The output you listed is a standard pivot query, turning rows into columnar data:

SELECT SUM(CASE WHEN p.score = 0 THEN 1 ELSE 0 END) AS ScoreOfZero,
       SUM(CASE WHEN p.score = 1 THEN 1 ELSE 0 END) AS ScoreOfOne,
       SUM(CASE WHEN p.score = 2 THEN 1 ELSE 0 END) AS ScoreOfTwo
  FROM POSTS p
OMG Ponies
This does appear to work (for all posts), but isn't easily expandable to cover all cases from `ScoreOfNegativeFifty` to `ScoreOfThreeThousand`. +1 for enlightening me about `SUM` and `CASE`, though.
Lord Torgamus
(Also, thanks for being so scrupulous about the display format; I didn't intend for it to be followed so exactly, but another virtual +1 to you for it. I've edited to reflect this.)
Lord Torgamus
@Lord Torgamus: That requires [dynamic SQL](http://www.sommarskog.se/dynamic_sql.html)
OMG Ponies
+1  A: 

Example 1: This is just a fleshed out solution based on the original methodology, but provides a row for every Score count from the minimum to the maximum, including zero count scores.

--Posts per score, with zeros.

declare @minScore int
declare @maxScore int
declare @current int
declare @postType int

set @postType = 1

set @minScore = (select MIN(Score) from Posts where PostTypeId = @postType)
set @maxScore = (select MAX(Score) from Posts where PostTypeId = @postType)

set @current = @minScore

create table #myTemp (Score int, PostsCount int) 

insert into #myTemp
select Score, count(*) from Posts group by Score

while @current < @maxScore
begin
  insert into #myTemp
  select @current, 0
    where not exists (select 1 from #myTemp where Score = @current and PostTypeId = @postType)
  set @current = @current + 1
end

select * from #myTemp order by Score
​

Example 2: A cheesy dynamic SQL approach with the Columns as the scores, each column being a sub-query. Disclaimer: SEDE seems to execute it all, but you do not get the results. If you instead select @dynSQL at the end and then run that, you'll get the data.

-- Dynamic SQL count of all posts for a score

declare @minScore int
declare @maxScore int
declare @current int
declare @postType int
declare @dynSQL nvarchar(MAX)

set @postType = 1

set @minScore = (select MIN(Score) from Posts where PostTypeId = @postType)
set @maxScore = (select MAX(Score) from Posts where PostTypeId = @postType)

set @current = @minScore

set @dynSQL = 'select '

while @current <= @maxScore
begin
    set @dynSQL = @dynSQL 
                  + '(select count(*) from Posts where PostTypeId = ' 
                  + convert(varchar,@postType) + ' and Score = ' 
                  + convert(varchar,@current) + ') as Score_'
                  + CASE WHEN @current < 0 THEN 'Negative_' 
                              + convert(varchar,ABS(@current)) 
                         ELSE convert(varchar,@current)
                         END
    if @current < @maxScore set @dynSQL = @dynSQL + ', '
    set @current = @current + 1
end

exec(@dynSQL)
​
Fosco