views:

200

answers:

6

Continuing with the spirit of using the Stack Exchange Data Explorer to learn SQL, (see: Can we become our own “Northwind” for teaching SQL / databases?), I've decided to try to write a query to answer a simple question (on meta): What % of stackoverflow users have over 10,000 rep?.

Here's what I've done:

Query#1

SELECT COUNT(*)
FROM Users
WHERE
  Users.Reputation >= 10000

Result:

556

Query#2

SELECT COUNT(*)
FROM
  USERS

Result:

227691

Now, how do I put them together into one query? What is this query idiom called? What do I need to write so I can get, say, a one-row three-column result like this:

556     227691      0,00244190592
+7  A: 

You can use a Common Table Expression (CTE):

WITH c1 AS (
    SELECT COUNT(*) AS cnt
    FROM Users
    WHERE Users.Reputation >= 10000
), c2 AS (
    SELECT COUNT(*) AS cnt
    FROM Users
)
SELECT c1.cnt, c2.cnt, CAST(c1.cnt AS FLOAT) / c2.cnt
FROM c1, c2
Mark Byers
Can you make this work on SEDE? Link to the query with result etc? I still get `incorrect syntax near c2` with this.
polygenelubricants
@polygenelubricants: Interesting that it doesn't work. Do you know what the database is? I tested the query in SQL Server 2008.
Mark Byers
http://cloudexchange.cloudapp.net/faq
polygenelubricants
I'm still puzzled why multiple CTEs aren't working in the SEDE.
Daniel Vassallo
@Daniel: I hope this doesn't mean that SEDE is too "broken" for learning =(
polygenelubricants
Oops, I just messed up copying it. JOIN without ON was the problem. The CTEs work fine now that I've fixed the error.
Mark Byers
Works fine now :) ... @polygenelubricants: Now you have three working versions to play with! ... This is the most elegant query IMO, but note that the `WITH` clause is not directly supported in some databases such as MySQL (http://stackoverflow.com/questions/1382573/)
Daniel Vassallo
+3  A: 

Apart from using CTEs, in this case you could also have done:

SELECT CAST((SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000) AS float)  /
       (SELECT COUNT(*) FROM USERS) * 100  AS Percentage​

The cast as float was to force a floating-point division, because with integer division 556 / 227691 would give 0.

Daniel Vassallo
It was awesome to see you work on SEDE's recent queries within the past 10 minutes =)
polygenelubricants
@polygenelubricants: lol, I wasn't aware that queries were viewable in the recent tab :) ... Good to know!
Daniel Vassallo
+1 for remembering cast! He probably wants all three values on a single row though.
Mark Byers
@Mark: True. That wouldn't look nice with my approach!
Daniel Vassallo
+2  A: 
WITH tmp as (
SELECT COUNT(ID) AS repCount, (SELECT COUNT(ID) FROM Users ) AS totalCount
FROM Users
WHERE Users.Reputation > 10000
)
SELECT tmp.repCount, tmp.totalCount, (cast(tmp.repCount as decimal(10,2))/tmp.TotalCount) * 100 AS Percentage
FROM tmp

UPDATED: without the with

SELECT COUNT(ID) AS repCount, (SELECT COUNT(ID) FROM Users ) AS totalCount, 
    (CAST((SELECT COUNT(ID) FROM Users WHERE Users.Reputation > 10000) AS DECIMAL(10,2)) /
        (SELECT COUNT(ID) FROM Users )) * 100 AS Persantage
FROM Users
Teddy
Can you make this work on SEDE with link to query and result? I get various errors.
polygenelubricants
@polygenelubricants: http://cloudexchange.cloudapp.net/stackoverflow/q/1746
Daniel Vassallo
The decimal type was a bit too small so it was returning an overflow error. And `UserID` had to be `Id`
Daniel Vassallo
I updated the answer.
Teddy
@Daniel thanks for the correction.
Teddy
+2  A: 

Using variables in MySQL:

SELECT @a:=(SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000),
       @b:=(SELECT COUNT(*) FROM Users),
       IF(@b > 0, @a/@b, "--invalid--")
FROM Users
LIMIT 0,1
Anax
+2  A: 

Thanks to the other answers here, I've written the following queries, all of which work on SEDE:

"Inline view"

SELECT *, CAST([10K] AS FLOAT)/[All] AS [Ratio]
FROM (
   SELECT
    (SELECT COUNT(*) FROM Users) AS [All],
    (SELECT COUNT(*) FROM Users Where Reputation >= 10000) AS [10K]
) AS UsersCount

(See query result)


Variables

DECLARE @numAll FLOAT
DECLARE @num10kers FLOAT

SET @numAll = (SELECT COUNT(*) FROM Users)
SET @num10kers = (SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000);

SELECT  @num10kers AS [10K], @numAll AS [All], @num10Kers/@numAll AS [Ratio]

(See query result)

References


Common Table Expression

WITH Users10K AS ( 
    SELECT COUNT(*) AS Count
    FROM Users
    WHERE Users.Reputation >= 10000
), UsersAll AS (
    SELECT COUNT(*) As Count
    FROM Users
)
SELECT
    Users10K.Count AS [10K],
    UsersAll.Count AS [All],
    CAST(Users10K.Count AS FLOAT) / UsersAll.Count AS [Ratio]
FROM Users10K, UsersAll

(See query result)

References

polygenelubricants
+2  A: 

For queries like this, where I'm doing multiple counts on a single table based on different criteria, I like to use SUM and CASE:

SELECT
    UsersCount.[10K],
    UsersCount.[All],
    (CAST(UsersCount.[10K] AS FLOAT) / UsersCount.[All]) AS [Ratio]
FROM
    (SELECT
         SUM(CASE
               WHEN Users.Reputation >= 10000 THEN 1
               ELSE 0
             END) AS [10K],
         COUNT(*) AS [All]
     FROM Users) AS UsersCount

(query results)

The advantage is that you're only scanning the Users table once, which may be significantly faster.

Cheran S
Yep, I saw this one being made in the recent queries =)
polygenelubricants