I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question is applicable to other databases.
I'm trying to query the Badges
table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:
- Badges
- Id
- UserId
- Name
- Date
This works well for badges like [Epic]
and [Legendary]
which have unique names, but the silver and gold tag-specific badges seems to be mixed in together by having the same exact name.
Here's an example query I wrote for [mysql]
tag:
SELECT
UserId as [User Link],
Date
FROM
Badges
Where
Name = 'mysql'
Order By
Date ASC
The (slightly annotated) output is: as seen on stexdex:
User Link Date
--------------- ------------------- // all for silver except where noted
Bill Karwin 2009-02-20 11:00:25
Quassnoi 2009-06-01 10:00:16
Greg 2009-10-22 10:00:25
Quassnoi 2009-10-31 10:00:24 // for gold
Bill Karwin 2009-11-23 11:00:30 // for gold
cletus 2010-01-01 11:00:23
OMG Ponies 2010-01-03 11:00:48
Pascal MARTIN 2010-02-17 11:00:29
Mark Byers 2010-04-07 10:00:35
Daniel Vassallo 2010-05-14 10:00:38
This is consistent with the current list of silver and gold earners at the moment of this writing, but to speak in more timeless terms, as of the end of May 2010 only 2 users have earned the gold [mysql]
tag: Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.
So this is the way I understand it:
- The first time an
Id
appears (in chronological order) is for the silver badge - The second time is for the gold
Now, the above result mixes the silver and gold entries together. My questions are:
- Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
- In the current design, how would you query the silver and gold badges separately?
GROUP BY Id
and picking the min/max or first/second by theDate
somehow?- How can you write a query that lists all the silver badges first then all the gold badges next?
- Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
- How would you write it so that it doesn't have too many repetition between the silver and gold subqueries?
- Is it perhaps more typical to do two totally separate queries instead?
- What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
Requirement clarification
Originally I wanted the following output, essentially:
User Link Date
--------------- -------------------
Bill Karwin 2009-02-20 11:00:25 // result of query for silver
Quassnoi 2009-06-01 10:00:16 // :
Greg 2009-10-22 10:00:25 // :
cletus 2010-01-01 11:00:23 // :
OMG Ponies 2010-01-03 11:00:48 // :
Pascal MARTIN 2010-02-17 11:00:29 // :
Mark Byers 2010-04-07 10:00:35 // :
Daniel Vassallo 2010-05-14 10:00:38 // :
------- maybe some sort of row separator here? can SQL do this? -------
Quassnoi 2009-10-31 10:00:24 // result of query for gold
Bill Karwin 2009-11-23 11:00:30 // :
But the answers so far with a separate column for silver and gold is also great, so feel free to pursue that angle as well. I'm still curious how you'd do the above, though.