views:

130

answers:

2

So I need to pull the e-mail addresses of members who have not logged into my site in the past 30 days.

Since the site is over a few years old, we want to pull only a handful of each member from certain Quarters of the year.

So for instance, there are about 800,000 people who haven't logged in within the past 30 days.

So we want to pull 300,000 users from that list, but we want to split it the sample over three years. Three years has twelve quarters. 300,000 /12 = 25,000.

So I want to pull 25,000 from Q1 2007

25,000 from Q2 2007

25,000 from Q3 2007

25,000 from Q4 2007

25,000 from Q1 2008

etc....

Would I need to write queries for each 'quertar" then union them or is there a way to do thsi all within one query?

SELECT Email, lastlogindate
FROM Users
WHERE DATEDIFF(dd, LastLoginDate, GetDate()) > 30

That's the basic query I have to pull all the data.

I basically want to avoid having to write 12 queries with where clauses like:

DATEPART(q, LastLoginDate) = 1 AND
DATEPART(yyyy, LastLoginDate) = 2007

any help would be greatly appreciated.

+1  A: 

You can use ranking function(s) to achieve your goal:

  1. Calculate 'quarter number' for each row (so quarters from you time span are numbered with consecutive values).
  2. Group you rows based on the 'quarter number' value
  3. Calculate rank of each row in the group
  4. Filter records based on rank value

You're done.

AlexS
+5  A: 

try something like this:

SELECT 
    Email,LastLoginDate
    FROM (SELECT
              Email,LastLoginDate
                  ,ROW_NUMBER() OVER(PARTITION BY DATEPART(year,LastLoginDate),DATEPART(quarter,LastLoginDate) ORDER BY LastLoginDate) AS Rank
              FROM Users
              WHERE LastLoginDate < GetDate()-30
                  AND LastLoginDate > GetDate()-(3*365)
         ) dt
    WHERE Rank<25000

EDIT
changed the WHERE in derived table to better use an index on LastLoginDate

to check that everything is working, you can add these to the outer SELECT list:

,DATEPART(year,lastchgdate) as Year,DATEPART(quarter,lastchgdate) AS Quarter,rank

and add an ORDER BY:

ORDER BY Year,Quarter,rank
KM
Vote up: this is what I meant but was lazy to type sql.
AlexS
I love this damn site. thanks a ton!
Jack Marchetti
I actually used it prior to your edit. I was wondering, why do the LastLoginDate > GetDate()-(3*365) as opposed to doing DATEPART(yyyy, LastLoginDate) <= 3
Jack Marchetti
if you run the query with the _WHERE_ is the same as in my last edit, the query can use an index on LastLoginDate. when you wrap a column in a function, you can't use the index on it anymore. This would only matter if you have an index on LastLoginDate, and if not you should add one if you want this query to run fast.
KM
it's just a one off query, so no big deal on speed.
Jack Marchetti
@JackM said _it's just a one off query, so no big deal on speed._ but you can still use that technique to make sure that hit and index on your other queries
KM