views:

299

answers:

4

The question is how long have these customers been jerks on any given date.

I'm working against Sybase

For this simplified table structure of table history_data

table: history_of_jerkiness
processing_date  name  is_jerk
---------------  ----- -------
20090101         Matt  true
20090101         Bob   false        
20090101         Alex  true        
20090101         Carol true        
20090102         Matt  true        
20090102         Bob   true        
20090102         Alex  false        
20090102         Carol true        
20090103         Matt  true        
20090103         Bob   true        
20090103         Alex  true        
20090103         Carol false        

The report for the 3rd should show that Matt has always been a jerk, Alex has just become a jerk, and Bob has been a jerk for 2 days.

name    days jerky
-----   ----------
Matt    3
Bob     2
Alex    1

I'd like to find these spans of time dynamically, so if I run the report for the 2nd, I should get different results:

name    days_jerky
-----   ----------
Matt    2
Bob     1
Carol   2

The key here is trying to find only continuous spans older than a certain date. I've found a few leads, but it seems like a problem where there would be very smart tricky solutions.

+1  A: 

This can be made simple if you structure the data so as to meet the following criteria...

All people must have an initial record where they are not a jerk

You can do something like...

SELECT
   name,
   MAX(date)   last_day_jerk_free
FROM
   jerkiness AS [data]
WHERE
   jerk = 'false'
   AND date <= 'a date'
GROUP BY
   name

You already know what the base date is ('a date'), and now you know the last day they were not a jerk. I don't know sybase but I'm sure there are commands you can use to get the number of days between 'a data' and 'last_day_jerk_free'

EDIT:

There are multiple ways of artificially creating an initialising "not jerky" record. The one suggested by Will Rickards uses a sub query containing a union. Doing so, however, has two down sides...
1. The sub-query masks any indexes which may otherwise have been used
2. It assumes all people have data starting from the same point

Alternatively, take Will Rickard's suggestion and move the aggregation from the outer query into the inner query (so maximising use of indexes), and union with a generalised 2nd sub query to create the starting jerky = false record...

SELECT name, DATEDIFF(day, MAX(processing_date), @run_date) AS days_jerky
FROM (

    SELECT name, MAX(processing_date) as processing_date
    FROM history_of_jerkiness
    WHERE is_jerk = 0 AND processing_date <= @run_date
    GROUP BY name

    UNION

    SELECT name, DATEADD(DAY, -1, MIN(processing_date))
    FROM history_of_jerkiness
    WHERE processing_date <= @run_date
    GROUP BY name

    ) as data
GROUP BY
   name

The outer query still has to do a max without indexes, but over a reduced number of records (2 per name, rather than n per name). The number of records is also reduced by not requiring every name to have a value for every date in use. There are many other ways of doing this, some can be seen in my edit history.

Dems
+2  A: 

My solution from SQL Server - same as Dems but I put in a min baseline myself. It assumes there are no gaps - that is there is an entry for each day for each person. If that isn't true then I'd have to loop.

DECLARE @run_date datetime
DECLARE @min_date datetime

SET @run_date = {d '2009-01-03'}

-- get day before any entries in the table to use as a false baseline date
SELECT @min_date = DATEADD(day, -1, MIN(processing_date)) FROM history_of_jerkiness

-- get last not a jerk date for each name that is before or on the run date
-- the difference in days between the run date and the last not a jerk date is the number of days as a jerk
SELECT [name], DATEDIFF(day, MAX(processing_date), @run_date)
FROM (
     SELECT processing_date, [name], is_jerk
     FROM history_of_jerkiness
     UNION ALL
     SELECT DISTINCT @min_date, [name], 0
     FROM history_of_jerkiness ) as data
WHERE is_jerk = 0
  AND processing_date <= @run_date
GROUP BY [name]
HAVING DATEDIFF(day, MAX(processing_date), @run_date) > 0

I created the test table with the following:

CREATE TABLE history_of_jerkiness (processing_date datetime, [name] varchar(20), is_jerk bit)

INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-01'}, 'Matt', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-01'}, 'Bob', 0)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-01'}, 'Alex', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-01'}, 'Carol', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-02'}, 'Matt', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-02'}, 'Bob', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-02'}, 'Alex', 0)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-02'}, 'Carol', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-03'}, 'Matt', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-03'}, 'Bob', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-03'}, 'Alex', 1)
INSERT INTO history_of_jerkiness (processing_date, [name], is_jerk) VALUES ({d '2009-01-03'}, 'Carol', 0)
Will Rickards
A: 

How about this:

select a.name,count(*) from history_of_jerkiness a
left join history_of_jerkiness b
on a.name = b.name 
and a.processing_date >= b.processing_date
and a.is_jerk = 'true'
where not exists
( select * from history_of_jerkiness c
  where a.name = c.name
  and c.processing_date between a.processing_date and b.processing_date
  and c.is_jerk = 'false'
)
and a.processing_date <= :a_certain_date;
Carlos A. Ibarra
I tested in sql server and it doesn't work. I think I get what you are trying though. I tried to fix it but didn't have time to finish.
Will Rickards
+1  A: 

"This can be made simple if you structure the data so as to meet the following criteria...

All people must have an initial record where they are not a jerk"

What criteria the data should and should not meet is up to the user, not to the developer.

but the user isn't usually certain. A little negotiation around what they really need can vastly simplify problems.
MattK