tags:

views:

1155

answers:

7

if i have data as follows

A | 01/01/2008 00:00:00

B | 01/01/2008 01:00:00

A | 01/01/2008 11:59:00

C | 02/01/2008 00:00:00

D | 02/01/2008 01:00:00

D | 02/01/2008 20:00:00

I want to only select the records whose identifiers (A, B, C or D) have occured twice within a 12 hour period. In this example above this would only be 'A'

Can anyone help please (this is for an Oracle data base)

Thanks

M

+6  A: 
 Select Distinct A.Identifer  
  From Table A
    Join Table B --  EDIT to eliminate self Joins (to same row)
        On A.PrimKey <> B.PrimaryKey
           And A.Identifer = B.Identifer               
           -- EDIT to fix case where 2 at same time
           And A.OccurTime >=  B.OccurTime 
           And A.OccurTime < B.OccurTime + .5

and to implement question asked in comment, (Ignoring records which are on different days)

-- for SQL Server,

   Select Distinct A.Identifer  
    From Table A
      Join Table B 
        On A.PrimKey <> B.PrimaryKey
           And A.Identifer = B.Identifer
           -- EDIT to fix case where 2 at same time
           And A.OccurTime >= B.OccurTime  
           And A.OccurTime < B.OccurTime + .5
    Where DateDiff(day, A.OccurTime, B.OccurTime) = 0

-- or for oracle...

 Select Distinct A.Identifer  
    From Table A
      Join Table B 
        On A.PrimKey <> B.PrimaryKey
           And A.Identifer = B.Identifer
           -- EDIT to fix case where 2 at same time
           And A.OccurTime >= B.OccurTime  
           And A.OccurTime < B.OccurTime + .5
 Where Trunc(A.OccurTime) = Trunc(B.OccurTime)
Charles Bretana
To avoid returning each duplicate set twice, add: AND A.OccurTime > B.OccurTime
Joel Coehoorn
@Joel, Thx, good catch
Charles Bretana
That will also let you do a simple > rather than a BETWEEN
Joel Coehoorn
I just changed it to between 0 and .5 - should do the same
Charles Bretana
Justin Cave
If the combination of identifier and occurTime is unique, Joel's suggestion of A.OccurTime > B.OccurTime would also work
Justin Cave
thansk for that that exactly what i needed. can i ask for one more thing how can i make it so that if after 12 hours we are into the next day dont return that record.A | 01/01/2008 23:00:00A | 02/01/2008 01:00:00would not be selected
Add AND TRUNC(A.OccurTime) = TRUNC(B.OccurTime)
Dave Costa
@Justin, right, edited to fix...
Charles Bretana
Doesn't this assume that the date column is unique, or unique for a given identifier? Two instances of A at time T would not satisfy the A.time > B.Time and fall through. So long as the data has a unique key, this is fine.
@Mark, true, but easy to fix... I'll edit to correct
Charles Bretana
trunc is not necessary and does not work for times crossing day boundaries still satisfied by the 12hr window.Along similiar lines SQLServers DATEDIFF is also wrong as it does not count the time between dates but rather the number of day boundaries crossed.The greater or equal to should be greater than. Ideally you would use a sequence or internal ROWID in the data to account for any possibility of multiple records being produced at the same time. As it is now /w or without the results are ambiguous for this scenario.
Einstein
A: 
SELECT namecol FROM tbl A
WHERE EXISTS (
  SELECT 1 from tbl B
  WHERE b.namecol = a.namecol
  AND b.timestamp > a.timestamp
  AND b.timestamp - 0.5 <= a.timestamp )
hamishmcn
Clever to include the b.timestamp > a.timestamp predicate to enable indexes. You should have written a note about it, though
erikkallen
Oops, I didn't read the question carefully enough. I think the second comparison is wrong.
erikkallen
Thanks, you're right, I have just tried it out. The >= should be <=, I will make the change now
hamishmcn
+1  A: 
SELECT identifier
  FROM table_name outer
 WHERE EXISTS( SELECT 1
                 FROM table_name inner
                WHERE inner.identifier  = outer.identifier
                  AND inner.date_column BETWEEN outer.date_column AND outer.date_column + interval '12' hour
                  AND inner.rowid != outer.rowid )
Justin Cave
+1  A: 

I'm not 100% sure of your requirements, however this might give you some ideas about how to do what you need. For example you said exactly 2; what if there are 3 occurances? etc.

create table t (ident varchar2(16), occurance timestamp);

insert into t (ident, occurance) values ('a', to_date('20080101000000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('b', to_date('20080101010000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('a', to_date('20080101115900', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('c', to_date('20080102000000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('d', to_date('20080102010000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('d', to_date('20080102200000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('d', to_date('20080103020000', 'yyyymmddhh24miss'));

select ident, occurance
from
(
select ident, occurance,
    lag(occurance) over (partition by ident order by occurance) previous, 
    lead(occurance) over (partition by ident order by occurance) next 
from t
)
where 
    ((occurance-previous<interval'12:00' hour to minute and extract(day from occurance) = extract(day from previous))
    or (next-occurance<interval'12:00' hour to minute and extract(day from occurance) = extract(day from next)))
/
William
+2  A: 
Select    
  A.Id
From    
  YourTable A 
Where    
  A.YourDateTime Between :StartDateTime and :EndDateTime 
Group By    
  A.Id
Having    
  COUNT(A.Id) = 2
dmajkic
"Having Count(A.Id) > 1" if there could be more than 2
StingyJack
He means to say in ANY 12 hour period... not one specific time frame
+2  A: 

I haven't checked William's query but I would seriously consider using what he has over every other. Analytics are da bomb. Anytime you find yourself joining a table back to itself is virtually guaranteed to be an opportunity to use analytics and will out perform the query with one table referenced twice every time.

You'll be amazed how much faster the analytic solution will be.

A: 

I would like to do something similar but I need to check for the number of records inserted by a specific user within a calendar week (Sunday-Saturday) and then lock then out from inserting additional records when they hit their quota.