tags:

views:

13

answers:

2

Consider the table below:

Col1 Col2 Col3

123  ABC  20/5/2010
123  CDS  21/5/2010
123  VDS  22/5/2010
123  ABC  23/5/2010
123  VDS  24/5/2010
123  CDS  25/5/2010
123  ABC  26/5/2010

I need to fetch the first occurrence of CDS and calculate the time diff between the next row. Similarly I need to find out the next occurrence of CDS and calculate the time diff with the next row. This has to go on until there are no occurrences of CDS left in the table.

Will be grateful if someone can help on this!!

A: 

With the sample data, there is but one pair of rows with the value CDS in Col2, so there is but one row in the output. It is not clear what you'd expect if there were 4 rows with CDS. Your wording might be intended to imply that the first pair would contribute one row and the second pair would contribute a second row. Or it might be that you need to find the differences between consecutive occurrences of CDS, so that the 4 rows of data would produce 3 rows of output. (The question also leaves it open to discussion whether this applies within a single value for Col1, or whether the Col1 is immaterial to the result.)

Since the ambiguity exists, I'll tackle the second option, assuming that the CDS entries all have to be for the same value in Col1 (but that there may be many different values in Col1).

You don't mention which version of Informix you have; I am assuming IDS 11.50. The syntax may not work in earlier versions.

As so often, the table is anonymous in the question - so it is hereby designated Tab1.

Query

SELECT t1a.col1, t1a.col2, t1a.col3, t1b.col3 AS col4,
       t1b.col3 - t1a.col3 AS delta
  FROM tab1 AS t1a JOIN tab1 AS t1b
    ON t1a.col1 = t1b.col1 AND
       t1a.col3 < t1b.col3 AND
       t1a.col2 = t1b.col2 AND
       t1a.col2 = 'CDS'    AND
       NOT EXISTS(SELECT *
                    FROM tab1 AS t1c
                   WHERE t1c.col3 > t1a.col3 AND
                         t1c.col3 < t1b.col3 AND
                         t1c.col1 = t1a.col1 AND
                         t1c.col2 = t1a.col2
                  );

The '<' join orders the pairs of dates so the 't1a' value is less than the 't1b' value; the NOT EXISTS clause ensures that the are adjacent to each other by asserting that there is no row in tab1 with the same values for col1 and col2 and with a date that comes after the earlier date and before the later date. This is the crucial part of the query.

Result

   col1     col2     col3            col4            delta
   123      CDS      21/05/2010      25/05/2010      4
   123      CDS      14/04/2010      22/04/2010      8
   123      CDS      22/04/2010      21/05/2010      29
   120      CDS      11/05/2010      16/05/2010      5
   121      CDS      21/04/2010      30/04/2010      9

Schema

CREATE TABLE tab1 (col1 SMALLINT, col2 CHAR(3), col3 DATE);

Data

Running with DBDATE=DMY4/.

INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '20/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '21/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '22/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '23/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '24/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '25/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '26/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '10/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '14/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '12/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '13/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '19/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '22/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '16/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'ABC', '10/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'CDS', '11/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'VDS', '12/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'ABC', '13/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'VDS', '14/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'CDS', '16/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '17/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'CDS', '21/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '22/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'VDS', '23/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '24/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'VDS', '25/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '26/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '27/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '28/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '29/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'CDS', '30/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'ABC', '23/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'VDS', '24/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'CDS', '25/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'ABC', '26/4/2010');
Jonathan Leffler
A: 

What is your desired output? Is it something like:

123 ABC 20/5/2010 
123 CDS 21/5/2010 
123 VDS 22/5/2010 1 day, 0:00:00
123 ABC 23/5/2010 
123 VDS 24/5/2010 
123 CDS 25/5/2010 
123 ABC 26/5/2010 1 day, 0:00:00

If so, then I think the simplest way to get it is: create program in Python or similar language, select your data using SQL and calculate date diff with language of your choice. In "normal" SQL there is no such thing like "next row", where in other languages you can save date of last CDS and use it with next loop iteration.

This output was created with Python:

import time
import datetime

TXT = """123 ABC 20/5/2010
123 CDS 21/5/2010
123 VDS 22/5/2010
123 ABC 23/5/2010
123 VDS 24/5/2010
123 CDS 25/5/2010
123 ABC 26/5/2010"""

def txt2time(ts):
    tpl = time.strptime(ts, '%d/%m/%Y')
    return time.mktime(tpl) 

last_date = ''
for line in TXT.split('\n'):
    date_diff = ''
    arr = line.split()
    if last_date:
        date_diff = '%s' % (datetime.timedelta(seconds = (txt2time(arr[2]) - txt2time(last_date))))
        last_date = ''
    if arr[1] == 'CDS':
        last_date = arr[2]
    print('%s %s' % (line.strip(), date_diff))

As you see I iterate over text lines, but you can easily change first loop with split('\n') to loop on recordset:

for row in cursor.fetchall():
  if row[0] == 'CDS':
      ...

(you can find Python/Jython examples on many web pages including my questions and answers on SO).

I think it is possible to find such solution in SQL only. You will need function that return date from next row. And I think this may not be easy to create such function because such function will have to behave just like your select with filtering and ordering.

Michał Niklas