ansaurus

Question

SQL - suppressing duplicate *adjacent* records

Answer 1

+1 A:

This is not possible with plain set-based aggregation (e.g. GROUP BY and such), because grouping ignores row order and would also merge duplicates that are not adjacent.

You may be able to do this by using cursors.

Personally, I would get the data into my client application and do the filtering there.
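As a rough illustration of the client-side approach, here is a minimal Python sketch. It assumes the rows have already been fetched in date order as (letter, number, update_date) tuples; the helper name collapse_adjacent and the tuple layout are hypothetical, and the values simply mirror the sample data used elsewhere in this thread.

```python
from itertools import groupby

# Assumed input: rows already fetched from the database, sorted by date.
rows = [
    ("A", 5, "2009-01-01"),
    ("A", 12, "2009-02-01"),
    ("A", 12, "2009-03-01"),
    ("A", 12, "2009-04-01"),
    ("A", 9, "2009-05-01"),
    ("A", 9, "2009-06-01"),
    ("A", 5, "2009-07-01"),
]


def collapse_adjacent(rows, keep="last"):
    """Reduce each run of adjacent rows with equal (letter, number) to one row."""
    result = []
    # groupby only merges *consecutive* equal keys, so the 5 at the end
    # stays separate from the 5 at the start.
    for _, run in groupby(rows, key=lambda r: (r[0], r[1])):
        run = list(run)
        result.append(run[-1] if keep == "last" else run[0])
    return result


changes = collapse_adjacent(rows)
for row in changes:
    print(row)
```

Passing keep="first" returns the first row of each run instead, which is the row where the value actually changed.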

Oded 2010-04-15 13:55:45

Answer 2

+1 A:

The first thing you'd have to do is identify the sequence within which you wish to view/consider the data. Values of "Jan, Feb, Mar" don't help, because the data's not in alphabetical order. And what happens when you flip from Dec to Jan? Step 1: identify a sequence that uniquely defines each row with regard to your problem.

Next, you have to be able to compare item #x with item #x-1, to see if it has changed. If changed, include; if not changed, exclude. Trivial when using procedural code loops (cursors in SQL), but would you want to use those? They tend not to perform too well.

One SQL-based way to do this is to join the table to itself under two aliases, with the join clause being "prev.SequenceVal = cur.SequenceVal - 1". Throw in a comparison of the two rows' values, use an outer join so you don't toss the very first row of the set (where there is no x-1), and you're done. This only works if "SequenceVal" has no gaps. Note that performance may suck if "SequenceVal" is not indexed.
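A minimal sketch of that self-join, run against an in-memory SQLite database from Python so it can be tried without a DB2 instance. The table layout, the Val column and the aliases cur and prev are assumptions made for illustration; only the name SequenceVal comes from the description above, and the sequence is assumed to be gap-free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (SequenceVal INTEGER PRIMARY KEY, Val INTEGER)")
conn.executemany(
    "INSERT INTO MyTable (SequenceVal, Val) VALUES (?, ?)",
    [(1, 5), (2, 12), (3, 12), (4, 12), (5, 9), (6, 9), (7, 5)],
)

query = """
    SELECT cur.SequenceVal, cur.Val
    FROM MyTable AS cur
    LEFT JOIN MyTable AS prev
      ON prev.SequenceVal = cur.SequenceVal - 1
    WHERE prev.SequenceVal IS NULL   -- first row: there is no x-1
       OR prev.Val <> cur.Val        -- value changed since row x-1
    ORDER BY cur.SequenceVal
"""
changed_rows = conn.execute(query).fetchall()
print(changed_rows)
```

This keeps the first row of each run. Joining on cur.SequenceVal + 1 and testing the following row instead would keep the last row of each run.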

Philip Kelley 2010-04-15 14:13:14

Answer 3

A:

"discarding adjacent duplicates but keeping the last row."

Why do you want to keep the last row? What is the purpose?

Kate 2010-04-15 14:48:10

I want to know when the data *changed*, not all the possible values of the data. That it changed from 9 to 5 is important, even though it was a 5 back at the start.

Trevel 2010-04-15 15:02:25

Answer 4

+2 A:

Depending on which version of DB2 you're on, there are analytic functions which can make this problem easy to solve. An example in Oracle is below; the SELECT syntax in DB2 appears to be pretty similar.

create table t1 (c1 char, c2 number, c3 date);

insert into t1 VALUES ('A', 5, DATE '2009-01-01');
insert into t1 VALUES ('A', 12, DATE '2009-02-01');
insert into t1 VALUES ('A', 12, DATE '2009-03-01');
insert into t1 VALUES ('A', 12, DATE '2009-04-01');
insert into t1 VALUES ('A', 9, DATE '2009-05-01');
insert into t1 VALUES ('A', 9, DATE '2009-06-01');
insert into t1 VALUES ('A', 5, DATE '2009-07-01');

SELECT C1, C2, C3
  FROM (SELECT C1, C2, C3,
               LAG(C2) OVER (PARTITION BY C1 ORDER BY C3) AS PRIOR_C2,
               LEAD(C2) OVER (PARTITION BY C1 ORDER BY C3) AS NEXT_C2 -- not used by the filter below
          FROM T1
       )
 WHERE C2 <> PRIOR_C2
    OR PRIOR_C2 IS NULL -- to pick up the first value
 ORDER BY C1, C3;

C         C2 C3
- ---------- -------------------
A          5 2009-01-01 00:00:00
A         12 2009-02-01 00:00:00
A          9 2009-05-01 00:00:00
A          5 2009-07-01 00:00:00
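The output above keeps the first row of each run. If the last row of each run is wanted instead, the same idea should work with LEAD in place of LAG. The sketch below is an assumption-laden variant rather than part of the original answer: it runs on an in-memory SQLite database from Python (window functions need SQLite 3.25 or later), and it stores the dates as ISO text rather than as a DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 TEXT, c2 INTEGER, c3 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("A", 5, "2009-01-01"),
        ("A", 12, "2009-02-01"),
        ("A", 12, "2009-03-01"),
        ("A", 12, "2009-04-01"),
        ("A", 9, "2009-05-01"),
        ("A", 9, "2009-06-01"),
        ("A", 5, "2009-07-01"),
    ],
)

query = """
    SELECT c1, c2, c3
    FROM (SELECT c1, c2, c3,
                 LEAD(c2) OVER (PARTITION BY c1 ORDER BY c3) AS next_c2
          FROM t1)
    WHERE c2 <> next_c2
       OR next_c2 IS NULL   -- to pick up the final value
    ORDER BY c1, c3
"""
last_of_each_run = conn.execute(query).fetchall()
for row in last_of_each_run:
    print(row)
```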

Adam Musch 2010-04-15 19:52:28

Answer 5

A:

Using an "EXCEPT" clause is one way to do it. See below for the solution. I've included all of my test steps here. First, I created a session table (this will go away after I disconnect from my database).

CREATE TABLE session.sample (
   letter CHAR(1),
   number INT,
   update_date DATE
);

Then I imported your sample data:

IMPORT FROM sample.csv OF DEL INSERT INTO session.sample;

Verified that your sample information is in the database:

SELECT * FROM session.sample;

 LETTER NUMBER      UPDATE_DATE
 ------ ----------- -----------
 A                5 01/01/2009
 A               12 02/01/2009
 A               12 03/01/2009
 A               12 04/01/2009
 A                9 05/01/2009
 A                9 06/01/2009
 A                5 07/01/2009

   7 record(s) selected.

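I wrote this with an EXCEPT clause, and used "WITH" to try to make it clearer. Basically, the common table expression (named rows_with_previous) selects every row that is followed, exactly one month later, by a row with the same letter and number. Then, I exclude all of those rows from a select on the whole table, which leaves the last row of each run. Note that this relies on the data having one row per month.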

WITH rows_with_previous AS (
  SELECT s.*
  FROM session.sample s
  JOIN session.sample s2
    ON s.letter = s2.letter
      AND s.number = s2.number
      AND s.update_date = s2.update_date - 1 MONTH
)
SELECT *
FROM session.sample
EXCEPT ALL
SELECT *
FROM rows_with_previous;

Here is the result:

 LETTER NUMBER      UPDATE_DATE
 ------ ----------- -----------
 A                5 01/01/2009
 A               12 04/01/2009
 A                9 06/01/2009
 A                5 07/01/2009

   4 record(s) selected.
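For anyone without DB2 at hand, the same EXCEPT idea can be tried against an in-memory SQLite database from Python. This is only a sketch under a few assumptions: SQLite has no EXCEPT ALL, so plain EXCEPT is used (equivalent here because the sample rows are distinct), the dates are stored as ISO text, SQLite's date() function stands in for DB2's "- 1 MONTH" arithmetic, and the CTE is renamed to rows_followed_by_duplicate for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (letter TEXT, number INTEGER, update_date TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        ("A", 5, "2009-01-01"),
        ("A", 12, "2009-02-01"),
        ("A", 12, "2009-03-01"),
        ("A", 12, "2009-04-01"),
        ("A", 9, "2009-05-01"),
        ("A", 9, "2009-06-01"),
        ("A", 5, "2009-07-01"),
    ],
)

query = """
    WITH rows_followed_by_duplicate AS (
        SELECT s.letter, s.number, s.update_date
        FROM sample AS s
        JOIN sample AS s2
          ON s.letter = s2.letter
         AND s.number = s2.number
         -- s2 is the row exactly one month after s
         AND s.update_date = date(s2.update_date, '-1 month')
    )
    SELECT letter, number, update_date FROM sample
    EXCEPT
    SELECT letter, number, update_date FROM rows_followed_by_duplicate
    ORDER BY update_date
"""
remaining = conn.execute(query).fetchall()
for row in remaining:
    print(row)
```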

Scott Jones 2010-04-16 04:40:09

You don't have to use the WITH clause here -- it could also be done simply as a subquery in the FROM clause of the second half of the EXCEPT query.

Scott Jones 2010-04-16 04:42:06
