views:

196

answers:

3

Let's say I have the following table in SQL Server 2005:

id | Date | Letter
-----------------------------
01 | 5/1/2009 | W
02 | 5/1/2009 | X
03 | 5/1/2009 | Y
04 | 5/1/2009 | Z
05 | 5/1/2009 | A
06 | 5/1/2009 | B
07 | 5/1/2009 | D
08 | 5/1/2009 | E
09 | 5/2/2009 | W
10 | 5/2/2009 | X
11 | 5/2/2009 | Y
12 | 5/2/2009 | Z
13 | 5/2/2009 | A
14 | 5/2/2009 | B
15 | 5/2/2009 | C
16 | 5/2/2009 | D
17 | 5/3/2009 | W
18 | 5/3/2009 | X

In my previous question, I asked how to figure out which letter sequence was missing. In my example it would letter C between row 6 and 7. I got the following answer and it works for the most part:

SELECT * from TABLE1 t1 
LEFT OUTER JOIN TABLE2 t2 
  ON ASCII(t2.INTCol) - 1 = ASCII(t2.INTCol) 
  AND t1.date = t2.date
  WHERE t2.id IS NULL

However, this solution does have two problems. It will return false positives in two cases.

  1. The letter sequence starts over at 'W' everyday. So, I get a false positive result for the last record of every day. That would be row 8 and 16 in my example data.

  2. Since the sequence starts over at 'W' everyday, I get a false positive for the 'Z' record because ('Z' + 1) gives me a '[', not an 'A'.

For #2 I believe I can put in an "IF" statement that says: "If this is a Z, look for an A" but I'm open to any 'cleaner' solutions.

I really have no idea how to do #1, so I'm really looking for help on that one. EDIT: Something I left out, doing a MAX(t1.id) isn't going to work, as the rows aren't always inserted in sequence. In fact, looking at my data, it's always inserted in [a-z] order even though the first record is the 'w'. I know it's lame, but I'm dealing with log files from dozens of million dollar phone switches. Not much I can do about it.

+1  A: 

Rollover: use mod.

Days: add a predicate that excludes the final entry of each day.

create view missing_letter as 
SELECT *  from TABLE1 t1 
LEFT OUTER JOIN TABLE2 t2 
  ON t1.id = t2.id 
  AND t1.date = t2.date
  and mod( ascii(t1.letter) - ascii('A'), ascii('Z') + 1 ) 
  <> mod( ascii(t2.letter) - ascii('A'), ascii('Z') + 1 ) - 1 
  and t1.id < (select max(t3.id) from table t3 where t3.date = t1.date)
;

A better solution would be to come up with an auxiliary table of ordered allowed letters, and find all the letters in that auxiliary table that are missing ("... where not exists ...") where any of those letters comes before the highest letter in your table.

tpdi
If your table gets significantly large, using an auxilliary table would be faster than the LOJ.
Philip Kelley
A: 

Looking at data, you could do a MAX on ID for each row to get the last character in the sequence.

If no records, you can insert W
If Z then insert A
Else add the character after the current character

Is that what you are looking for?

shahkalpesh
+1  A: 

How's this?

SELECT * FROM 
(
SELECT CASE
  WHEN t1.date = t2.date
    AND (ASCII(t2.IntCol) = 90 AND ASCII(t1.IntCol) <> 65)
     THEN 1
  WHEN t1.date = t2.date
    AND ASCII(t2.IntCol) <> ASCII(t1.IntCol) + 1
     THEN 1
  ELSE 0
END AS IsMissing, t1.*

from TABLE1 t1 
LEFT OUTER JOIN TABLE1 t2   
  ON t1.id +1 = t2.id
) as a WHERE a.IsMissing = 1
Aaron Alton