tags:

views:

53

answers:

2

I have some data that needs to be validated, and while I have found a quick formula that works in excel I would like to find some SQL that would do the same so I can use it in my database as a check now and then.

The data is simply this:

ACACIA ST (KLN) | 1073 | 1149 | FAIL! | 76
ACACIA ST (KLN) | 1073 | 1151 | FAIL! | 78
ACACIA ST (KLN) | 1150 | 1332 | FAIL! | 182
ACACIA ST (KLN) | 1151 | 1332 | FAIL! | 181
ACACIA ST (KLN) | 1606 | 2079 |  | 473
ACTIVITY ST (WRK) | 6 | 215 | WIN! | 209
ACTIVITY ST (WRK) | 215 | 227 | WIN! | 12
ACTIVITY ST (WRK) | 227 | 423 | WIN! | 196
ACTIVITY ST (WRK) | 423 | 549 | WIN! | 126
ACTIVITY ST (WRK) | 549 | 600 |  | 51
ADRIENNE CT (WMN) | 5 | 107 | WIN! | 102
ADRIENNE CT (WMN) | 107 | 122 |  | 15
AERODROME RD (LYB) | 0 | 140 | WIN! | 140
AERODROME RD (LYB) | 140 | 428 | WIN! | 288
AERODROME RD (LYB) | 428 | 716 | WIN! | 288
AERODROME RD (LYB) | 716 | 998 | WIN! | 282
AERODROME RD (LYB) | 998 | 1280 | WIN! | 282
AERODROME RD (LYB) | 1280 | 1566 | WIN! | 286
AERODROME RD (LYB) | 1566 | 1851 | WIN! | 285
AERODROME RD (LYB) | 1851 | 2136 | WIN! | 285
AERODROME RD (LYB) | 2136 | 2421 |  | 285

Well there is a lot more data then that, ~11000+ rows of it.

Now the idea is that each road segment (the first column) has a start and end chainage (second and third column) and a length (last column), the end chainage should be the start chainage of the next segment, provided the road segment name is the same (sorted by start chainage of course). You can see that because the first rows end chainage doesn't equal the start of the next it fails.

My bad attempt at pseudocode:

If EndChainage != NextStartChainge Where RoadSegment = NextRoadSegment Error

What I need to be able to do select all the road segments which don't follow the rule of end chainage = start chainage of next.

So the outout from the above table would be:

ACACIA ST (KLN) | 1073 | 1149 | FAIL! | 76
ACACIA ST (KLN) | 1073 | 1151 | FAIL! | 78
ACACIA ST (KLN) | 1150 | 1332 | FAIL! | 182
ACACIA ST (KLN) | 1151 | 1332 | FAIL! | 181

Hope that makes sense. I attempted it but just could seem to get it to work, and my SQL skills are a bit lacking.

EDIT: I can use both SQL Server and PostgreSQL.

+2  A: 

My first attempt was simple enough but doesn't quite get it right if you have the ACACIA ST situation when a correct segment is not in the right sequential order.

    select * from stackchallenge se
    where not exists (select * from stackchallenge 
where startchain = se.endchain and segment = se.segment)
    and exists (select * from stackchallenge 
where segment = se.segment and endchain > se.endchain)

The second effort uses some SQL 2005 specific syntax but I think gets the right answer you're after.

     with segments (segment,startchain,endchain,lenchain,segnum)
        as (select segment,startchain,endchain,lenchain
    ,row_number() over (partition by segment order by startchain,endchain) segnum
                from stackchallenge)
        select s1.segment,s1.startchain,s1.endchain,s1.lenchain
            ,case when s1.endchain = s2.startchain then 'WIN!'
                when s1.endchain <> s2.startchain then 'FAIL!'
            else '' end result
        from segments s1
            left join segments s2 on s1.segment = s2.segment 
and s1.segnum + 1 = s2.segnum

Here is the table and data used to test which you'll need SQL Server 2008 to use directly. Change the Insert to SQL 2005 syntax if you are using that.

create table stackchallenge (segment varchar(100),
    startchain int,
    endchain int,
    result varchar(100),
    lenchain int
)   

insert into stackchallenge (segment,startchain,endchain,result,lenchain)
values 
('ACACIA ST (KLN)' , 1073 , 1149 , 'FAIL!' , 76),
('ACACIA ST (KLN)' , 1073 , 1151 , 'FAIL!' , 78),
('ACACIA ST (KLN)' , 1150 , 1332 , 'FAIL!' , 182),
('ACACIA ST (KLN)' , 1151 , 1332 , 'FAIL!' , 181),
('ACACIA ST (KLN)' , 1606 , 2079 , '' , 473),
('ACTIVITY ST (WRK)' , 6 , 215 , 'WIN!' , 209),
('ACTIVITY ST (WRK)' , 215 , 227 , 'WIN!' , 12),
('ACTIVITY ST (WRK)' , 227 , 423 , 'WIN!' , 196),
('ACTIVITY ST (WRK)' , 423 , 549 , 'WIN!' , 126),
('ACTIVITY ST (WRK)' , 549 , 600 , '' , 51),
('ADRIENNE CT (WMN)' , 5 , 107 , 'WIN!' , 102),
('ADRIENNE CT (WMN)' , 107 , 122 , '' , 15),
('AERODROME RD (LYB)' , 0 , 140 , 'WIN!' , 140),
('AERODROME RD (LYB)' , 140 , 428 , 'WIN!' , 288),
('AERODROME RD (LYB)' , 428 , 716 , 'WIN!' , 288),
('AERODROME RD (LYB)' , 716 , 998 , 'WIN!' , 282),
('AERODROME RD (LYB)' , 998 , 1280 , 'WIN!' , 282),
('AERODROME RD (LYB)' , 1280 , 1566 , 'WIN!' , 286),
('AERODROME RD (LYB)' , 1566 , 1851 , 'WIN!' , 285),
('AERODROME RD (LYB)' , 1851 , 2136 , 'WIN!' , 285),
('AERODROME RD (LYB)' , 2136 , 2421 , '' , 285)
Craig
+2  A: 

Assuming that there won't be duplicate rows, the following should be a reasonably DB-agnostic solution:

select street, startno, endno, 
case  (select coalesce(min(s2.startno),-1) from stackchallenge s2 
       where s1.street = s2.street and 
             s1.startno <= s2.startno and 
             s1.endno < s2.endno)
    when -1 then ' '
    when endno then 'WIN!'
    else 'FAIL!'
end as validated,
length
from stackchallenge s1 order by 1,2,3
Mark Bannister