views:

77

answers:

2

The problem was created by me, discussed with my co-workers and no one seem to have an idea. Thus, thought, I can ask the experts here.

I've the following table FlightInfo and the fields are

 Start
 Destination
 Flight_Duration 

The goal is to find out, the shortest flight between two cities. The challenge is not all cities have direct flights. (Example: PHL to PVA ->Philadelphia to Shangai). You have to connect in Detroit (DTW) or Chicago (ORD)

How will you go about writing an SQL statement?

Example table contents

PHL DTW 2.25

DTW PVG 15.15

PHL ORD 3.15

ORD PVG 16.20

+1  A: 

I would make an assumption that you may get from any single airport to any another one in at most three flights. It is possible that it won't be true in few very exceptional cases, but is this really a problem? I don't think so, however you may consider adding another join if you feel you need it.

Table flights:
origin      VARCHAR
destination VARCHAR
start       DATETIME
end         DATETIME

And query:

SELECT *, f3.end - f1.end AS duration
FROM flights AS f1
INNER JOIN flights AS f2
ON f1.destination = f2.origin AND f1.end < f2.start
INNER JOIN flights AS f3
ON f2.destination = f3.origin AND f2.end < f3.start
WHERE f1.start BETWEEN some_reasonable_values_not_to_have_too_many_rows_in_join
  AND f2.start BETWEEN some_reasonable_values_not_to_have_too_many_rows_in_join
  AND f3.start BETWEEN some_reasonable_values_not_to_have_too_many_rows_in_join
  AND f1.origin = your_desired_origin
  AND f3.destination = your_desired_destination
ORDER BY duration ASC
LIMIT 1

This is for combination of three flights. Similar SQL for two flights and one flight (less joins). Then, union those three queries and take the best result.

You may want to add some minimal delay between flights. Some_reasonable_values_not_to_have_too_many_rows_in_join - does it make sense to search flight combinations that take longer than e.g. three days?

skalee
+2  A: 

Brute force:

declare @t table (
 [from] char(3), 
 [to] char(3),
 [time] float);

 insert into @t 
    ([from], [to], [time])
values  
 ('PHL', 'DTW', 2.25),
 ('DTW', 'PVG', 15.15),
 ('PHL', 'ORD', 3.15),
 ('ORD', 'PVG', 16.20);

 declare @src char(3) = 'PHL',
   @dst char(3) = 'PVG';

with cteAnchor as (
select case @src 
    when [from] then [to] 
    when [to] then [from]
    end as [layover], [time]
    , [time] as [total]
    , cast([from]+'-'+[to] as varchar(max)) as [path]
    , 1 as [flights]
from @t
where @src in ([from], [to]))
, cteRecursive as (
select [layover], [time], [total], [path], [flights]
from cteAnchor
union all
select case r.layover
    when [from] then [to]
    when [to] then [from] 
    end as [layover]
    , t.[time]
    , t.[time] + r.[total] as [total]
    , r.[path] + ' ' +t.[from]+'-'+t.[to] as [path]
    , r.[flights] + 1
from @t t
join cteRecursive r
    on  (t.[from] = r.[layover] and 0 = charindex(t.[to], r.[path]))
        or 
        (t.[to] = r.[layover] and 0 = charindex(t.[from], r.[path]))
)
select top(1) [flights], [total], [path] from cteRecursive  
where @dst = [layover]
order by [total] asc;

Answer:

total   path
17.4    PHL-DTW DTW-PVG

Edit note: I have modified the implementation of the CTE with one which is resilient to cycles and is also actually correct.

Remus Rusanu
Your performance mileage may vary (as in 'don't do this in a real case'). This is more of an academic exercise, much like the question.
Remus Rusanu
Thanks Remus Rusanu. Appreciate it.