views:

117

answers:

3

I have an interesting database design problem that I formulated while travelling by a bus, coming back from my home.

Design a normalized database for a bus ticketing system (not reservation system). In each trip, the conductor of the bus will give tickets to its passengers after collecting fare from them. Passengers travel from a various source places to various destination places.

The system must be able to give a report of the places for which the number of passengers was more than 2.

Suppose the stops for the bus are L1,L2, L3 and L4 Suppose passenger P1 travels from L1 to L4. P2 travels from L2 to L4. P3 travels from L3 to L4.

The report should list only (L3-L4) for which it has more than 2 travelers.

Can you please help me to solve the following problems

1) Design a normalized database

2) Write a query for the report

3) Is there any site that gives these kinds of interesting database design questions and answers?

+1  A: 

DataBase Design :

Location Table

 - LocationID  (p)
 - LocationName

TravelTable 

 - TravelID  (p)
 - PassengerID
 - LocationFrom (F) - (Location - LocationID)
 - LocationTo  (F) - (Location - LocationID)

PassengerTable

 - PassengerID (p)
 - PassengerName

Will get back to you soon with query also

Site form where you get proper answer

http://blog.sqlauthority.com/

http://www.sqlservercentral.com/

Pranay Rana
Thanks. One key point. How do you decide the order of locations; like L3 is after L2?
Lijo
Because a bus travels in a straight line and l3 comes after l2. Unless you're going back in which case l3 comes before l2. Or you're teleporting in which case l1 comes after l3 and l2 comes before l4.
Elizabeth Buckwalter
My point here is, we should not be using the order of records in Location table as the order of actual location.
Lijo
you can decide order of the location form locationId and or by adding one more column LocationOrder in location table
Pranay Rana
A: 

The key table would be:

trip (trip_id, passenger_id, start_location_id, end_location_id)

You might have a location table with information about the stops.

Then the query would simply be

select start_location_id, end_location_id, count(*)
from trip
group by start_location_id, end_location_id
having count(*)>=2

Edit

Per comment below, maybe I'm misunderstanding the requirement. Are you trying to find TRIPS that have more than 2 passengers, or LOCATIONS that have more than 2 passengers, or what? That is, if we have trips (Al, L1, L2), (Betty, L1, L2), (Carl, L1, L3), (Donna, L2, L4), should the output be:

L1, L2, 2

(and that's all)?

Or should it be

L1, 3
L2, 3

I believe my query above would give the first result. If you're looking for the second result, it would be:

select location, sum(visits)
from
(
select start_location_id as location, count(*) as visits
from trip
union
select end_location_id as location, count(*) as visits
from trip
)
group by location
having sum(visits)>=2
order by location

As you're adding together counts for two different columns, I don't see any way to avoid using the union and an inner query.

Arguably, a different schema would make this query a whole lot easier, namely, instead of trip have trip_stop:

trip_stop (passenger_id, location_id, stop_number)

where stop_number is, say, 1 for start location and 2 for end location.

Now that I think about it, this is really better, as it eliminates having two fields that represent the same idea, as well as making it easy to expand to having trips with multiple stops.

Then the query simply becomes

select location_id, count(*)
from trip_stop
group by location_id
having count(*)>=2
order by location_id
Jay
I don't think this will meet the requirement. When you group by using start location and end location, each one of them will make separate group. No records will be selected. (TripID PassID StartLoc EndLoc) (1 P1 L1 L4) (1 P2 L2 L4) (1 P3 L3 L4)
Lijo
A: 

This reminds me of liveness analysis from compiler class. There the goal is to find out which variables are in use at the same time so register allocation is more efficient. The other answers cover the database part pretty well. For the algorithm on finding which x (trip, location) have multiple passengers, read up on liveness analysis and linear scan.

Kelly French