views:

668

answers:

3

Its been a while since my Database Design classes in my sophomore year at Uni. and I haven't done any designs in the interim so my skills are at best rusty at the moment. I have begun working on a personal project involving the railway timetable system and seem to be stuck at the table design which resembles something like this -

StationTbl
------------
  StnName
  StnCity
  StnCode -  {Primary Key}

TrainTbl
---------
  TrnName
  TrnNumber -  {Primary Key}
  SourceStn
  DestStn
  DaysofWeek 

TrainHopTbl
--------------
  TrnNumber   -  {Primary Key}
  StationCode -  {Primary Key}
  ArrTime
  DepTime
  HopIndex

Most fields are alphanumberic with the exception of the Time fields and the HopIndex in TrainHopTbl. As you can see the preliminary design is very crude and far from finished.

Users will be able to find trains based on either the train name/number or by specifying the source and destination station. The first query can be easily dealt with but I am having problems writing a query for the second search where the user gives the src/dest pair and the server returns a list of trains which run on that route. This information will be extracted from TrainHopTbl which contains the list of hops for the particular train, like so -

TrainHopTbl
--------------
Num StnCode  ArrTime  DepTime  HopIndex
121  WDC     0900      0910        1
121  BAL     1005      1010        2
121  NYC     1145       -          3

If the user enters WDC/NYC as the src/dest pair then the query should return Train Number 121 since it is a valid route.

Any pointers/links/book suggestions on database design would be helpful. Heck, at this point even runnable queries or entire re-designs would be helpful since I seem to be stuck in a rut that I am finding hard to get out of and this has completely stalled my progress.

A: 

It seems like you are trying to solve a hard graph problem with the database. It might be much easier to add a field to the train table that stores the list of stops in a string form

"WDC, BAL, NYC"

Then you just need to find trains that contain the two substrings that you are looking for, in this case "WDC" and "NYC". This narrows down your search a lot, to the point where you could consider the resulting trains in code outside of SQL.

Without doing more research than I am willing to do right now, what you do would then do is

SELECT from where contains "WDC" AND contains "NYC"

I don't know the best way of doing Contains ... comments anyone?

Tom Leys
What if a train has 20 stops? Think about even the commuter rail in Boston--that would be fairly inefficient, eh? Why not use set transactions instead of trying to parse strings in SQL? So long as it's the same Train # and the HopIndex is in the correct order, life should be peachy keen (not taking into account transfers).
Eric
I guess I am too busy trying to multitask here... !
Tom Leys
Eric, I am sorry - I lost you at "set transactions". Care to elaborate?
muteW
+3  A: 

I'd take your SourceStn and DestStn out of your TrainTbl -- it's needless clutter.

Anyway, you can get what you're looking for with:

select 
    src.TrnNumber,
    srcSt.StnName as SourceStation, 
    srcSt.StnCity as SourceCity,
    src.DepTime,
    destSt.StnName as DestinationStation,
    destSt.StnCity as DestinationCity,
    dest.ArrTime,
    (abs(dest.HopIndex - src.HopIndex)) as Stops
from
    TrainHopTbl src
    inner join TrainHopTbl dest on
        src.TrnNumber = dest.TrnNumber
    inner join StationTbl srcSt on
        src.StnCode = srcSt.StationCode
    inner join StationTbl destSt on
        dest.StnCode = destSt.StationCode
where
    src.StnCode = 'WDC'
    and dest.StnCode = 'NYC'
    and src.HopIndex < dest.HopIndex
order by
    Stops asc,
    DepTime asc

Edit: I haven't taken into account transfers here. Your question mentioned just straight route trains. Let me know if you want transfers, as well.

Eric
+1 nice solution. Narrows down routes well, will cache nicely, ensures returned trains go in the right dir.
Tom Leys
May I suggest a ORDER BY hopindex / some other indication of travel time?
Tom Leys
That's a good addition. If you stored ArrTime and DepTime as times, you could use datediff to find out the total travel time. For the time being, I'll order it by # of stops and departure time.
Eric
Currently I am looking only at straight route trains. Transfers are optional and not necessary at the moment. I am just trying to formalize a decent db design which won't mess up my code later in the project.
muteW
+2  A: 

I haven't thought this through at all yet, so this response could be way off.

I think the TrainHopTbl records the nodes in a network, where it would be more useful to record the edges in a network. An edge would have a train number, a departure station, a departure time, an arrival station, and an arrival time. And maybe a hop index like the you have.

So,

Num: 121, Hopindex: 1, DepStnCode: WDC, DepTime: 910, ArrStnCode: BAL, ArrTime: 1005

Would describe the "hop" from Washington to Baltimore, an edge in the network of hops.

(Also, I would call a hop a "leg", but that's just naming choice.)

By having the hops tie two stations together, it becomes possible link up a series of hops that gets you from one place to another in a single trip. Some trips could even involve changing trains a some station, provided the arrival time is a little before the departure time for the next hop.

The down side to this is that there's a little more redundancy in the station codes. I haven't figured out whether this redundancy is harmful or not.

Walter Mitty
The "edge" approach sounds good but it would mean a complete re-design of Eric's proposed query. With regards to the redundancy in the station codes, I don't think there's any other approach and since most codes are 3-4 characters long it won't make much of an impact on the size dimension.
muteW