views:

173

answers:

5

I'm implementing a small time tracking app and I can't decide between two database designs.

The first has all the log entries in a single table, ie.

TimeLog
-------------
job_id   :  long
timestamp
start    :  boolean [or perhaps 'type : enum']
_id      :  long

The other option is to have two tables, one for 'start' entries the other for 'stop' entries.

StartLog                     StopLog
-------------                -------------
job_id   :  long             job_id   :  long
timestamp                    timestamp
id      :  long              id      :  long

What are the pros and cons of each approach? Is there another alternative?

My specific app will be running on SQLite and I'd like to optimise it for speed - but I am interested in generic arguments.

+7  A: 

I would go for the first (single-table, type-differentiated) design, not particularly because there are too many tables, but simply because the database entities are very similar and would have similar constraints and operations.

Cade Roux
+3  A: 

I think that depends on what queris will you use more. If you query most of the time only for start entries or stop entries, but not both, you will benefit from smaller table sizes. On the other hand, if you mostly use both parts in the same query then merging the results could slow you down.

Also, are you sure you want to optimze for speed? Most of the time, speed is not a problem and you should worry what approach makes more sense.

svick
+1 for advising against micro-optimizing (which this most-likely is)
colithium
+3  A: 

The first option is the best by far. Both operations (Start/Stop) share common attributes. With this design you will be in a better position if the requirements change (for example if a "Cancel" operation gets created).

tekBlues
+5  A: 

I would use the first design: a single database table. As Cade Roux said, the entities are very similar and would have similar constraints and a common set of operations. Additionally, I suggest renaming the start field to one called type with data type enum.

Suppose at some point you would like to record other times, not just start and stop times. The most obvious example involves employers recording employees' breaks. Some employers pay their employees for short breaks, perhaps 15-minutes or less in duration. You don't want them to punch out completely (i.e., go "off the clock"), but you do want to indicate the fact that they have started their break period, for payroll calculations and legal purposes (e.g., mandatory breaks every n hours, as required by law).

Table "TimeLog"
---------------
job_id     :  LONG
timestamp  :  TIMESTAMP
type       :  ENUM [ "punch-in", "punch-out", "break-start", "break-stop" ]
_id        :  LONG

This could be easily accomplished with a type field with data type enum. You could have a punch-in, punch-out, break-start, and break-stop, in the case of the above example.

William Brendel
That is exactly what we did. We used that in our employee directory to show who was in and who was out. Breaks counted as out but did not affect compensation. Obviously, the compensation issue does not apply to salaried employees, but everyone used the system, because many systems relied on knowing who was on duty for emailing, notifications, etc.
Cade Roux
A: 

At the level of LOGICAL design, there is no question that the "two-tables" option is the correct one.

At the level of PHYSICAL design, you might benefit frmo the one-table design if start/stop times are very often required together.

You might wonder why to make this distinction. Keeping a clear view of logical design helps you keep a better overview of what kind of data is actually recorded in the database. Having a boolean attribute in one single table does not "obviously" tell you that it's about start versus stop times. Having two distinct "tables" (at the logical level) might better do that job, especially if they have the word "start" resp. "stop" in their names.

The physical design level is where you must focus on performance, the best way to achieve which is usually to get data that is used together frequently, as physically close together as possible (and there rarely is anything closer than together in the same row).