I found a similar question asked previously (School attendance database), but I have to deal with these additional conditions:
- Total number of users recording attendance would be 100,000.
- Each user will have swipe-in swipe-out entry.
- A user may swipe in or swipe out multiple times in case s/he is not sure the data was captured.
- One year of attendance records has to be maintained and be accessible by the user.
The basic table I thought of has the following columns:
- UserID - numeric value
- Date
- Swipe-in time
- Swipe-out time
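A minimal sketch of the table described above, using Python's built-in sqlite3 module purely as a stand-in for MySQL or PostgreSQL. The table name, the column names (attendance, user_id, swipe_date, swipe_in, swipe_out), the index and the sample row are all illustrative assumptions, not a settled design.

```python
import sqlite3

# In-memory SQLite stands in for MySQL/PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attendance (
        user_id    INTEGER NOT NULL,
        swipe_date TEXT    NOT NULL,  -- ISO date, e.g. '2010-01-01'
        swipe_in   TEXT,              -- ISO time, e.g. '08:00'
        swipe_out  TEXT               -- NULL until the user swipes out
    )
    """
)
# Index assumed for the common lookup: one user's records by date.
conn.execute(
    "CREATE INDEX idx_attendance_user_date ON attendance (user_id, swipe_date)"
)

# Sample row (made-up values) and the per-user lookup a user would run.
conn.execute(
    "INSERT INTO attendance VALUES (?, ?, ?, ?)",
    (42, "2010-01-01", "08:00", "17:30"),
)
rows = conn.execute(
    "SELECT swipe_date, swipe_in, swipe_out FROM attendance WHERE user_id = ?",
    (42,),
).fetchall()
print(rows)
```

The composite index on (user_id, swipe_date) is a guess at the dominant access pattern, a user reading their own history; a real deployment would need to confirm that against actual queries.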
With this table, the approximate number of rows would be 100,000 x 250 (working days per year) = 25,000,000 in the ideal case. If users duplicate either a swipe-in or a swipe-out, the rows will add up. Say 1/3 of the employees do this each day to make sure their attendance is marked: that adds about 8,333,333 rows, for a total of approximately 33,333,333.
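The estimate above can be reproduced with a few lines of arithmetic. The constant names are made up for illustration, and the one-third duplicate rate is the same guess as in the text, not a measured figure.

```python
USERS = 100_000
WORKING_DAYS = 250          # working days per year, as assumed in the text
DUPLICATE_DIVISOR = 3       # guess: one in three daily rows gains an extra row

base_rows = USERS * WORKING_DAYS
extra_rows = base_rows // DUPLICATE_DIVISOR  # integer division, rounds down
total_rows = base_rows + extra_rows

print(base_rows, extra_rows, total_rows)
# prints: 25000000 8333333 33333333
```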
One issue arises when a user swipes in twice but swipes out only once. Then I need to either leave a null swipe-out for the second swipe-in or fill the same value into its swipe-out field. This accounts for the additional rows mentioned above. The other option I thought of is to run a background task every day to clean up the duplicate entries. Say a user swipes in at 8.00 A.M. and then again at 8.10 A.M.; the system removes the 8.10 A.M. entry at the end of the day, on a first-in, last-out basis.
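A rough sketch of the first-in, last-out cleanup for one user's swipes on one day. The function name first_in_last_out, the ("in"/"out", timestamp) tuple format and the 5.30 P.M. swipe-out are assumptions made for the example only.

```python
from datetime import datetime


def first_in_last_out(swipes):
    """Collapse one user's swipes for one day to (earliest in, latest out).

    `swipes` is a list of (kind, timestamp) tuples, kind being "in" or "out".
    A missing side is returned as None.
    """
    ins = [ts for kind, ts in swipes if kind == "in"]
    outs = [ts for kind, ts in swipes if kind == "out"]
    return (min(ins) if ins else None, max(outs) if outs else None)


day = [
    ("in", datetime(2010, 1, 1, 8, 0)),
    ("in", datetime(2010, 1, 1, 8, 10)),    # duplicate swipe-in, to be dropped
    ("out", datetime(2010, 1, 1, 17, 30)),  # hypothetical swipe-out
]
first_in, last_out = first_in_last_out(day)
print(first_in, last_out)
# prints: 2010-01-01 08:00:00 2010-01-01 17:30:00
```

After this step only one row per user per day would remain, which brings the row count back toward the ideal 25,000,000.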
However, a problem I foresee is this: say a user stays in the office overnight working and swipes out at maybe 2.00 A.M. The swipe data would be:
- Swipe in - 1-Jan-10 - 8.00 A.M.
- Swipe out - 2-Jan-10 - 2.00 A.M.
- Swipe in - 2-Jan-10 - 1.00 P.M. (he comes back to the office again the same day - work pressure :))
- Swipe out - 2-Jan-10 - 10.00 P.M.
How should this be handled?
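To make the overnight case concrete, the sketch below runs the four swipes listed above through two approaches: grouping by calendar date with first-in, last-out, and pairing each swipe-in with the next swipe-out in time order. The second approach is only one possible alternative and is an assumption, as are all variable names.

```python
from collections import defaultdict
from datetime import datetime

swipes = [
    ("in", datetime(2010, 1, 1, 8, 0)),
    ("out", datetime(2010, 1, 2, 2, 0)),
    ("in", datetime(2010, 1, 2, 13, 0)),
    ("out", datetime(2010, 1, 2, 22, 0)),
]

# Approach 1: group by calendar date, keep the first in and the last out.
by_date = defaultdict(lambda: {"in": [], "out": []})
for kind, ts in swipes:
    by_date[ts.date()][kind].append(ts)
per_date = {
    day: (min(v["in"], default=None), max(v["out"], default=None))
    for day, v in by_date.items()
}

# Approach 2 (assumption): store full timestamps and pair each swipe-in
# with the next swipe-out in time order, regardless of the calendar date.
sessions = []
open_in = None
for kind, ts in sorted(swipes, key=lambda s: s[1]):
    if kind == "in":
        if open_in is None:      # repeated swipe-ins keep the earliest one
            open_in = ts
    elif open_in is not None:    # a swipe-out with no open session is skipped
        sessions.append((open_in, ts))
        open_in = None

for day, pair in sorted(per_date.items()):
    print(day, pair)
for start, end in sessions:
    print(start, "->", end)
```

With the per-date grouping, 1-Jan-10 ends up with no swipe-out and the 2.00 A.M. swipe-out on 2-Jan-10 is swallowed by the 10.00 P.M. one. The pairing approach recovers both sessions, but it closes a session on the first swipe-out rather than the last, so duplicate swipe-outs would still need separate handling.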
My questions are:
1. Is this number of rows acceptable for databases like MySQL or PostgreSQL without slowing down retrieval too much? I am mainly interested in open-source database performance.
2. Is there a better way to structure the table than this?