Background
I'm a first year CS student and I work part time for my dad's small business. I don't have any experience in real world application development. I have written scripts in Python, some coursework in C, but nothing like this.
My dad has a small training business and currently all classes are scheduled, recorded and followed up via an external web application. There is an export/"reports" feature but it is very generic and we need specific reports. We don't have access to the actual database to run the queries. I've been asked to set up a custom reporting system.
My idea is to create the generic CSV exports and import (probably with Python) them into a MySQL database hosted in the office every night, from where I can run the specific queries that are needed. I don't have experience in databases but understand the very basics. I've read a little about database creation and normal forms.
We may start having international clients soon, so I want the database to not explode if/when that happens. We also currently have a couple big corporations as clients, with different divisions (e.g. ACME parent company, ACME healthcare division, ACME bodycare division)
The schema I have come up with is the following:
- From the client perspective:
- Clients is the main table
- Clients are linked to the department they work for
- Departments can be scattered around a country: HR in London, Marketing in Swansea, etc.
- Departments are linked to the division of a company
- Divisions are linked to the parent company
- From the classes perspective:
- Sessions is the main table
- A teacher is linked to each session
- A statusid is given to each session. E.g. 0 - Completed, 1 - Cancelled
- Sessions are grouped into "packs" of an arbitrary size
- Each packs is assigned to a client
- Sessions is the main table
I "designed" (more like scribbled) the schema on a piece of paper, trying to keep it normalised to the 3rd form. I then plugged it into MySQL Workbench and it made it all pretty for me:
(Click here for full-sized graphic)
Example queries I'll be running
- Which clients with credit still left are inactive (those without a class scheduled in the future)
- What is the attendance rate per client/department/division (measured by the status id in each session)
- How many classes has a teacher had in a month
- Flag clients who have low attendance rate
- Custom reports for HR departments with attendance rates of people in their division
Question(s)
- Is this overengineered or am I headed the right way?
- Will the need to join multiple tables for most queries result in a big performance hit?
- I have added a 'lastsession' column to clients, as it is probably going to be a common query. Is this a good idea or should I keep the database strictly normalised?
Thanks for your time