I'm working on a web app for a quality control checklist. I already have a table set up, but I have a hunch that our model is sub-optimal and that I could get better performance. Please note that I'm using MySQL, so I'm limited to its capabilities.
Each checklist has dozens, sometimes hundreds, of questions. Each question has between 2 and 10 possible answers. Each question is a varchar string, and so is each answer. A checklist is complete when every question is associated with exactly one of its possible answers -- that is, when one answer has been chosen for each.
Checklists are different for different purposes, and they can change over time. So to keep completed checklists from inadvertently changing when we want to have changes in new checklists, we have templates. Templates, Questions, and Answers are a mirror of Checklists, Questions, and Answers, and represent the 'current version' of the checklist.
So the table hierarchy looks like this:
Client
- Templates
  - TemplateQuestions
    - TemplateQuestionAnswers
  - TemplateQuestions
- Checklists
  - ChecklistQuestions
    - ChecklistQuestionAnswers
  - ChecklistQuestions
Because we don't want changes in the current template to 'go back in time' and change completed checklists, data is copied from Templates into Checklists when a user goes to start a new checklist.
As you can guess, this creates a lot of duplication. In ChecklistQuestionAnswers, out of about a million answer rows, there are only 4,000 distinct answers. Of course, TemplateQuestionAnswers has duplication too, but not as bad.
So what I think I want to do is create a versioning system for checklist templates, so I can save on space by storing unique questions with unique sets of answers only once. That way, instead of duplicating text wholesale, I can just link a Checklist against a version of a Template, and then a checklist is just the set of which answer was chosen for which question.
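A query along these lines would measure that duplication. The column name `answer` is assumed for illustration, since the actual column names of the existing table aren't listed here.

```sql
-- Compare the total number of answer rows with the number of distinct answer strings.
-- Assumes the answer text lives in a column named `answer` (hypothetical name).
SELECT
    COUNT(*)               AS total_rows,
    COUNT(DISTINCT answer) AS distinct_answers
FROM ChecklistQuestionAnswers;
```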
Here's what I've sketched out so far.
A client has many templates. A template has many revisions, but only one current revision. Each revision has many questions, and each question has many (between 2 and 10) answers. Each checklist relates to one template, through a specific revision. Each checklist has a set of answers that indicate the answer selected for each question in its version of the template.
Questions /* all unique question wordings */
Questions.id
Questions.question
Answers /* all unique answer wordings. */
Answers.id
Answers.answer
Templates
Templates.client_id /* relates to client table. */
Templates.template_name
Templates.current_version /* this is related to TemplateVersions.version */
TemplateVersions /* A logical grouping of a set of questions and answers */
TemplateVersions.version
TemplateVersions.template_id /* relates this version to a template. */
TemplateQuestions
TemplateQuestions.template_version /* relates a question to a template version */
TemplateQuestions.question_id /* relates a unique question to this template version */
TemplateQuestions.id
TemplateQuestionAnswers
TemplateQuestionAnswers.template_question_id /* relates this answer to a particular template version question */
TemplateQuestionAnswers.answer_id /* relates the unique question to a unique answer */
TemplateQuestionAnswers.id
Checklists
Checklists.id
Checklists.template_version /* relates this checklist to a template version -- associating this checklist to a client happens through this relationship */
ChecklistAnswers /* ( I might call this something other than 'Answers' since the lack of ChecklistQuestionAnswers breaks 'name symmetry' with TemplateQuestionAnswers ) */
ChecklistAnswers.checklist_id
ChecklistAnswers.question_id
ChecklistAnswers.answer_id
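For concreteness, the sketch above could be written as MySQL DDL roughly like this. The column types and lengths, the `id` columns on Templates and TemplateVersions, the unique keys, and the reading of `template_version` as a reference to TemplateVersions.id are all assumptions added for illustration; they are not part of the sketch itself. The client table is left out.

```sql
-- Sketch only: types, lengths, surrogate ids and key names are assumptions.
CREATE TABLE Questions (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    question VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_question (question)            -- each wording stored once
) ENGINE=InnoDB;

CREATE TABLE Answers (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    answer VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_answer (answer)                -- each wording stored once
) ENGINE=InnoDB;

CREATE TABLE Templates (
    id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    client_id       INT UNSIGNED NOT NULL,       -- relates to the client table (not shown)
    template_name   VARCHAR(255) NOT NULL,
    current_version INT UNSIGNED NOT NULL        -- matches TemplateVersions.version
) ENGINE=InnoDB;

CREATE TABLE TemplateVersions (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- assumed surrogate key
    template_id INT UNSIGNED NOT NULL,
    version     INT UNSIGNED NOT NULL,
    UNIQUE KEY uq_template_version (template_id, version),
    FOREIGN KEY (template_id) REFERENCES Templates (id)
) ENGINE=InnoDB;

CREATE TABLE TemplateQuestions (
    id               INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    template_version INT UNSIGNED NOT NULL,      -- assumed to reference TemplateVersions.id
    question_id      INT UNSIGNED NOT NULL,
    UNIQUE KEY uq_version_question (template_version, question_id),
    FOREIGN KEY (template_version) REFERENCES TemplateVersions (id),
    FOREIGN KEY (question_id) REFERENCES Questions (id)
) ENGINE=InnoDB;

CREATE TABLE TemplateQuestionAnswers (
    id                   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    template_question_id INT UNSIGNED NOT NULL,
    answer_id            INT UNSIGNED NOT NULL,
    UNIQUE KEY uq_question_answer (template_question_id, answer_id),
    FOREIGN KEY (template_question_id) REFERENCES TemplateQuestions (id),
    FOREIGN KEY (answer_id) REFERENCES Answers (id)
) ENGINE=InnoDB;

CREATE TABLE Checklists (
    id               INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    template_version INT UNSIGNED NOT NULL,      -- assumed to reference TemplateVersions.id
    FOREIGN KEY (template_version) REFERENCES TemplateVersions (id)
) ENGINE=InnoDB;

CREATE TABLE ChecklistAnswers (
    checklist_id INT UNSIGNED NOT NULL,
    question_id  INT UNSIGNED NOT NULL,
    answer_id    INT UNSIGNED NOT NULL,
    PRIMARY KEY (checklist_id, question_id),     -- at most one chosen answer per question
    FOREIGN KEY (checklist_id) REFERENCES Checklists (id),
    FOREIGN KEY (question_id) REFERENCES Questions (id),
    FOREIGN KEY (answer_id) REFERENCES Answers (id)
) ENGINE=InnoDB;
```

As written, these constraints only guarantee that a ChecklistAnswers row points at some existing question and some existing answer. Nothing ties that pair to the template version of the checklist.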
The rub I'm getting hung up on is guaranteeing that ChecklistAnswers associates a proper question-and-answer pair -- the relationship that exists in the version of the Template that its parent Checklist is referencing.
In other words, each row in ChecklistAnswers must 'mirror' a question_id from TemplateQuestions paired with one of its child answers from TemplateQuestionAnswers, both from the template_version in Checklists. I'm trying to think of how to do this and my thinking process short circuits here. This is really the 'deliverable' of the database -- a completed checklist -- so all the other templates and everything are sort of epiphenomenal, or an abstraction of that. If I can't get this working, I've missed the whole point!
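To make that requirement concrete, the rule can be stated as a query that should return zero rows whenever the data is valid. This only detects bad rows after the fact; it does not enforce anything. It assumes that Checklists.template_version and TemplateQuestions.template_version hold the same kind of value, both identifying one template version.

```sql
-- Find ChecklistAnswers rows whose (question, answer) pair does not exist
-- in the template version that the parent checklist references.
SELECT ca.checklist_id, ca.question_id, ca.answer_id
FROM ChecklistAnswers AS ca
JOIN Checklists AS c
    ON c.id = ca.checklist_id
WHERE NOT EXISTS (
    SELECT 1
    FROM TemplateQuestions AS tq
    JOIN TemplateQuestionAnswers AS tqa
        ON tqa.template_question_id = tq.id
    WHERE tq.template_version = c.template_version  -- same version as the checklist
      AND tq.question_id      = ca.question_id      -- question belongs to that version
      AND tqa.answer_id       = ca.answer_id        -- answer is offered for that question
);
```

What I'm after is a way to have the schema itself rule out the rows this query would return.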
This seems a little unwieldy, so I'm wondering if I'm making a solution whose complexity is not worth the space-savings I might get from implementing it.
Also note, I've simplified this a bit. There are other dimensions of complexity, such as a category system for grouping questions for reporting, but I don't think we need to get into that here.