views:

50

answers:

1

I'm working on a web app for a quality control checklist. I already have a table set up, but I have a hunch that our model is sub-optimal and I could get some better performance. Please not that I'm using mysql, so I'm limited to its capabilities.

Each checklist has dozens, sometimes hundreds of questions. Each question has between 2 and 10 possible answers. Each Question is a varchar string, and so is each answer. A completed checklist is when all of the questions are associated with one of its possible answers -- when one answer is chosen.

Checklists are different for different purposes, and they can change over time. So to keep completed checklists from inadvertently changing when we want to have changes in new checklists, we have templates. Templates, Questions, and Answers are a mirror of Checklists, Questions, and Answers, and represent the 'current version' of the checklist.

So the table hierarchy looks like this

. Client

  • Templates
    • TemplateQuestions
      • TemplateQuestionAnswers
  • Checklists
    • ChecklistQuestions
      • ChecklistQuestionAnswers

Because we don't want changes in the current template to 'go back in time' and change completed checklists, data is copied from Templates into Checklists when a user goes to start a new checklist.

As you can guess, this creates a lot of duplication. In ChecklistQuestionAnswers, out of about a million answer rows, there are only 4,000 distinct answers. Of course, TemplatesQuestionAnswers has duplication too, but not as bad.

So what I'm think I want to do is create a versioning system for checklist templates, so I can save on space by storing unique questions with unique sets of answers only once. That way, instead of duplicating text wholesale, I can just link a Checklist against a version of a Template, and then a checklist set is which answer was chosen for which question.

Here's what I've sketched out so far.

A clients has many templates. A template has many revisions, but only one current revision. Each revision has many questions, and each question has many ( between 2 and 10 ) answers. Each Checklist relates to one Template. Each checklist has a set answers that indicate the answer select for each question in its version of the template.

Questions /* all unique question wordings */
Questions.id
Questions.question

Answers /* all unique answer wordings. */
Answers.id
Answers.answer 

Templates 
Templates.client_id /* relates to client table. */
Templates.template_name 
Templates.current_version /* this is related to TemplateVersions.version_number */

TemplateVersions /* A logical grouping of a set of questions and answers */
TemplateVersions.version
TemplateVersions.template_id /* relates this version to a template. */

TemplateQuestions
TemplateQuestions.template_version /* relates a question to a template version */
TemplateQuestions.question_id /* relates a unique question to this template version */
TemplateQuestions.id

TemplateQuestionAnswers
TemplateQuestionAnswers.template_question_id /* relates this answer to a particular template version question */
TemplateQuestionAnswers.answer_id /* relates the unique question to a unique answer */
TemplateQuestionAnswers.id

Checklists
Checklists.id
Checklists.template_version /* relates this question to a template version -- associating this checklist to a client happens through this relationship */

ChecklistAnswers /* ( I might call this something other than 'Answers' since the lack of ChecklistQuestionAnswers breaks 'name symmetry' with TemplateQuestionAnswers ) */
ChecklistAnswers.checklist_id 
ChecklistAnswers.question_id
ChecklistAnswers.answer_id

The rub I'm getting hung up on is guaranteeing that ChecklistAnswers associates a proper question-and-answer pair -- the relationship that exists in the version of the Template that it's Checklist parent is referencing.

In other words, each row in ChecklistAnswers must 'mirror' a question_id from TemplateQuestions to one child question from TemplateQuestionAnswers, form the template_version in Checklists. I'm trying to think of how to do this and my thinking process short circuits here. This is really the 'deliverable' of the database -- a completed checklist -- so all the other templates and everything is sort of epiphenomenal or an abstraction of that. If I can't get this working, I've missed the whole point!

This seems a little unwieldy, so I'm wondering if I'm making a solution whose complexity is not worth the space-savings I might get from implementing it.

Also note, I've simplified this a bit. There are other dimensions of complexity, such as a category system for grouping questions for reporting, but I don't think we need to get into that here.

A: 

As far as I understand:

A simple improvement in what you are doing might be to use 3 tables for templates and only 2 tables for actual checklists: Checklist ( foreign key to a version of a template used ) Answer ( foreign key to checklist, foreign key to templateAnswer )

So if you will want to retrieve a list of answers for a particlar checklist you would :

select  <whatever columns you like>
from checklist c, answer a, templateAnswer ta, templateQuestion tq
where  a.checklist_id = c.id AND a.ta_id = ta.id AND ta.tq_id = tq.id AND
c.id = <something>

ps. If questions share answers, and they probably do in many cases ("yes", "no" comes to mind), you can have a table for unique answers: templateAnswers and a table templateAnswerUsage ( foreign key to template answer and foreign key to templateQuestion). This way you have no duplication of answer text. It's essentially many to many relationship between questions and answers. This might or might not make sense depending if the answers average size is bigger than the size of IDs you will use.

Marcin K
In my sketch, I have only two tables for checklists - Checklists and ChecklistAnswers -- did I not make this clear? ( I know this is confusing :) I don't think the queries will be the problem, but getting values with meaningful and correct relationships *into* the ChecklistAnswers table is what I can't wrap my head around. And questions do share answers -- thus the Answers table in the proposal.
Sorry, I missed some of your sketch solution.I think your problem is that in ChecklistAnswers you've got a foreign key to the answer table (or is it not true?), where you should have a foreign key to TemplateQuestionAnswers table. TemplateQuestionAnswers already binds possible answers with questions, so if you reference it, you will keep the binding.
Marcin K
I think you're on to something, but my brain is short-circuiting again... :P
If I had a foreign key to TemplateQuestionAnswers table, it would only guarantee that I had any answer from TemplateQuestionAnswers, not that the answer was constrained to the set of possible answers that are all related to one question TemplateQuestions, *I think*