I seem to often find myself wanting to store data of more than one type (usually specifically integers and text) in the same column in a MySQL database. I know this is horrible, but the reason it happens is when I'm storing responses that people have made to questions in a questionnaire. Some questions need an integer response, some need a text response and some might be an item selected from a list.

The approaches I've taken in the past have been:

1) Store everything as text and convert to int (or whatever) when needed later.

2) Have two columns - one for text and one for int. Then you just fill one in per row per response, and leave the other one as null.

3) Have two tables - one for text responses and one for integer responses.

I don't really like any of those, though, and I have a feeling there must be a much better way to deal with this kind of situation.

To make it more concrete, here's an example of the tables I might have:

CREATE TABLE question (
  id int(11) NOT NULL auto_increment,
  text VARCHAR(200) NOT NULL default '',
  PRIMARY KEY (id)
);

CREATE TABLE response (
  id int(11) NOT NULL auto_increment,
  question int(11) NOT NULL,
  user int(11) NOT NULL,
  response VARCHAR(200) NOT NULL default '',
  PRIMARY KEY (id)
);

or, if I went with using option 2 above:

CREATE TABLE response (
  id int(11) NOT NULL auto_increment,
  question int(11) NOT NULL,
  user int(11) NOT NULL,
  text_response VARCHAR(200),
  numeric_response int(11),
  PRIMARY KEY (id)
);

and if I used option 3 there'd be a responseInteger table and a responseText table.

Is any of those the right approach, or am I missing an obvious alternative?

Thanks,

Ben

A: 

Option 2 is the correct, most normalized option.

Ray
Whoever downvoted this, ... Why? Please people, if you are downvoting, give an explanation.
wcm
wcm, who are you? If SO wanted this to be standard, they'd pop up a place for the comment when clicking the down arrow.
There's a difference between having a culture where manners are mandatory and where they are voluntary. I wasn't suggesting that SO should force anything.
wcm
As for who I am, I am a person that was wondering if there was something wrong with the answer or if the person downvoting was just having a bad day. You answered this wonderfully and your answer deserved to be selected. Thanks for the follow through.
wcm
It's hard to keep track of the various morality police's rules. Comments aren't required, which keeps the votes anonymous. I don't presume to require time from people for the sake of politeness. Getting to the best answer should be paramount. If coaxing bad answers to the bottom furthers that goal, so be it.
+4  A: 

[Option 2 is] NOT the most normalized option [as @Ray claims]. The most normalized would have no nullable fields and obviously option 2 would require a null on every row.

At this point in your design you have to think about the usage: the queries you'll run and the reports you'll write. Will you want to do math on all of the numeric responses at the same time, i.e. WHERE numeric_response IS NOT NULL? Probably not.

A more likely query would be "What's the average response WHERE Question = 11?" In that case you can use either the INT table or the INT column, and neither is easier to query than the other.
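
A rough sketch of that per-question average against either layout. The option 2 column names come from the question; the question names a responseInteger table for option 3 but never defines its columns, so the `question` and `response` columns used on it below are assumptions.

```sql
-- Option 2: one table with a nullable numeric column.
-- AVG() ignores NULLs, so rows holding only a text response are skipped.
SELECT AVG(numeric_response) AS average_response
FROM response
WHERE question = 11;

-- Option 3: separate integer table (column names are assumed, not from the question).
SELECT AVG(response) AS average_response
FROM responseInteger
WHERE question = 11;
```

Either way the query is a single SELECT with one filter, which is the point being made: this kind of question does not favour one layout over the other.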

If you did use two tables, you'd more than likely be constantly unioning them together for questions like "what % of questions have a response?"
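
As an illustration of that unioning, here is a minimal sketch of the "what % of questions have a response" query under option 3. It assumes the responseText and responseInteger tables each carry a `question` column that references `question.id`; the question does not spell out their columns.

```sql
-- Percentage of questions that have at least one response of either kind.
-- COUNT(DISTINCT r.question) ignores the NULLs produced by the LEFT JOIN
-- for questions nobody has answered.
SELECT 100 * COUNT(DISTINCT r.question) / COUNT(DISTINCT q.id) AS pct_answered
FROM question AS q
LEFT JOIN (
    SELECT question FROM responseText
    UNION ALL
    SELECT question FROM responseInteger
) AS r ON r.question = q.id;
```

With option 1 or option 2 the derived table collapses to the single response table, so the UNION ALL disappears.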

Can you see how the questions you ask your database to answer start to drive the design?

Mark: you should characterize "It's" more clearly; you are probably responding to @Ray, but your answer should be free-standing (able to be read on its own).
Jonathan Leffler
I added some text to clarify that Mark is referring to Option 2.
Bill Karwin
OK. That makes sense. I need to put some more thought into my design before I proceed with this kind of thing.
Ben
Yes, both of you are correct. It's sometimes hard to get used to the idea that these posts jockey for position. Thanks to both of you!
+2  A: 

I'd opt for Option 1. The answers are always text strings, but sometimes the text string happens to be the representation of an integer. What is less easy is to determine what constraints, if any, should be placed on the answer to a given question. If some answer should only be a sequence of one or more digits, how do you validate that? Most likely, the Questions table should contain information about the possible answers, and that should guide the validation.
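
One way this could look, as a sketch only: the `response_type` column and its three values are hypothetical, chosen to match the integer, text and list-item cases described in the question, and the queries use the single-column response table from option 1.

```sql
-- Hypothetical: record on each question what kind of answer it expects.
ALTER TABLE question
  ADD COLUMN response_type ENUM('integer', 'text', 'choice') NOT NULL DEFAULT 'text';

-- Find stored answers that break the digits-only rule for integer questions.
SELECT r.question, r.user, r.response
FROM response AS r
JOIN question AS q ON q.id = r.question
WHERE q.response_type = 'integer'
  AND r.response NOT REGEXP '^[0-9]+$';

-- Convert on the way out when arithmetic is needed.
SELECT AVG(CAST(r.response AS SIGNED)) AS average_response
FROM response AS r
JOIN question AS q ON q.id = r.question
WHERE q.response_type = 'integer'
  AND r.question = 11;
```

The validation itself would normally run in the application before the INSERT; the second query is only a way to audit what is already stored. Note that MySQL's CAST turns a non-numeric string into 0 with a warning rather than failing, so the audit matters before trusting the average.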

I note that the combination of QuestionID and UserID is (or should be) unique (for a given questionnaire). So, you really don't need the auto-increment column in the answer. You should also have a unique constraint (or primary key constraint) on the QuestionID and UserID anyway (regardless of whether you keep the auto-increment column).
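
A minimal sketch of both variants, using the column names from the question and assuming a single questionnaire (with several, a questionnaire identifier would have to join the key). The constraint name `uq_response_question_user` is made up for the example.

```sql
-- Variant A: drop the auto-increment column and use the natural key.
CREATE TABLE response (
  question INT NOT NULL,
  user INT NOT NULL,
  response VARCHAR(200) NOT NULL DEFAULT '',
  PRIMARY KEY (question, user)
);

-- Variant B: for an existing table that keeps its auto-increment id,
-- add the uniqueness rule alongside it instead.
ALTER TABLE response
  ADD UNIQUE KEY uq_response_question_user (question, user);
```

With either variant in place, a second answer from the same user to the same question is rejected with a duplicate-key error instead of being stored silently.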

Jonathan Leffler
THANK YOU! Avoiding the arbitrary key, when he would more than likely never use this table as the parent of another, is a good thing. Agreed that the answers are really just text which sometimes happens to be a number. But it seemed like a good time to show "how" to get to the answer.