views:

231

answers:

9

Imagine a web form with a set of checkboxes (any or all can be selected). I chose to save them in a comma separated list of values stored in one column of the database table.

Now, I know that the correct solution would be to create a second table and properly normalize the database. One reason was laziness, my SQL knowledge is very limited and only seldomly used, so I would have had to look up quite some stuff to implement the more correct solution.

I thought the saved time and simpler code was worth it in my situation, it this a defensible design choice, or should I have normalized it from the start?

Edit:

Some more context, this is a small internal application that essentially replaces an Excel file that was stored on a shared folder. I'm also asking because I'm thinking about cleaning up the programm and make it more maintainable. There are some things in there I'm not entirely happy with, one of them is the topic of this question.

+2  A: 

Man up and do it right the first time, and you won't have to redo it later.

Paul Tomblin
+2  A: 

Yes, I would say that it really is that bad. It's a defensible choice, but that doesn't make it correct or good.

It breaks first normal form.

A second criticism is that putting raw input results directly into a database, without any validation or binding at all, leaves you open to SQL injection attacks.

What you're calling laziness and lack of SQL knowledge is the stuff that neophytes are made of. I'd recommend taking the time to do it properly and view it as an opportunity to learn.

Or leave it as it is and learn the painful lesson of a SQL injection attack.

duffymo
I don't see anything in this question that suggests he is vulnerable to SQL injection. SQL injection and database normalisation are orthogonal topics, and your digression on injection is irrelevant to the question.
Hammerite
The input is escaped, and anybody who has access to this application already has easier ways of wreaking havoc. I'm using Drupal db_query to access the database, seperately supplying the parameters.
Fabian
@Hammerite, even if this particular bit of laziness and unwillingness to learn doesn't lead to an SQL injection, other examples of the same attitude will.
Paul Tomblin
@Hammerite, there's nothing to exclude the possibility, either. I thought it was worth bringing up in case the OP's ignorance extended to SQL injection as well. I agree that normalization and SQL injection can be orthogonal, but without other information it seemed to me that it should be mentioned. It's hardly irrelevant.
duffymo
@Paul: And maybe the same attitude will lead to him being hit by a bus when he fails to look both ways before crossing the street, but you haven't warned him about that. Edit: I had thought you were the poster of this answer, my mistake.
Hammerite
@Hammerite - your extrapolation to buses is ridiculous.
duffymo
Yes, it was intended to be ridiculous. Its ridiculousness illustrates the point I'm making, which is that it makes no sense to warn him against something you have no reason to think he needs to be warned about.
Hammerite
Yes, I see. I think I had far more reason that your warning about buses.
duffymo
A: 

Well I've been using a key/value pair tab separated list in a NTEXT column in SQL Server for more than 4 years now and it works. You do lose the flexibility of making queries but on the other hand, if you have a library that persists/derpersists the key value pair then it's not a that bad idea.

Raj
No, it's a horrible idea. You've managed to get away with it, but the cost of your few minutes of development time has cost you lousy query performance, flexibility, and maintainability of your code.
Paul Tomblin
Paul, I agree. But as I said I used if for a specific purpose, and that is for a data entry operation where you have many kinds of forms. I am revising the design now that I have learnt NHibernate but back then I needed the flexibity to design the form in ASP.NET and use the textbox ids as key in the key/value pair.
Raj
+4  A: 

There are numerous questions on SO asking:

  • how to get a count of specific values from the comma separated list
  • how to get records that have only the same 2/3/etc specific value from that comma separated list

Another problem with the comma separated list is ensuring the values are consistent - storing text means the possibility of typos...

These are all symptoms of denormalized data, and highlight why you should always model for normalized data. Denormalization can be a query optimization, to be applied when the need actually presents itself.

OMG Ponies
A: 

I would probably take the middle ground: make each field in the CSV into a separate column in the database, but not worry much about normalization (at least for now). At some point, normalization might become interesting, but with all the data shoved into a single column you're gaining virtually no benefit from using a database at all. You need to separate the data into logical fields/columns/whatever you want to call them before you can manipulate it meaningfully at all.

Jerry Coffin
The form contains some more fields, this is only one part of the form (which I did not explain well in the question).
Fabian
+1  A: 

In general anything can be defensible if it meets the requirements of your project. This doesn't mean that people will agree with or want to defend your decision...

In general, storing data in this way is suboptimal (e.g. harder to do efficient queries) and may cause maintenance issues if you modify the items in your form. Perhaps you could have found a middle ground and used an integer representing a set of bit flags instead?

bobbymcr
+1 for the suggestion of alternative approaches.
Hammerite
+6  A: 

"One reason was laziness".

This rings alarm bells. The only reason you should do something like this is that you know how to do it "the right way" but you have come to the conclusion that there is a tangible reason not to do it that way.

Having said this: the data you are choosing to store this way is data that you will never need to query by, then there may be a case for storing it in the way you have chosen.

(Some users would dispute the statement in my previous paragraph, saying that "you can never know what requirements will be added in the future". These users are either misguided or stating a religious conviction. Sometimes it is advantageous to work to the requirements you have before you.)

Hammerite
+21  A: 

In addition to violating First Normal Form because of the repeating group of values stored in a single column, comma-separated lists have a lot of other more practical problems:

  • Can't ensure that each value is the right data type: no way to prevent 1,2,3,banana,5
  • Can't use foreign key constraints to link values to a lookup table; no way to enforce referential integrity.
  • Can't enforce uniqueness: no way to prevent 1,2,3,3,3,5
  • Can't delete a value from the list without fetching the whole list.
  • Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan.
  • Hard to count elements in the list, or do other aggregate queries.
  • Hard to join the values to the lookup table they reference.
  • Hard to fetch the list in sorted order.

To solve these problems, you have to write tons of application code, reinventing functionality that the RDBMS already provides much more efficiently.

Comma-separated lists are wrong enough that I made this the first chapter in my book: SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

There are times when you need to employ denormalization, but as @OMG Ponies mentions, these are exception cases. Any non-relational "optimization" benefits one type of query at the expense of other uses of the data, so be sure you know which of your queries need to be treated so specially that they deserve denormalization.

Bill Karwin
+1 Love the cover
NullUserException
Thanks for the detailed list, I thought of some of those problems but certainly not of all of them.
Fabian
Book looks interesting so I bought it. I've long thought it was subject matter that needed a book.
HLGEM
A: 

Yes, it is that bad. My view is that if you don't like using relational databases then look for an alternative that suits you better, there are lots of interesting "NOSQL" projects out there with some really advanced features.

Robin