views:

443

answers:

11

Say I offer user to check off languages she speaks and store it in a db. Important side note, I will not search db for any of those values, as I will have some separate search engine for search. Now, the obvious way of storing these values is to create a table like

UserLanguages ( UserID nvarchar(50), LookupLanguageID int )

but the site will be high load and we are trying to eliminate any overhead where possible, so in order to avoid joins with main member table when showing results on UI, I was thinking of storing languages for a user in the main table, having them comma separated, like "12,34,65"

Again, I don't search for them so I don't worry about having to do fulltext index on that column.

I don't really see any problems with this solution, but am I overlooking anything?

Thanks, Andrey

+10  A: 

This type of storage has almost ALWAYS come back to haunt me. For one, you are not even in first normal form. For another, some manager or the other will definitely come back and say.. "hey, now that we store this, can you write me a report on... "

I would suggest going with a normalized design. Put it in a separate table.

Raj More
Well, as I stated, I don't care about reporting, as we'll maintain all normalized data in the search engine and reports will be querying it, not the db. The list in the db will be for two things only: UI and reindexing of search engine in case index drops.I see you point, as well as others who responded, but I'm still tempted because of performance
Andrey
+10  A: 

Don't.

  • You don't search for them now
  • Data is useless to anything but this one situation
  • No data integrity (eg no FK)
  • You still have to change to "English,German" etc for display
  • "Give me all users who speak x" = FAIL
  • The list is actually a presentation issue

It's your system, though, and I look forward to answering the inevitable "help" questions later...

gbn
Actually, "Give me all users who speak x" can be done: SELECT * FROM table WHERE ',' + languages + ',' LIKE ',' + x + ','. A terrible idea, but possible.
erikkallen
Of course: SELECT * FROM table WHERE ',' + languages + ',' LIKE '%,' + x + ',%'
erikkallen
@erikkallen: now THAT is a performance killer ;)
Chris Lively
+3  A: 

I generally stay away at the solution you described, you asking for troubles when you store relational data in such fashion.

As alternative solution: You could store as one bitmasked integer, for example: 0 - No selection 1 - English 2 - Spanish 4 - German 8 - French 16 - Russian --and so on powers of 2

So if someone selected English and Russian the value would be 17, and you could easily query the values with Bitwise operators.

WebMatrix
I've seen Bitwise operations in a database, not a fan.
Gratzy
Isn't storing it as a bitmasked integer also trying to store relational data in a non relational fashion? Not trying to be argumentative, I'm wondering if I'm missing something.
Gratzy
@Gratzy, you could make such argument, it's not your traditional relational data storage technique, though it's not as bad as comma delimited. bitwize works best in certain situations where you need to test for multiple true conditions (and i think it applies to OP scenario), I could be wrong
WebMatrix
@WebMatrix thanks for the additional info.
Gratzy
I've done that too. Always bites.
erikkallen
+10  A: 

You might not be missing anything now, but when you're requirements change you might regret that decision. You should store it normalized like your first instinct suggested. That's the correct approach.

What you're suggesting is a classic premature optimization. You don't know yet whether that join will be a bottleneck, and so you don't know whether you're actually buying any performance improvement. Wait until you can profile the thing, and then you'll know whether that piece needs to be optimized.

If it does, I would consider a materialized view, or some other approach that pre-computes the answer using the normalized data to a cache that is not considered the book of record.

More generally, there are a lot of possible optimizations that could be done, if necessary, without compromising your design in the way you suggest.

jbourque
Well, the problem that there are about 30 of such multiple choices and all are to be displayed on the same profile page which ultimately means querying 30 tables for one request. It will definitely be a bottleneck, so I'm trying to see what are my options
Andrey
30 queries on a profile page? That should never be a problem with load. How often do you think someone will actually load their profile page? Once a minute? Hardly. Again, don't preoptimize
Chris Lively
+4  A: 

Premature optimization is the root of all evil.

EDIT: Apparently the context of my observation has been misconstrued by some - and hence the downvotes. So I will clarify.

Denormalizing your model to make things easier and/or 'more performant' - such as creating concatenated columns to represent business information (as in the OP case) - is what I refer to as a "premature optimization".

While there may be some extreme edge cases where there is no other way to get the necessary performance necessary for a particular problem domain - one should rarely assume this is the case. In general, such premature optimizations cause long-term grief because they are hard to undo - changing your data model once it is in production takes a lot more effort than when it initially deployed.

When designing a database, developers (and DBAs) should apply standard practices like normalization to ensure that their data model expresses the business information being collected and managed. I don't believe that proper use of data normalization is an "optimization" - it is a necessary practice. In my opinion, data modelers should always be on the lookout for models that could be restructured to (at least) third normal form (3NF).

LBushkin
-1 because this is a very basic rule of database design (1NF)
Raj More
+2  A: 

If you're not querying against them, you don't lose anything by storing them in a form like your initial plan. If you are, then storing them in the comma-delimited format will come back to haunt you, and I doubt that any speed savings would be significant, especially when you factor in the work required to translate them back.

CodeByMoonlight
+2  A: 

Problems:

  1. You lose join capability (obviously).
  2. You have to reparse the list on each page load / post back. Which results in more code client side.
  3. You lose all pretenses of trying to keep database integrity. Just imagine if you decide to REMOVE a language later on... What's the sql going to be to fix all of your user profiles?
  4. Assuming your various profile options are stored in a lookup table in the DB, you still have to run "30 queries" per profile page. If they aren't then you have to code deploy for each little change. bad, very bad.
  5. Basing a design decision on something that "won't happen" is an absolute recipe for failure. Sure, the business people said they won't ever do that... Until they think of a reason they absolutely must do it. Today. Which will be promptly after you finish coding this.
  6. As I stated in a comment, 30 queries for a low use page is nothing. Don't sweat it, and definitely don't optimize unless you know for darn sure it's necessary. Guess how many queries SO does for it's profile page?
Chris Lively
A: 

Nooooooooooooooooo!!!!!!!!

As stated very well in the above few posts.

If you want a contrary view to this debate, look at wordpress. Tables are chocked full of delimited data, and it's a great, simple platform.

Jestep
+1  A: 

You seem to be extremely worried about adding in a few extra lookup table joins. In my experience, the time it takes to actually transmit the HTML response and have the browser render it far exceed a few extra table joins. Especially if you are using indexes for your primary and foreign keys (as you should be). It's like you are planning a multi-day cross-country trip and you are worried about 1 extra 10 minute bathroom stop.

The lack of long-term flexibility and data integrity are not worth it for such a small optimization (which may not be necessary or even noticeable).

Shane
A: 

Go for it. I've seen this approach used many times on data that does not require any kind of querying/reporting (simple "format and present" data) and it's worked just fine. Anyone who does web development knows what I'm talking about. My guess is that all the people discouraging it are old-school developers still hung on normalizing a database to the last possible bit. Like one other reviewer mentioned, Wordpress seems to work just fine.

WebDeveloper
A: 

If you don't mind, I have a different kind of approach for this. Please checkout by blog for complete idea and source code in the following link: http://www.technicalganesh.com/programming/dot-net/representing-multiple-selections-using-a-single-value-in-database-tables/

If you find it is really useful just reply with your appreciations!

Thanks, Ganesh Kumar

Ganesh Kumar