Database denormalization opportunity

views:

513

answers:

+2 Q:

Database denormalization opportunity

I'm looking for a strategy in as far as stopping the repetitive problem of branching out tables. For example as a fictitious use case, say I have a table with users that contains their name, login, password and other meta data. In this particular scenario, say the user is restricted to login per a specific subset of IP(s). Thus, we have a 1:M relationship. Everytime a use case such as the following comes up, your normal work flow includes that of have a 'users' table and a table such as 'user_ips' in which case you'd have something such as a pk(ip_id), fk(user_id) and IP on the user_ips side.

For similar situations, do you folks normally fan out in the fashion as above? Is there an opportunity to denormalize effectively here? Perhaps store the IPs in a BLOB column in some CSV delimited fashion? What are some strategies you folks are deploying today?

+1 A:

One option would be to store your Ip addresses as an xml string. I think this would be better than a comma separted list and allow you flexibility to add other elements to the string should you need them (Port comes to mind) without database changes.

Although, I think the normalized fashion is better in most cases.

JasonS 2008-10-17 22:23:48

Nice idea - you could even store it as JSON and reduce the storage space requirements.

David Robbins 2008-10-17 23:17:25

Oh please no. We're going through the hell of pulling our data back out of XML strings and into columns. Suppose some day you need to select all users with a given attribute. Have fun! Same goes for updating the same value across many users - you have to reconstruct every XML chunk.

Just Some Guy 2008-10-17 23:40:30

@Just Some Guy - yikes, didn't think of it that way!

David Robbins 2008-10-17 23:55:57

+5 A:

I would suspect that any normalized solution when the number of potential related items is large is going to out perform a denormalized solution if properly indexed. My strategy is to normalize the database then provide views or table-based functions that take advantage of indexed joins to make the cost bearable. I'd let performance demands dictate the move to a denormalized form.

Keep this in mind. If you need to implement role-based security access to parts of the information, table-based security is MUCH easier to implement than column-based, especially at the database or data layer level.

tvanfosson 2008-10-17 22:37:15

+4 A:

I would strongly suggest against putting multiple IP addresses in a field. Never mind 3NF this breaks 1NF.

Tvanfsson is right in that if you index the FKEY you'll get pretty comparable performance unless there's going to be millions of records in the 'users_ips' table.

What's even better is that by keeping these tables normalized you can actually report on this information in the future so that when users are confused as to why they can't login from certain LANs, writing the app (or SQL) to troubleshoot and do user IP lookups will be A LOT easier.

Tyler 2008-10-17 22:52:32

+1 A:

As with any denormalization question, you need to consider the costs associated with it. In particular, if you list the IP addresses in the main table, how are you going to be able to answer the question "which users can be associated with IP address w.x.y.z?". With the fully normalized form, that is easy and symmetric with "which IP addresses can be associated with user pqr?". With denormalized forms, the questions have very different answers. Also, ensuring that the correct integrity rules are applied is much harder in the denormalized version, in general.

Jonathan Leffler 2008-10-17 22:59:53

+12 A:

Opportunity to denormalize? I think you may have misunderstood conventional wisdom - denormalization is an optimization technique. Not something you go out looking for.

Mark Brackett 2008-10-17 23:00:30

I totally agree. Imagine creating an update query to remove an IP. Or find out which users use the same IP. I'm getting a headache already.

Bob Fanger 2008-10-19 10:45:30

+1 - clearly needs at least a couple years' experience with query optimization before being allowed to use the word "denormalize". Running with scissors.

le dorfier 2009-03-01 00:22:04

You may want to consider an user-attribute table and attribute-type table where you can define what type of attributes a user can have. Each new use use case would become an attribute type, and the data would simply be added user-attribute table.

With your example of the IP addresses, you would have an attribute type of IP and store the respective IP's in the user-attribute table. This gives you the flexibility to add another type, such as MAC address, and does not require that you create a new table to support the new data types. For each new use case you do not have to add anything bu data.

The down side is that your queries will be a little more complex given this attribute structure.

David Robbins 2008-10-17 23:16:42

IMHO, it's all about cost/benefit analysis. All depends on requirements (including probable ones) and capabilities of the platform you are using.

For example, if you have a requirement like "Show all unique IP addresses recorded in the system", then you better "branch" now and create a separate table to store IP addresses. Or if you need certain constraints on IP addresses (like "all IP addresses of a given user must be unique) then you might greatly benefit from having a separate table and proper constraints applied to it. (Please note that you could meet both requirements even if you used de-normalized design and proper XML-related machinery; however, RelDB-based solution to these requirements seems to be much cheaper to implement and maintain.)

Obviously, these are not be the only examples of requirements that would dictate normalized solution.

At the same time, I think that requirements like "Show all IP addresses of a user" or "Show all users associated with a given IP address" may not be sufficient to justify a normalized solution.

You could try to perform deeper analysis (in search of requirements of the first type), or just rely on your understanding of the project's context (current and future) and on your "guts feeling".

My own "guts feeling" in this particular case is that requirements of the first type (pro-normalization requirements) are extremely likely, so you'd be better off with a normalized solution, from the very beginning. However, you've said that this use case is fictitious, so in your real situation the conclusion may be exactly opposite.

Never say "never": 3NF is not always the best answer.

Yarik 2008-11-19 00:22:06

You're quite right, 3NF isn't always the answer. BCNF is stronger, and sometimes there's nothing for it but to go for broke and use PJNF.

Peter Wone 2008-11-19 00:29:02

And, what's more, when you're in BCNF, very often you are also in 5NF or PJNF.

Jonathan Leffler 2008-11-19 01:02:19

ansaurus

tags:

views:

answers:

Database denormalization opportunity

related questions