ansaurus

Question

Need assistance with Database Schema (platform independent)

Answer 1

+1 A:

I've always heard it called "normalization," but we're talking about the same thing.

The easiest thing may be to combine city, state, and zip into one table. You can even consider using the zip code itself as the key, although I can think of two reasons why you'd want to avoid that:

Northeastern states have zip codes that begin with 0, and that leading zero will be dropped if you make zip code a numeric field.
If you use zip code as a key, you cannot store the same zip more than once for multiple towns. Like you said, the post office cares more about the zip than the town name, but this setup would prevent you from searching on those individual towns later.
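Both drawbacks can be reproduced with a small sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table names zip_numeric and zip_keyed and the town names are made up for illustration, and other database engines may word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reason 1: a numeric column cannot keep the leading zero.
conn.execute("CREATE TABLE zip_numeric (zip INTEGER)")
conn.execute("INSERT INTO zip_numeric VALUES (?)", (int("02134"),))
stored_numeric = conn.execute("SELECT zip FROM zip_numeric").fetchone()[0]
# stored_numeric is now the integer 2134; padding it back is a display concern.
padded = f"{stored_numeric:05d}"

# Reason 2: with the zip code as the primary key, a second town
# that shares the same zip code is rejected.
conn.execute("CREATE TABLE zip_keyed (zip TEXT PRIMARY KEY, town TEXT NOT NULL)")
conn.execute("INSERT INTO zip_keyed VALUES ('02134', 'Town A')")
try:
    conn.execute("INSERT INTO zip_keyed VALUES ('02134', 'Town B')")
    second_town_accepted = True
except sqlite3.IntegrityError:
    second_town_accepted = False
```

Storing the zip as text and using a separate surrogate key avoids both issues.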

To search by city, state, or zip later on, just JOIN this table to the Manufacturers table. An INNER JOIN is fine unless there are rows in the Manufacturers table where ManufacturerZipCodeID is blank (NULL), in which case you'll want a LEFT JOIN so those rows show up as well.
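Here is a rough sketch of the difference between the two joins, again using Python's sqlite3 module so it is runnable as-is. Only ManufacturerZipCodeID comes from the discussion; the ZipCodes table, the other column names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ZipCodes (
    ZipCodeID INTEGER PRIMARY KEY,
    City      TEXT NOT NULL,
    State     TEXT NOT NULL,
    Zip       TEXT NOT NULL
);
CREATE TABLE Manufacturers (
    ManufacturerID        INTEGER PRIMARY KEY,
    Name                  TEXT NOT NULL,
    ManufacturerZipCodeID INTEGER REFERENCES ZipCodes(ZipCodeID)
);
INSERT INTO ZipCodes VALUES (1, 'Boston', 'MA', '02134');
INSERT INTO Manufacturers VALUES (1, 'Acme', 1);
INSERT INTO Manufacturers VALUES (2, 'NoAddress Inc', NULL);
""")

# INNER JOIN: only manufacturers that have a matching zip row.
inner_rows = conn.execute("""
    SELECT m.Name, z.City, z.State, z.Zip
    FROM Manufacturers AS m
    INNER JOIN ZipCodes AS z ON z.ZipCodeID = m.ManufacturerZipCodeID
    ORDER BY m.ManufacturerID
""").fetchall()

# LEFT JOIN: every manufacturer; address columns are NULL when there is no match.
left_rows = conn.execute("""
    SELECT m.Name, z.City, z.State, z.Zip
    FROM Manufacturers AS m
    LEFT JOIN ZipCodes AS z ON z.ZipCodeID = m.ManufacturerZipCodeID
    ORDER BY m.ManufacturerID
""").fetchall()
```

With this sample data, inner_rows holds only the Acme row, while left_rows also includes 'NoAddress Inc' with None in the three address columns.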

JohnK813 2010-08-31 14:20:46

Thank you for your answer. I knew it ended in "alization", lol. Yeah, I didn't even think about the 0's in the zip codes, even though I could easily have the code or a view pad them with leading 0's. Looking through the database manually would be confusing, though. I will be entering everything manually, so no Manufacturer should have a null ZipCode, but the point about the joins is good to know. I'll have to contact the post office about zips crossing state boundaries, though, because I didn't think about that either. Thanks.

XstreamINsanity 2010-08-31 14:38:12

Answer 2

+1 A:

I don't have much of a problem with the way you have things set up. A state ID in zip code might be dangerous - it wouldn't surprise me to learn that there are zip codes which cross state boundaries, but I'm not sure about that.

You're going to do a lot of joins by storing state, city and zip code in separate tables, but having dealt with databases that stored addresses without consistency measures, I can say that is much more of a nightmare than a few joins. For example, you end up with "NY" and "ny" and "Ny" and "New York" and "NewYork". So I think having separate tables for state, city and zip will pay off in the long run.
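As a minimal sketch of that consistency benefit, a State lookup table plus a foreign key lets the database reject anything that is not a known state. The table and column names (State, Address, Abbreviation, Street) are hypothetical, and the PRAGMA line is specific to SQLite, which does not enforce foreign keys unless asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
CREATE TABLE State (
    StateID      INTEGER PRIMARY KEY,
    Abbreviation TEXT NOT NULL UNIQUE,
    StateName    TEXT NOT NULL UNIQUE
);
CREATE TABLE Address (
    AddressID INTEGER PRIMARY KEY,
    Street    TEXT NOT NULL,
    StateID   INTEGER NOT NULL REFERENCES State(StateID)
);
INSERT INTO State VALUES (1, 'NY', 'New York');
""")

# A row that points at the single canonical 'NY' entry is accepted.
conn.execute("INSERT INTO Address (Street, StateID) VALUES ('1 Main St', 1)")

# A row that points at a state that does not exist is rejected.
try:
    conn.execute("INSERT INTO Address (Street, StateID) VALUES ('2 Main St', 99)")
    bad_state_accepted = True
except sqlite3.IntegrityError:
    bad_state_accepted = False

address_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
```

Because every address stores a StateID rather than free text, variants like "ny" or "NewYork" have nowhere to creep in.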

Wade Williams 2010-08-31 14:30:58

Yeah, that's what I was thinking of. While I was waiting for answers, I also considered (but have now decided against) letting users enter their City, State, Zip and such if it isn't in the database. That way it would build on its own and I wouldn't have to do it all manually. I could give myself editing capabilities, but at the same time, it might take just as long as putting it in myself. And I believe there may be a few zip codes that cross state borders. Thanks for the answer though.

XstreamINsanity 2010-08-31 14:42:07

Answer 3

+1 A:

I am not a database expert, but from my perspective the given pseudo schema seems to be incorrect. Here's the explanation. The facts known from the problem are:

A state can have multiple cities.
A state is unique.
A city can have multiple zip codes.
A city name may be the same as another city's name.
A zip code is unique.

First, write down the unique entities. That gives us these two initial tables:

STATE
---
State ID (PK)
State Name

ZIP
---
Zip ID (PK)
Zip Code (NK)

Then a logical question arises: knowing a Zip ID, how do we retrieve the City ID? To answer it, we need a link between Zip and City. Where should this link go? Not in the City table, since Fact #3 tells us a city can have many different zip codes. So it must go in the ZIP table. This is our next version of the ZIP table:

ZIP
---
Zip ID (PK)
Zip Code (NK)
City ID (FK)

Now that we can "move" from Zip to City, let's discuss the City table. A city can have the same name as another city, so we should not force the City Name field to be unique. This is our first version of the City table:

CITY
----
City ID (PK)
City Name

Again, the same logical question arises: how do we move to State knowing a City? A link must be created somewhere between these two tables. Fact #4 tells us city names are not guaranteed to be unique, and Fact #1 tells us a state can have many cities, so the link must go in the City table. This is our next version of the City table:

CITY
---
City ID (PK)
City Name
State ID (FK)

With this link, we can retrieve State correctly. Overall, we can move from Zip to City through City ID (provided in Zip table) and we can continue to move from City to State through State ID (provided in City table).
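The three tables above can be sketched as runnable DDL. This is only one possible rendering, using Python's sqlite3 module; the identifiers drop the spaces from the names above, the locate helper is hypothetical, and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
CREATE TABLE State (
    StateID   INTEGER PRIMARY KEY,
    StateName TEXT NOT NULL UNIQUE
);
CREATE TABLE City (
    CityID   INTEGER PRIMARY KEY,
    CityName TEXT NOT NULL,              -- deliberately not unique
    StateID  INTEGER NOT NULL REFERENCES State(StateID)
);
CREATE TABLE Zip (
    ZipID   INTEGER PRIMARY KEY,
    ZipCode TEXT NOT NULL UNIQUE,        -- natural key, stored as text
    CityID  INTEGER NOT NULL REFERENCES City(CityID)
);
INSERT INTO State VALUES (1, 'Illinois');
INSERT INTO State VALUES (2, 'Missouri');
INSERT INTO City VALUES (1, 'Springfield', 1);
INSERT INTO City VALUES (2, 'Springfield', 2);
INSERT INTO Zip VALUES (1, '62701', 1);
INSERT INTO Zip VALUES (2, '65806', 2);
""")

def locate(zip_code):
    """Walk Zip -> City -> State and return (city name, state name), or None."""
    return conn.execute("""
        SELECT c.CityName, s.StateName
        FROM Zip AS z
        JOIN City  AS c ON c.CityID  = z.CityID
        JOIN State AS s ON s.StateID = c.StateID
        WHERE z.ZipCode = ?
    """, (zip_code,)).fetchone()
```

Calling locate('62701') and locate('65806') returns two different states for the same city name, which is why City Name is left non-unique.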

Rationalizing a database is good from a database perspective but can be considered "evil" from a programming perspective, because it pushes the programmer to write more and more classes. In the end, "too far" can be defined as "the table becomes irrational". A City Name table seems irrational, since a city name is an attribute, not an entity. I will happily say "too far" if my database analyst creates such a table :) Also, over-rationalizing a database can noticeably hurt its performance. In my experience, it makes queries run slower.

As for the other parts of the problem, like Users, Teams, Capitols, etc., I cannot say anything for now since I haven't seen them yet.

jancrot 2010-08-31 14:41:31

+1 for thoroughness. There wouldn't be much difference between the other tables. They would just have addresses as well, but I'd want to tie their addresses to the aforementioned tables just like with Manufacturers. However, I am trying to find an example of a zip code that is shared by two states. I know my wife has family in Toney, AL and Toney is right on the other side of the border in TN, so they share the city, but I'm not sure about the zip. If they do, then there would need to be two entries for that zip code, one for each city ID.
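One way to allow two entries for the same zip code is to make the pair (zip code, city) unique instead of the zip code alone. The sketch below assumes that variation; the names, the StateCode column, and the placeholder data ('Bordertown', zip '00000') are invented and do not describe any real town.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City (
    CityID    INTEGER PRIMARY KEY,
    CityName  TEXT NOT NULL,
    StateCode TEXT NOT NULL
);
CREATE TABLE Zip (
    ZipID   INTEGER PRIMARY KEY,
    ZipCode TEXT NOT NULL,
    CityID  INTEGER NOT NULL REFERENCES City(CityID),
    UNIQUE (ZipCode, CityID)             -- same zip allowed once per city
);
INSERT INTO City VALUES (1, 'Bordertown', 'AL');
INSERT INTO City VALUES (2, 'Bordertown', 'TN');
INSERT INTO Zip (ZipCode, CityID) VALUES ('00000', 1);
INSERT INTO Zip (ZipCode, CityID) VALUES ('00000', 2);
""")

# One zip code now resolves to a city row on each side of the border.
matches = conn.execute("""
    SELECT c.CityName, c.StateCode
    FROM Zip AS z
    JOIN City AS c ON c.CityID = z.CityID
    WHERE z.ZipCode = ?
    ORDER BY c.StateCode
""", ("00000",)).fetchall()

# An exact duplicate of an existing (zip, city) pair is still rejected.
try:
    conn.execute("INSERT INTO Zip (ZipCode, CityID) VALUES ('00000', 1)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
```

The trade-off is that a lookup by zip code can now return more than one row, so any code that assumes a single city per zip has to handle that.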

XstreamINsanity 2010-08-31 14:59:09

A question I would have for everyone: if a query runs 15 milliseconds slower, or even as much as two seconds slower, is that too much of a cost? Or is that more a question for the user? And I guess you guys would need to know the size of the database as well.

XstreamINsanity 2010-08-31 15:02:15

The general rule when designing databases is to normalize as much as you can. Then if you see areas that will be performance problems, de-normalize those areas for performance reasons. However, you shouldn't make guesses about performance and denormalize because you think something will be a problem. You should test your assumptions and de-normalize when testing proves performance would be enhanced by doing so.
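A rough way to test such an assumption is to time the same lookup against a normalized pair of tables and a denormalized copy. The sketch below does that with Python's sqlite3 and timeit modules on generated data; the table names (City, Zip, ZipFlat), row count, and repetition count are arbitrary assumptions, and real numbers depend entirely on your engine, indexes, and data volume.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT NOT NULL);
CREATE TABLE Zip (
    ZipID   INTEGER PRIMARY KEY,
    ZipCode TEXT NOT NULL UNIQUE,
    CityID  INTEGER NOT NULL REFERENCES City(CityID)
);
CREATE TABLE ZipFlat (ZipCode TEXT PRIMARY KEY, CityName TEXT NOT NULL);
""")

# Generated test data: 5000 cities, one zip code each.
rows = [(i, f"{i:05d}", f"City {i}") for i in range(1, 5001)]
conn.executemany("INSERT INTO City VALUES (?, ?)", [(i, n) for i, _, n in rows])
conn.executemany("INSERT INTO Zip VALUES (?, ?, ?)", [(i, z, i) for i, z, _ in rows])
conn.executemany("INSERT INTO ZipFlat VALUES (?, ?)", [(z, n) for _, z, n in rows])

def normalized(code):
    # Lookup that needs a join between Zip and City.
    return conn.execute(
        "SELECT c.CityName FROM Zip AS z JOIN City AS c ON c.CityID = z.CityID "
        "WHERE z.ZipCode = ?", (code,)).fetchone()

def denormalized(code):
    # Same lookup against the flattened table, no join.
    return conn.execute(
        "SELECT CityName FROM ZipFlat WHERE ZipCode = ?", (code,)).fetchone()

t_norm = timeit.timeit(lambda: normalized("02500"), number=1000)
t_flat = timeit.timeit(lambda: denormalized("02500"), number=1000)
print(f"normalized: {t_norm:.4f}s  denormalized: {t_flat:.4f}s per 1000 lookups")
```

Only if the measured difference matters for your users is the denormalized table worth the extra consistency work.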

Wade Williams 2010-08-31 16:53:06

Thanks, that's what I was thinking as well. I really appreciate all the help guys.

XstreamINsanity 2010-08-31 16:57:33
