Good morning everyone,
Please read everything before answering.
I have sort of an opinion question but at the same time one that may have a right answer. I'm trying to develop a suite of products and want to make sure that since I'm doing it myself, that I do it right the first time. I've re-wrote the schema multiple times, each time thinking it was better. Then I could come across some new idea and it would either require a lot of work on the schema, or it would break my schema.
In college, I learned about "rationalizing" (I think that's the word they used, could be way off) a database and there are 5 levels. From what I remember, level 3 was most common. I know the practice was to make sure that data wasn't repeated and to do that, you had to break up tables into smaller tables. And depending on how far you broke it up, the higher the level was. Well, I don't know if I want the highest level, but I know I want it as efficient as I can get it. I've had 4 years of SQL Server 2000/2005/2008 and 2 years of Oracle, about 6 months with Informix (5+ years ago), a touch here or there with mySQL and about 6 months of Access. My preference is SQL Server, but I would like the schema to be as efficient on either platform.
Here's a psuedo schema layout of some of the tables, then I'll explain what I want to do.
Manufacturers
ManufacturerID (Identity)
ManufacturerName
ManufacturerStreetAddress
ManufacturerZipCodeID
...
ZipCodes
ZipCodeID (Identity)
ZipCode
ZipCodeStateID
...
States
StateID (Identity)
StateName
StateAbbreviation
...
Cities
CityID (Identity)
CityName
CityStateID
...
I apologize for it only being a psuedo schema, but that's all I have right now as I'm doing the design on paper on break, but had a question before I got too far. What I want to do is make sure that everything ties to each other properly. My belief is that a Zip-Code belongs to a State and a City, but no City belongs to any ONE Zip-Code, it could have many. If I put the Zip-Code in the Manufacturers table, I want to be able to get the State and the City. But I don't want to use any IDs too many times in other tables. What I mean by that is having StateID in ZipCodes and Cities may be one too many times. A state can have multiple cities with the same name, and multiple states can have cities with the same names. But I'm not sure if I'd want a CityNames table and then a CityStates table (CityNameID and StateID). I am well aware that there are location databases out there for purchase, maybe some free, that I could use and wouldn't have to worry about this. However, I'd like to work on my understanding of this because I believe it would help me schema design wise in the future, but also because I would like to have the customability of the layout if anything needed to be changed.
Questions:
- Does that psuedo schema, as it is, seem correct or could it be better (opinion)?
- Is it called "rationalizing" the database, or something else (will vote up for right answer)? And how far is too far (opinion)
- There will also be a Users table, and other tables that will include addresses (Teams, Capitols, etc.), so would the psuedo schema, if it is correct in theory, be a good plan for a database like that (opinion)?
Thank you all for your time, I will vote up any answer that is thorough and coherent. Database experts or people with many years of database experience are preferred, but I will listen to all answers. Also, I'm not sure if this should be a community wiki, but I am not marking it as one right now. Thanks.
Update: Also, I forgot to mention that I know with "rationalizing" the database comes the need for joins and sometimes subqueries. I normally abuse LEFT OUTER JOINs, but what would be the most efficient way to tie these tables together to display an address, rather than performing 4 different queries? Thanks.
Update: Ok, now this might be too normalized or not normalized enough or at all, but could you guys tell me if you like this psuedo schema better?
Manufacturers
ManufacturerID (Identity)
ManufacturerName
ManufacturerStreetAddress
ManufacturerCCSZID --CCSZ (Country, City, State, Zip), needs a better name
...
ZipCodes
ZipCodeID (Identity)
ZipCode
...
States
StateID (Identity)
StateName
StateAbbreviation
...
Cities
CityID (Identity)
CityName
...
Countries
CountryID (Identity)
CountryName
CountryAbbreviation
...
CountryCityStateZipCodes
CountryCityStateZipCodeID (Identity)
CCSZCountryID
CCSZStateID
CCSZCityID
CCSZZipCodeID
And to get an address, it would look like:
SELECT M.ManufacturerStreetAddress,
CN.CountryName,
CN.CountryAbbreviation,
S.StateName,
S.StateAbbreviation,
C.CityName,
Z.ZipCode
FROM Manufacturers M
LEFT OUTER JOIN CountryCityStateZipCodes CCSZ ON CCSZ.CountryCityStateZipCodeID = M.ManufacturerCCSZID
LEFT OUTER JOIN Countries CN ON CN.CountryID = CCSZ.CCSZCountryID
LEFT OUTER JOIN States S ON S.StateID = CCSZ.CCSZStateID
LEFT OUTER JOIN Cities C ON C.CityID = CCSZ.CCSZCityID
LEFT OUTER JOIN ZipCodes Z ON Z.ZipCodeID = CCSZ.CCSZZipCodeID
Or maybe you guys know a better way to write that query. But regardless, does that look better than the first schema?