views:

207

answers:

8

I was wondering is this a good design, assuming the tables as follows

ADDRESS(id, address, city_fk, stateFK, countryFK),
CITY(id, name, stateFK, countryFK),
STATE(id, name, countryFK),
COUNTRY(id, name)

Notice how country fk is repeated in 3 tables? and state fk repeated in 2 tables? Can anyone tell me if this is a good design? If so, why? Coz I dont see the need to repeat it every so often.

Cheers

+1  A: 

I'm not sure I would head down that path. I can understand having a country table and certainly a state table, but a city table that ensures that city belongs to the particular country/state in question. I would just imagine the amount of data that would be necessary in your City FK table would be enormous, and I'm not sure I see the benefit. Perhaps if you could detail a little more what benefit you hoped to derive from having a city table I might be better able to answer this. Most systems I have seen have FK tables for Country and State, but those tables are not necessarily related to one another.

JPrescottSanders
+4  A: 

You want something more like this:

  • ADDRESS(id, address, city_fk)
  • CITY(id, name, state_fk)
  • STATE(id, name, country_fk)
  • COUNTRY(id, name)

And if it was me, i'd rename the fields a bit:

  • ADDRESS(id, address, city_id)
  • CITY(id, city, state_id)
  • STATE(id, state, country_id)
  • COUNTRY(id, country)
Chris
A: 

I would do this:

  • Country(countryID, countryName)
  • State(stateID, stateName, countryID(FK))
  • City(cityID, cityName, stateID(FK)) <- you are bringing the country with that FK
  • Address(addressID, adressDescription, cityID) <- Same concept here.

Now tell me, what are the odds to have the same address in different cities, states, countries? Minimal. So this should work. Remember that normalizing the DB to the extreme not always works or is helpfull.

Hope this helps

Best Regards!

MRFerocius
A: 

Depends on how you have defined the PrimaryKeys on all the tables. If you have the id column as a key on all of the tables, you can reduce the colums in all of the tables so that they just keep the FK to the next region.

  • address (GIVEN)
  • city via address.city_fk
  • state via city.state_fk
  • country via state.country_fk country

Design would be like Chris'. If you need all the data in one query you would do a join.

However, if id is not a key but a partial key (can not define the data-set alone) in all of the tables, then I would assume this a good design. An Address would then be identified like this:

  • Country 8 (France)
  • State 3 (Paris-region)
  • City 23 (Paris)
  • Address 1 (first Address stored for that City)

Just knowing "City 23" wouldn't be enough, because then the question emerges: 23rd city in WHICH state, and which state in WHICH country - that would cause you to store the data in all of the tables.

But it really depends on how you have defined your keys.

Marcel J.
A: 

Consider: If I know the state, and the state FK's the country, I transitively know the country and it's a redundant denormalization to include it again.

If I know the city, and the city FK's the state, I transitively know the state and it's a redundant denormalization to include it again.

So if I know the city, I know the state, and I know the country.

ADDRESS(id, address, city_fk), CITY(id, name, stateFK), STATE(id, name, countryFK), COUNTRY(id, name)

In practice, there are legitimate arguments for making city an entity or and attribute. For something like a mailing address, it's probably simpler to make it an attribute.

For something like a "voter file" (a database used for electioneering and political campaigns) it may sense to make a city an entity. It's especially important in the US State of Virgina, which is unique in the US in that incorporated cities are independent political jurisdictions, unlike the rest of the US, where jurisdictionally, most cities are part of counties.

Even in states other than Virgina, because cities that are part of counties have separate elections (for mayors and councils), in a political database, it's useful to make cities (and getting smaller, wards and precincts and townships and villages, and getting larger, counties and HDs and SDs and CDs) into entities.

Now take for example, Lakewood, in the US State of Ohio. Jurisdictionally, Lakewood is Cuyahoga county. In terms of political TV ads, it is part of the Cleveland-Akron-Canton television Designated Market Area. It lies just outside of the city of Cleveland, which is also jurisdictionally part of Cuyahoga County. Lakewood is divided into four councilmanic Wards, each of which is divided into fifteen or more precincts. It is part of state House District 13, which includes Lakewood and part of the western side of the city of Cleveland. It is part of State Senate District 23, which includes Brooklyn, Brook Park, Cleveland (Wards 7-21 and part of Ward 14), Lakewood, Linndale, Middleburg Heights, Parma and Parma Heights. Lakewood is in 10th US Congresional District of Ohio (Represented by Dennis Kucinich).

Now, for any given Ohio voter, how do we represent in the database who will be on his ballot when he votes?

tpdi
+1  A: 

I guess my question is "Good design for what?" If the precise integrity of an address is absolutely crucial to your design, then this might be the start of a fruitful discussion. On the other hand, if the purpose is to collect addresses and be able to store them in all the variety you might expect to get from real users, you might consider something with a little more flexibility: i.e. fewer tables with more optional fields and a good set of friendly validation rules in the UI.

le dorfier
A: 

What I always do with table design is look at it and work out what sort of trouble I can get into (i.e., what inconsistencies can be generated).

In your particular case, the obvious one is that you could create an address that's in the state of Wyoming and in the city of Dallas, which is in the state of decay, oops, I mean Texas :-)

Similarly you could have an address that is in the country of Australia and the state of Texas (in the US) and the city of Xolotllotla in Peru in South America. I wouldn't want to have to explain your schema to the postman when he tracks you down.

How do you resolve that? You should probably only reference the city from the address, then the state from the city, then the country from the state.

But addresses are a tricky beast and the form they take depends a great deal on where they are located. Some have counties, some cities cross state boundaries and so on.

Unless you have a dire need for finding people in specific cities, I'd just incorporate that into a free-form address (most places have postcodes of some sort which is a better choice for that). You may well want to select by state or country, in which case leave those as separate fields.

My first attempt would have been:

ADDRESS(id, address_including_city, stateFK),
STATE(id, name, countryFK),
COUNTRY(id, name)

You would have to distinguish between 'identical' states in different countries (e.g., Washington and Western Australia both being WA) but you could easily show the country in data entry screens so that the operator knows which state they're entering (or more likely, they'll choose the country first and the choices of state will be whittled down).


a Xolotllotl is not a real city as far as I know, it's just one I made up that sounds like those beautiful Incan names of old.

paxdiablo
A: 

Are you perhaps normalizing too far?

While you are reducing the cost of data being repeated you are possibly negating it by requiring 3 or 4 joins to get a whole address...

Personally I would stick to:

ADDRESS:

  • ID
  • Street Address (e.g. 123 Joe Street)
  • City
  • State
  • Postcode (ZIP)
  • Country (e.g. US or AU)

Of course you would have to index on things such as city/state/postcode/country, but it would possibly be a lot quicker then joining on multiple tables.

(Note: I'm not a DBA, so I can't back up my claims, but this is what I use and it works fine)

Matthew Savage