ansaurus

Question

How to best design address locations in any SQL Database?

Answer 1

A:

This is a nice place to start. A whole #$(#$-load of database schemas to check out:

http://www.databaseanswers.org/data_models/

TAG 2008-11-18 04:36:52

Nope - nothing helped me there. Also, i'm after some reasons why suggested designs are suggested. not just answers. We need to learn, here.

Pure.Krome 2008-11-18 04:43:10

Eh, this a good list nevertheless.

BobbyShaftoe 2009-01-23 13:53:25

Answer 2

+1 A:

This a problem I've had to deal with and RDBMS systems aren't the best at storing hierarchical data. You might want to look at using an object database since these have to deal with nested objects and are optimized for the problem.

If you need to use an RDBMS, you may have to stick with a de-normalized schema though. Having separate tables to maintain your cities, streets etc may be handy for tracking changes though. If a city or street needs to be renamed, you can update the master record in the respective table and schedule a job to update a text copy of the string in your 'main' table. This will prevent you from having to run updates on 10's 100's of thousands of rows during prime time, but still lets you store the most up-to-data data in the db. Of course, this makes the data duplication situation even worse, but it's the price to pay for performance.

Dana the Sane 2008-11-18 04:41:48

I need to stick to an RDMBS. Why would you suggest sticking to a de-normailized schema over a single table?

Pure.Krome 2008-11-18 04:45:56

In the app I worked on, we only went down the zip level and the volume of data got a bit large. By de-normalize, I mean a schema with fewer tables, possibly with some data repeated in each. This will limit the number of joins you have to make, but let you stick with some db design principles.

Dana the Sane 2008-11-18 04:58:42

I updated my answer to be more specific.

Dana the Sane 2008-11-18 05:11:40

+1 I Agree Normalization is ideal, but in the real-world Denormalized data makes life easier at times. However in Pure.Krome's case the data model seems simple enough, and the data limited enough (I guess this is at most a few million entries?) that it might as well be normalized.

Robert Gould 2008-11-18 05:38:46

Currently we have 1 1/2 mill, but it could get up to around 10 mill odd?

Pure.Krome 2008-11-18 05:44:49

also - i would go either normalized (eg. multi-tables with nullable FK joins) or de-normalized (one table only, with constraints). Discuss!

Pure.Krome 2008-11-18 05:51:02

At the end of the day, it might be best to try both. If you start with option 1, switching to 2 is a matter of running a 'SELECT INTO' type query to create the de-normalized table and updating your CRUD sql statements. Let stress testing decide which works best.

Dana the Sane 2008-11-18 06:30:39

Answer 3

A:

The normalized version sounds more correct to me.

What worries you about the queries?

Can't you just create a view that does the joins for you once and then query that?

Andrew Kennan 2008-11-18 05:21:12

Correct. i've got a number of outer joins when the data is normalized. Previously in other projects very very similar to this, it's been a pain in the back to get performant queries .. especially when i want to 'bubble up' the heirachy.

Pure.Krome 2008-11-18 05:48:48

Lastly, i've been taught to avoid views because i've been told they can be REALLY un-performant, especially on multi-tables which go upto the millions and millions of rows....

Pure.Krome 2008-11-18 05:49:24

If you're using MS SQL you have the option of indexed views which may help: http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx

Andrew Kennan 2008-11-18 06:04:15

Answer 4

A:

Is this an OLTP system and reporting system or only a reporting system? If it's only a reporting system, you can denormalize the data in a data warehouse fashion (with snowflake dimensions or not for the hierachies of geographic jurisdictions) and you'll find the reporting to be easier.

I would start from the results and work back, because it sounds to me like you are getting fed the data and you are trying to bring it into a database to support the reporting and mapping. In this case, the database schema being a traditional normalized system is not important because redundancy in the data is not something that will cause maintenance problems for users, etc.

If this seems appropriate, you want to look into the Kimball books.

Cade Roux 2008-11-18 05:34:06

Answer 5

+1 A:

As @Oddthinking noted in a comment, your problems started at:

So I changed the table to be more normalised by making the Neighborhood, City and State fields a foreign key to their own new table (eg. lookups) .. and those two fields are now NULLABLE.

So .. that all works fine. except when I try and do some SQL statements on them. Because of the NULLABLE FK's, it's a nightmare to make all these outer join queries.

This reminds me of the "Doctor, doctor, it hurts when I hit myself like this" joke.

Why exactly did you make the foreign key fields nullable? They were mandatory before, so you should keep them as mandatory, precisely to avoid the nightmares of outer join queries.

Your explanation (question) is somewhat confusing in that you list three fields (Neighborhood, City and State) and then say "those two fields are now nullable". Which two are? And why? And what is in the lookup table? Or is there more than one lookup table? There might be an argument for some sort of NeighbourhoodID number which is a foreign key to a Neighbourhood table, which defines the City and State as well as Neighbourhood name. You might then decide that there is a closed list of cities and the cities have an ID number too, and that number determines the state too. You are probably as well off using a two-letter state code as creating a (probably 4-byte) state ID number. However, do not forget that a check criterion that ensures that the state code is one of the 50 or so valid state codes is harder to write than a foreign key that references a table of states. Since neither states nor cities changes very often, I'd probably use the table of states with a foreign key - but the key column would be the state code.

That means you might have a table of Neighbourhoods with columns NeighbourhoodID, Name, CityID; a table of Cities with columns CityID, Name, State; and a table of States with columns State and Name. You can add other columns as you see fit. And your primary table would contain a NeighbourhoodID column that is a foreign key to Neighbourhoods table.

Jonathan Leffler 2008-11-18 06:33:48

sort of ... Neighborhoods, Cities and States are their own lookup tables => ID and Description. Simple. In the main table, it references these lookup tables. BUT, some of these references can be null - eg. Neighborhood, State == no street text and no city id - hence the nullable.

Pure.Krome 2008-11-18 10:07:58

I still don't get how, in the non-normalized structure, the fields were all not nullable, and yet suddenly become nullable after normalization. What did you have in the not nullable fields when there was no city?

Jonathan Leffler 2008-11-19 01:11:59

Answer 6

A:

Taking the example:

Street, City, State
City, State
Neighborhood, State
State

Firstly go back to basic principles, all of the above are distinct geospatial entities, so your address is composed of a name, and one or many geospatial specifiers. This tells us that we really should be storing them in a single table. The key here is to think of the data more abstractly,

So your address table needs a 1-many relationship to another table, called address_entities which is as follows:

int ID
varchar() name
varchar() type
int parentID
geography position.
int parentID

This means that you will obviously need a table to link the address to the address entity table above. Now, each geospatial entity is inherently hierarchical, and whilst it makes the SQL harder, and personally I try to avoid self referencing tables there are times when it is a good solution and this is one of them.

The benefits are huge, even though it makes the code harder, it is worth it in the long run.

Also, even when it isn't an immediate requirement, think globally, not all addresses in the world have a street, or state, for example,in france a valid address could be

- la Maison des Fou
- 24500 Eymet

So, bear that in mind when designing schemas.

Richard Harrison 2008-11-22 00:59:13

ansaurus

tags:

views:

answers:

How to best design address locations in any SQL Database?

related questions