views:

687

answers:

6

Hi folks,

Overview

I'm working on some Emergency Services reporting and mapping application for California (kind of weird, considering the fires there, right now...). We need to map demographic and emergency data for an internal govt unit.

What we have are all the streets, cities and neighborhoods in California. Each neighborhood also has it's relevant shapefile (lat long that defines it's boundaries). This was given to us by the US Census board (all public domain stuff) website.

Problem

I'm not sure how to best design the DB tables. We haven't been told what type of DB we need to use .. so we're open to suggestions if that helps. We have experience with MS SQL 2005 and 2008 (and the spatial stuff in '08).

We can have the following legit data.

  • Street, City, State
  • City, State
  • Neighborhood, State
  • State

The reason why State is a legit location is because we're told this might be sold to other states, so we need to plan for that now.

So, originally, i thought of this...

  • LocationId INTEGER PK Identity
  • Street NVARCHAR(100)
  • Neighbourhood NVARCHAR(100)
  • City NVARCHAR(100)
  • State NVARCHAR(100)
  • Latitude VARCHAR(15)
  • Longitude VARCHAR(15)
  • Shapefile

None of those are nullable, btw. But after a short while, i thought that it was a waste to have so many 'California' text or 'San Diego' text in the fields. So i changed the table to be more normalised by making the Neighborhood, City and State fields a foreign key to their own new table (eg. lookups) .. and those two fields are now NULLABLE.

So .. that all works fine. except when i try and do some Sql statements on them. Because of the NULLABLE FK's, it's a nightmare to make all these outer join queries :(

What about having the main table, the sub-lookup tables (eg. Neighbourhoods, Cities and States) linked via ID's and then place all this in a view? Remember, NeighborhoodID and CitiyID would be NULLABLE.. ???

I just want to see people's thoughts on this and the reasons they made their suggestions, please. I'm really worried and confused but are eager to learn.

Please help!


edit 1: I need to stick to an RDBMS Database.

edit 2: I'm thinking about going a single table (de-normalized) with constraints to keep the sum of the fields unqiue OR multi-tables with nullable FK's on the main table (eg. Locations (main table), Neighborhoods, Cities, States ... normalized db schema).

edit 3: Added City to the sample, second list.

edit 4: Added view question.

A: 

This is a nice place to start. A whole #$(#$-load of database schemas to check out:

http://www.databaseanswers.org/data_models/

TAG
Nope - nothing helped me there. Also, i'm after some reasons why suggested designs are suggested. not just answers. We need to learn, here.
Pure.Krome
Eh, this a good list nevertheless.
BobbyShaftoe
+1  A: 

This a problem I've had to deal with and RDBMS systems aren't the best at storing hierarchical data. You might want to look at using an object database since these have to deal with nested objects and are optimized for the problem.

If you need to use an RDBMS, you may have to stick with a de-normalized schema though. Having separate tables to maintain your cities, streets etc may be handy for tracking changes though. If a city or street needs to be renamed, you can update the master record in the respective table and schedule a job to update a text copy of the string in your 'main' table. This will prevent you from having to run updates on 10's 100's of thousands of rows during prime time, but still lets you store the most up-to-data data in the db. Of course, this makes the data duplication situation even worse, but it's the price to pay for performance.

Dana the Sane
I need to stick to an RDMBS. Why would you suggest sticking to a de-normailized schema over a single table?
Pure.Krome
In the app I worked on, we only went down the zip level and the volume of data got a bit large. By de-normalize, I mean a schema with fewer tables, possibly with some data repeated in each. This will limit the number of joins you have to make, but let you stick with some db design principles.
Dana the Sane
I updated my answer to be more specific.
Dana the Sane
+1 I Agree Normalization is ideal, but in the real-world Denormalized data makes life easier at times. However in Pure.Krome's case the data model seems simple enough, and the data limited enough (I guess this is at most a few million entries?) that it might as well be normalized.
Robert Gould
Currently we have 1 1/2 mill, but it could get up to around 10 mill odd?
Pure.Krome
also - i would go either normalized (eg. multi-tables with nullable FK joins) or de-normalized (one table only, with constraints). Discuss!
Pure.Krome
At the end of the day, it might be best to try both. If you start with option 1, switching to 2 is a matter of running a 'SELECT INTO' type query to create the de-normalized table and updating your CRUD sql statements. Let stress testing decide which works best.
Dana the Sane
A: 

The normalized version sounds more correct to me.

What worries you about the queries?

Can't you just create a view that does the joins for you once and then query that?

Andrew Kennan
Correct. i've got a number of outer joins when the data is normalized. Previously in other projects very very similar to this, it's been a pain in the back to get performant queries .. especially when i want to 'bubble up' the heirachy.
Pure.Krome
Lastly, i've been taught to avoid views because i've been told they can be REALLY un-performant, especially on multi-tables which go upto the millions and millions of rows....
Pure.Krome
If you're using MS SQL you have the option of indexed views which may help: http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx
Andrew Kennan
A: 

Is this an OLTP system and reporting system or only a reporting system? If it's only a reporting system, you can denormalize the data in a data warehouse fashion (with snowflake dimensions or not for the hierachies of geographic jurisdictions) and you'll find the reporting to be easier.

I would start from the results and work back, because it sounds to me like you are getting fed the data and you are trying to bring it into a database to support the reporting and mapping. In this case, the database schema being a traditional normalized system is not important because redundancy in the data is not something that will cause maintenance problems for users, etc.

If this seems appropriate, you want to look into the Kimball books.

Cade Roux
+1  A: 

As @Oddthinking noted in a comment, your problems started at:

So I changed the table to be more normalised by making the Neighborhood, City and State fields a foreign key to their own new table (eg. lookups) .. and those two fields are now NULLABLE.

So .. that all works fine. except when I try and do some SQL statements on them. Because of the NULLABLE FK's, it's a nightmare to make all these outer join queries.

This reminds me of the "Doctor, doctor, it hurts when I hit myself like this" joke.

Why exactly did you make the foreign key fields nullable? They were mandatory before, so you should keep them as mandatory, precisely to avoid the nightmares of outer join queries.

Your explanation (question) is somewhat confusing in that you list three fields (Neighborhood, City and State) and then say "those two fields are now nullable". Which two are? And why? And what is in the lookup table? Or is there more than one lookup table? There might be an argument for some sort of NeighbourhoodID number which is a foreign key to a Neighbourhood table, which defines the City and State as well as Neighbourhood name. You might then decide that there is a closed list of cities and the cities have an ID number too, and that number determines the state too. You are probably as well off using a two-letter state code as creating a (probably 4-byte) state ID number. However, do not forget that a check criterion that ensures that the state code is one of the 50 or so valid state codes is harder to write than a foreign key that references a table of states. Since neither states nor cities changes very often, I'd probably use the table of states with a foreign key - but the key column would be the state code.

That means you might have a table of Neighbourhoods with columns NeighbourhoodID, Name, CityID; a table of Cities with columns CityID, Name, State; and a table of States with columns State and Name. You can add other columns as you see fit. And your primary table would contain a NeighbourhoodID column that is a foreign key to Neighbourhoods table.

Jonathan Leffler
sort of ... Neighborhoods, Cities and States are their own lookup tables => ID and Description. Simple. In the main table, it references these lookup tables. BUT, some of these references can be null - eg. Neighborhood, State == no street text and no city id - hence the nullable.
Pure.Krome
I still don't get how, in the non-normalized structure, the fields were all not nullable, and yet suddenly become nullable after normalization. What did you have in the not nullable fields when there was no city?
Jonathan Leffler
A: 

Taking the example:

  • Street, City, State
  • City, State
  • Neighborhood, State
  • State

Firstly go back to basic principles, all of the above are distinct geospatial entities, so your address is composed of a name, and one or many geospatial specifiers. This tells us that we really should be storing them in a single table. The key here is to think of the data more abstractly,

So your address table needs a 1-many relationship to another table, called address_entities which is as follows:

  • int ID
  • varchar() name
  • varchar() type
  • int parentID
  • geography position.
  • int parentID

This means that you will obviously need a table to link the address to the address entity table above. Now, each geospatial entity is inherently hierarchical, and whilst it makes the SQL harder, and personally I try to avoid self referencing tables there are times when it is a good solution and this is one of them.

The benefits are huge, even though it makes the code harder, it is worth it in the long run.

Also, even when it isn't an immediate requirement, think globally, not all addresses in the world have a street, or state, for example,in france a valid address could be

- la Maison des Fou
- 24500 Eymet

So, bear that in mind when designing schemas.

Richard Harrison