views:

105

answers:

3

What is the best way to design the schema for the following requirements?

Need to store Countries, States, Counties, and counties can be divided into regions. Regions can then have people with various data points to report on.

Regions can also be further divided into divisions which is like grouping people. So Region 1 can have Division A, Division B with people in each of these divisions. Regions and Divisions, each have different set of meta data that are different from Countries, States and Counties.

Reports will be on the data associated with people and similar to google analytics with drilling down from Country all the way down to regions and divisions.

Note: Regions can have 10 peopel and 1 division with 4 people and remaining 6 people are not tied to any division.

+1  A: 

Off the top of my head:

  • countries,States,counties,cities all have fk to regions.
  • states has fk to countries
  • counties has fk to states
  • cities has fk to counties
  • regions has fk to divisions
  • people has fk to divisions_people
  • divisions_people has fk to people and divisions
  • divisions has fk to divisions_people

    country <- state <- county <- city
      ^          ^         ^      ^
       \          \        /     /
                   regions
                      ^
                      |
                  divisions
                      ^
                      |   
                     \|/
                divisions_people (1 person in multiple divisions)
                      ^
                      |
                    people
    
Byron Whitlock
+1 for the ASCII art!
Jonathan Leffler
A: 

A table for countries: country_id, country_name, population

A table for states: state_id, state_name, country_id, population

A table for counties: county_id, county_name, state_id, population

A table for regions: region_id, region_name, county_id, population

A table for divisions: division_id, division_name, region_id, population

Validate in your code or through a triggered constraint (depending on your RDBMS) that you don't have a division with 300 people inside a region with only 50 people. To have people in a region without being in a division, your region population would be 500 while the sum of its divisions would be only 450 (leaving you 50 people in a region but divisionless).

Rob
+1  A: 

It sounds like each person can have one and only one region.

If you are doing transactional processing (as opposed to data mining/warehousing), then I would link the person to the region with a RegionID foreign key.

As far as the (optional) divisions, you can either link the person to a division with a link table: PersonID, DivisionID or if you don't mind NULL DivisionID, you can have a foreign key.

As far as the hierarchy for the geographic regions, I would hesitate to model this until I knew more about the limitations across countries and what these structures represent. While it would be nice to think that everything always rolls up to the next level, I've dealt a lot with hierarchies where levels are skipped and these are modeled very differently. Plus, many countries like the United Kingdom typically would not have States (unless you are going to use England, Scotland, Wales, and Northern Ireland). France is even more complex.

For the reporting/rollup aspect (or if you are only doing data mining/warehousing), I would transform to a separate dimensional model, which would "lock in" the other things as attributes and make it far easier to do the roll ups. So the star schema would lock in dimensional IDs for the different levels to the facts.

Cade Roux