views:

27

answers:

1

Geo entities are: Continent, Country, Province, City, Neighborhood. By default Continent, Country and City will always exist. Province is optional as not all countries have state/province. And neighborhood data takes time to find, so it is a load as the data comes in.

Other entity: Zipcode (mapping to City(required) and Neighborhood (optional)). Latitude / Longitude (mapping to city, Neighborhood, Local business) Company, Brand, Local business (company maps to city via Headquarters field and corporate locations field. Brand maps through local business. If local store exists in city then brand exists in the city. And local store maps to city as required and neighborhood optional. Also need to relate company, brand and local stores to country and a global level too.

All these data will be lookups. So what is the best way to make the schema for this so 1) I can capture all relationships, 2) Joins are all small as there are tons of other tables and live feeds data - that means I need to keep relationships denormialized? 3) Ensure data updates are easy to insert new data and if wrong relationships were created it is easy to change it?

So should each entity gets its own tale or throw them all into 1-2 tables? Each of these entities except lat/longitue will be search-able on site and are part of filter metricsfor analytics.

EDIT: I must add this is a "Geographic" based social network, connecting people with their local cities, hence heavy emphasis on connecting all system objects with location at different levels.

+1  A: 

I would start with a normalised database schema, use that to get your thinking straight. Initially do a few experiments with the normalised database adding indexes as required to make queries perform. De-normalise as a last resort.

Here you seem to have the Company, Brand and LocalBusiness entities pretty well defined, you surely need those tables.

You then seem to have a Location, which is a partially specified Geographic concept. It seems like Location is a one-many relationship to LocalBusiness - there can (rarely, but there physically can) be many businesses at the exact same ZipCode or Lat/Long.

So I would have a Location entity with many nullable fields, including CityId, NeighbourhoodId, Zip, Lat/Long. I think only CityId and Zip are not nullable.

City and Neighbourhood need to be treated cleverly - I think in Location you havea key to a more complex entity. So CityId takes us to City, which include Province, state etc. NeighbourhoodId takes to a Neighbourhood. I suspect that Neighbourhood is a complex concept. I live in South JavaVille, which is a sub-neighbourhood of JavaVille which is part of the London Borough of OobleDon, and in the administrative distric of Middlesex.

djna
Well the problem with normalized schema is how much to do it? I did not mention the other relationships like Company belongs to a sub-industry which belongs to an industry which belongs to a sector which has other diffeent child and grand children flows of data. Since everything on system is going to be reportable, searchable and used via autosuggest feature, th question goes ho many different lookup tables then should i have to create for each relationship here. There will be tons of these lookup tables all over then?
kino
I am modeling location around the "Sector-Industry-Company-Brand-Local store" entities. So scaling location from Continent, Country, (State/Province), City, (Neighbourhood/locality) + Zip. Latitude/longitude is only for geocoding purposes later. I wont be touching aministrative districtis, regions or countys. That is too much to model and connect every bit of information esp when each country is different as this schema needs to be global for all countries.
kino