views:

78

answers:

3

I'm working on a project in which I'm having slight difficulties coming up with the design for a seemingly very simple scenario:

user belongs to city which belongs to country, however, the city reference may be null while user must belong to a country nevertheless. In other words (in basic RoR model syntax),

# class User < ActiveRecord::Base 
belongs_to :city
belongs_to :country
validates_existence_of :country

# class City < ActiveRecord::Base
has_many :users
belongs_to :country
validates_existence_of :country

# class Country < ActiveRecord::Base
has_many :users
has_many :cities    

My problem with this super simple design is the fact that there is so much redundancy. As soon as a city is referenced by a user, the country reference can be extrapolated from it (in other words, since it is already referenced in the city table, it seems not-so-awesome to also reference it in the user table).

A: 

I have no thoughtful answer, but first comes to my mind is this,

# class User < ActiveRecord::Base 
belongs_to :country, :through => :city
validates_existence_of :city

# class City < ActiveRecord::Base
has_many :users
belongs_to :country
validates_existence_of :country

# class Country < ActiveRecord::Base
has_many :users, :through => :city
has_many :cities

The trick is that a dummy or blank city is added to each country so that the validations hold.

OmniBus
Thanks for the input, OmniBus!
tjko
+5  A: 

This is what happens when A (city) also uniquely identifies B (country), but A is optional while B is mandatory. Basically, Country is only added because City is optional while there is still a need to identify the country of each user.

The idea of tying country and city together, may seem attractive because a city uniquely "identifies" a country, but: does it? Amsterdam is not just a city in the Netherlands you know.

Plus it carries the problem you already mentioned in your comment... what do you do with additional data; and listing countries as such now requires filtering them out of the country/city amalgamation.

Your original design may feel redundant and data-wise it probably is, but logic-wise and requirement-wise it isn't. I would stick with it as it is very clear and reflects the requirements perfectly. And I would learn to live with the apparant redundancy. Any "solution" you may come up with to avoid the "redundancy", will likely just end up muddying the waters. Or will make defining queries in the future more difficult.

Marjan Venema
+1 on this, sometimes normalization can cause a bigger pain than is necessary. The only thing I would add is that you can and probably should do a validation to ensure that the city is in the country selected.
Geoff Lanotte
Marjan, thanks so much for your input (the more I get the better I feel about going with a certain solution). Would you maybe have any recommendations for learning this sort of stuff? Should I just pick up a book on database design or are there any good web resources? The problem about this topic is that I searched for answers to this problem, but really just came up empty handed. I'm worried that all the decisions I make with respect to the design of my database could be much less than perfect...
tjko
@Geoff, right, thanks for supporting this response. And +1 on reminding me to validate the city (I was obviously planning on it, but I easily forget these things developing with the agility of Rails).
tjko
@tjko: Sorry off the top of my head no recommendations for book titles or such. In general: don't get stuck on the database design, stay above it, and think about the things you are modelling. If they are different concepts, keep them separate. Don't optimize unless there is a (proven) need to do so. Other than that: talk a lot with people that have more experience (like here on SO), get their views, even if they are conflicting you will get a feel for the pro's and con's.
Marjan Venema
+1  A: 

This has the 'sql' tag for some reason, so here's how I'd do it in SQL (note there is referential integiry throughout and no NULLable columns):

CREATE TABLE Countries 
(
 country_code CHAR(3) NOT NULL UNIQUE
);

CREATE TABLE Cities 
(
 city_name VARCHAR(20) NOT NULL, 
 country_code CHAR(3) NOT NULL 
    REFERENCES Countries (country_code), 
 UNIQUE (country_code, city_name)
);

CREATE TABLE Users
(
 username CHAR(8) NOT NULL UNIQUE, 
 country_code CHAR(3) NOT NULL, 
 UNIQUE (country_code, username)
);

CREATE TABLE UsersCountries
(
 username CHAR(8) NOT NULL UNIQUE, 
 country_code CHAR(3) NOT NULL, 
 FOREIGN KEY (country_code, username)
    REFERENCES Users (country_code, username), 
 city_name VARCHAR(20) NOT NULL, 
 FOREIGN KEY (country_code, city_name)
    REFERENCES Cities (country_code, city_name)
);

Test data:

INSERT INTO Countries (country_code) VALUES 
('ITL'), 
('ESP');

INSERT INTO Cities (city_name, country_code) 
VALUES 
('Roma', 'ITL'), 
('Naples', 'ITL'), 
('Barcelona', 'ESP'), 
('Madrid', 'ESP');

INSERT INTO Users (username, country_code) VALUES 
('00000001', 'ESP'), 
('00000002', 'ESP'), 
('00000003', 'ITL'), 
('00000004', 'ITL');

INSERT INTO UsersCountries (username, city_name, country_code) 
VALUES 
('00000002', 'Madrid', 'ESP'), 
('00000004', 'Roma', 'ITL');

To be fair, most SQL coders will not have an aversion to using a NULLable column and will prefer all user's details to appear in one table. Assuming your SQL product (correctly) does not treat NULL as a value (for example MS SQL Server does not but MS Access does) then the following will work and is equivalent to the above structure (i.e. again referential integiry throughout despite the existence of NULLable columns):

CREATE TABLE Users
(
 username CHAR(8) NOT NULL UNIQUE, 
 city_name VARCHAR(20), 
 country_code CHAR(3) NOT NULL
    REFERENCES Countries (country_code), 
 FOREIGN KEY (country_code, city_name)
    REFERENCES Cities (country_code, city_name)
);

INSERT INTO Users (username, city_name, country_code) VALUES 
('00000001', NULL, 'ESP'), 
('00000002', 'Madrid', 'ESP'), 
('00000003', NULL, 'ITL'), 
('00000004', 'Roma', 'ITL');
onedaywhen
onedaywhen, thanks for posting a response. I tagged this as SQL because I thought maybe people adept at SQL could provide good insight on database design, but maybe I falsely assumed that, I'm sorry (I'm new here). In any case, thanks for your response, I was able to read into it a bit, but I must say it's bit too low level and doesn't speak that much to the design question I was going for.
tjko
The point is, there is no redudancy in these designs! Sure, country_code appears all all three/four tables but each is required to maintain data integrity. You can't 'extrapolate' a country from a city (e.g. is it Paris, Texas or Paris, France?), hence the compound key.
onedaywhen