views:

123

answers:

4

How it is better to organise a large database of addresses?

It is need to create mysql database of addresses. How it is better for organising? I have two variants: 1) cuontries

id|name
1 |Russia

cities
id|name
1 |Moscow
2 |Saratov

villages
id|name

streets
id|name
1 |Lenin st.

places
id|name            |country_id|city_id|village_id|street_id|building_number|office|flat_number|room_number
1 |somebuilding    |1         |1      |NULL      |1        |31             |12a   |NULL       |NULL

For simplification I use not all making addresses. If any part does not participate in the address it is equal NULL

2)

addressElements
id|name
1 |country
2 |city
3 |village
4 |street
5 |office
6 |flat_number
7 |room_number

addressValues
id|addressElement_id|value
1 |1                |Russia
2 |2                |Saratov
3 |2                |Moscow
4 |3                |Prostokvashino
5 |4                |Lenin st.

places
id| name
1 | somebuilding

places_has_addressValues
place_id|addressValue_id
1       |1
1       |3 
1       |5

UPD. I have decided to make as follows alt text

I use property "type" (ENUM) for declaring type of object. For example, apartment type may be 'flat' or 'room' or 'office'

+1  A: 

I'd say go with the first one. A place is in one country, one city (or maybe no city), one village (or maybe no village), etc. The second database would give you the flexibility to have a place in multiple countries, on multiple streets, etc. But since that doesn't make any sense for addresses, it just adds a needless layer of complexity, and more joins every time you want to look up an address.

Vanessa MacDougal
+2  A: 

The reasons why I'd choose the first option:

  • Simplicity. The first option is about a dozen times more clear and obvious. Think about other people maintaining your code later.
  • Joins. As Vanessa mentioned, writing joins would be much harder with the second option.

The disadvantage of the first option is probably more rigid structure, so you have to spend some time to be sure you've got all possible options fitting your data model (e.g. what about P.O. box?). You might want to visit your local post office and have a chat. Maybe http://www.russianpost.ru/ would have some useful info?

mindas
+1  A: 

The second option will be a performance killer, don't do it (EAV tables are high on my list of DO NOT DO right up there with cursors). It will also be hard to query to get the address as you don't know how many elements a particular address has.

How are you going to use this? Are you really going to make people look up every possible street address to choose the right one for the record? WHy not simply have them type in the address? I would not use the streets table if you are going to be doing data entry from a GUI and not a bulk import. I kind of feel the same way about the cities table, how many entries will it have? It's nice to try to keep to cities that must exist for that country, but is it practical when entering data? What do you do when someone needs to enter a city that isn't in the cities table? Cites do occasionally change names and the data entry might have the newer value that hasn't made it into the cities table yet.

HLGEM
Also, normalization should take into account how frequently the data will be reused. If you normalize out the street address, you will have a very low reuse rate. Conversely, you would expect to have much higher reuse rates for city, country, etc. But how much space are you saving vs. having another table, indexes, extra joins, etc. If you have roughly 1:1 correlation between two tables, they need to be merged.
GalacticCowboy
A: 

1) Village and City are equivalent - ie, either a city or village, but not both.

2) A properly normalized db would have:

table countries
  ( country_id int primary key, country_name varchar(100) );

table cities  # also villages
  ( city_id int primary key, city_country_id int, city_name varchar(200),
    city_is_village bool );

table streets
  ( street_id int primary key, street_city_id int, street_name varchar(200) );

table addresses # we'll stop normalizing here
  ( addr_id int primary key, addr_street_id int, addr_building_number int, 
    addr_office int, addr_flat_number int, addr_room_number int );

# then to get the entire address, you would join the tables

 SELECT addr_room_number, addr_flat_number, addr_office, addr_building_number, 
        street_name, city_name, city_is_village, country_name
    FROM addresses
    LEFT join streets ON street_id = addr_street_id
    LEFT join cities ON city_id = street_city_id
    LEFT join countries ON country_id = city_country_id
    ;
dar7yl