tags:

views:

204

answers:

8

Below is a PHP array I have, it has Countries list 1 to 228 with name and a number for that country. I used this on an old project where in the mysql DB I saved a user's country as a number and then I could use this array to avoid doing another mysql query on pages.

Now I am doing a different site where performace is important. Would it be best to leave it how I am doing it or to change this and store the actual country name into the DB for each user? Which way would most likey be best performance wise?

$country_array = array("1" => "Afghanistan","2" => "Albania","3" => "Algeria","4" => "American Samoa","5" => "Andorra","6" => "Angola","7" => "Anguilla","8" => "Antarctica","9" => "Antigua and Barbuda","10" => "Argentina","11" => "Armenia","12" => "Aruba","13" => "Australia","14" => "Austria","15" => "Azerbaijan","16" => "Bahamas","17" => "Bahrain","18" => "Bangladesh","19" => "Barbados","20" => "Belarus","21" => "Belgium","22" => "Belize","23" => "Benin","24" => "Bermuda","25" => "Bhutan","26" => "Bolivia","27" => "Bosnia and Herzegowina","28" => "Botswana","29" => "Bouvet Island","30" => "Brazil","31" => "British Indian Ocean Territory","32" => "British Virgin Islands","33" => "Brunei Darussalam","34" => "Bulgaria","35" => "Burkina Faso","36" => "Burundi","37" => "Cambodia","38" => "Cameroon","40" => "Cape Verde","41" => "Cayman Islands","42" => "Central African Republic","43" => "Chad","44" => "Chile","45" => "China","46" => "Christmas Island","47" => "Cocos (Keeling) Islands","48" => "Colombia","49" => "Comoros","50" => "Congo","51" => "Cook Islands","52" => "Costa Rica","53" => "Cote D'ivoire","54" => "Croatia","55" => "Cuba","56" => "Cyprus","57" => "Czech Republic","58" => "Czechoslovakia","59" => "Denmark","60" => "Djibouti","61" => "Dominica","62" => "Dominican Republic","63" => "East Timor","64" => "Ecuador","65" => "Egypt","66" => "El Salvador","67" => "Equatorial Guinea","68" => "Eritrea","69" => "Estonia","70" => "Ethiopia","71" => "Falkland Islands (Malvinas)","72" => "Faroe Islands","73" => "Fiji","74" => "Finland","75" => "France","76" => "France, Metropolitan","77" => "French Guiana","78" => "French Polynesia","79" => "French Southern Territories","80" => "Gabon","81" => "Gambia","82" => "Georgia","83" => "Germany","84" => "Ghana","85" => "Gibraltar","86" => "Greece","87" => "Greenland","88" => "Grenada","89" => "Guadeloupe","90" => "Guam","91" => "Guatemala","92" => "Guinea","93" => "Guinea-Bissau","94" => "Guyana","95" => "Haiti","96" => "Heard and McDonald Islands","97" => "Honduras","98" => "Hong Kong","99" => "Hungary","100" => "Iceland","101" => "India","102" => "Indonesia","103" => "Iraq","104" => "Ireland","105" => "Islamic Republic of Iran","106" => "Israel","107" => "Italy","108" => "Jamaica","109" => "Japan","110" => "Jordan","111" => "Kazakhstan","112" => "Kenya","113" => "Kiribati","114" => "Korea","115" => "Korea, Republic of","116" => "Kuwait","117" => "Kyrgyzstan","118" => "Laos","119" => "Latvia","120" => "Lebanon","121" => "Lesotho","122" => "Liberia","123" => "Libyan Arab Jamahiriya","124" => "Liechtenstein","125" => "Lithuania","126" => "Luxembourg","127" => "Macau","128" => "Macedonia","129" => "Madagascar","130" => "Malawi","131" => "Malaysia","132" => "Maldives","133" => "Mali","134" => "Malta","135" => "Marshall Islands","136" => "Martinique","137" => "Mauritania","138" => "Mauritius","139" => "Mayotte","140" => "Mexico","141" => "Micronesia","142" => "Moldova, Republic of","143" => "Monaco","144" => "Mongolia","145" => "Montserrat","146" => "Morocco","147" => "Mozambique","148" => "Myanmar","149" => "Namibia","150" => "Nauru","151" => "Nepal","152" => "Netherlands","153" => "Netherlands Antilles","154" => "New Caledonia","155" => "New Zealand","156" => "Nicaragua","157" => "Niger","158" => "Nigeria","159" => "Niue","160" => "Norfolk Island","161" => "Northern Mariana Islands","162" => "Norway","163" => "Oman","164" => "Pakistan","165" => "Palau","166" => "Panama","167" => "Papua New Guinea","168" => "Paraguay","169" => "Peru","170" => "Philippines","171" => "Pitcairn","172" => "Poland","173" => "Portugal","174" => "Puerto Rico","175" => "Qatar","176" => "Reunion","177" => "Romania","178" => "Russian Federation","179" => "Rwanda","180" => "Saint Lucia","181" => "Samoa","182" => "San Marino","183" => "Sao Tome and Principe","184" => "Saudi Arabia","185" => "Senegal","186" => "Seychelles","187" => "Sierra Leone","188" => "Singapore","189" => "Slovakia","190" => "Slovenia","191" => "Solomon Islands","192" => "Somalia","193" => "South Africa","194" => "Spain","195" => "Sri Lanka","196" => "St. Helena","197" => "St. Kitts And Nevis","198" => "St. Pierre and Miquelon","199" => "St. Vincent And The Greadines","200" => "Sudan","201" => "Suriname","202" => "Svalbard and Jan Mayen Islands","203" => "Swaziland","204" => "Sweden","205" => "Switzerland","206" => "Syrian Arab Republic","207" => "Taiwan","208" => "Tajikistan","209" => "Tanzania, United Republic of","210" => "Thailand","211" => "Togo","212" => "Tokelau","213" => "Tonga","214" => "Trinidad and Tobago","215" => "Tunisia","216" => "Turkey","217" => "Turkmenistan","218" => "Turks and Caicos Islands","219" => "Tuvalu","220" => "Uganda","221" => "Ukraine","222" => "United Arab Emirates","225" => "United States Virgin Islands","226" => "Uruguay","227" => "Uzbekistan","228" => "Vanuatu","229" => "Vatican City State","230" => "Venezuela","231" => "Viet Nam","232" => "Wallis And Futuna Islands","233" => "Western Sahara","234" => "Yemen","235" => "Yugoslavia","236" => "Zaire","237" => "Zambia","238" => "Zimbabwe");
A: 

Usually, comparing integer values is more efficient than comparing strings.

zipcodeman
+3  A: 

It would be fastest to keep a table called countries and a field on your users table called country or country_id that holds the foreign key for each users country in the countries table.

A join in MYSQL is normally (almost always) faster than iterating through a list of returned records and pairing the country with the user.

Doug Neiner
This would not only be the fastest, but also the best solution. The number of countries in our world is fixed, and usually it stays as this ;)
frunsi
Actually it'd be pretty fast to avoid the join altogether and just dump the strings in. I agree with your approach, but it's not the fastest (at least for non-country-dependent-record-retrieval).
Mark E
you could also then store the db data in something like memcache and it would pull from a warm cache (ram) instead of doing another query
Tom Schlick
@Mark, true. But if if he didn't need the country for a specific query, he shouldn't run the join. @trs21219 Yes. If he were using Rails, I would say use the local hash. Rails loads it once upon startup and keeps it in memory if you set it up right.
Doug Neiner
@Doug, the point is that the straight retrieval case "get me all users and their countries" costs a join in this solution. The only savings of using numbers would be "get me all users from country X" where the number X is known a priori (if it's not you suffer the join *and* the string compare)
Mark E
@Mark again, excellent points. But the string compare will only be run against a total of 228 items. In a straight string solution with `20,000` users, you would be comparing against 19772 more strings than you need to. Providing the join is written correctly :)
Doug Neiner
A: 

It seems to me that putting the names of the countries in the table with the usernames will save you memory performance as you won't be storing the names of the countries in an array. But you won't save much else as you're making the same number of queries as you are with the original way except without the query to get the country names.

indyK1ng
A: 

I'm not very familiar with PHP, but couldn't you store the country list in the DB (with a FK in your users table to this list), store the list in the cache, and update the cache whenever something changes? It seems like with all the instability in the world, you'd like to be able to update this list somewhat frequently, and hard coding it like this will require you to change your code every time there's a civil war.

ristonj
A: 

For initial load of the data, you could use the ISO-3166 data. Most Linux distros come with a package (iso-codes) containing a list of all the countries and their names translated to many languages. In several formats too (CSV, XML, and .po for the translations).

http://pkg-isocodes.alioth.debian.org/

Nicolás
A: 

The keys should be integers, not strings, other than that it's fine.

gdonald
+2  A: 

Edit: on second pass, consider all of what I said below about strings, but opt for an enum field. Create an enum in the database containing all countries, and use that rather than strings or another table. It should give you all of the benefits of using strings (like getting the value you want back without a join) and all of the benefits of using a second table.


Assuming that your primary use case is something to the effect of "get me all users and country" or "get me user N and country" then the fastest you can do is to store the string.

This has some pitfalls: string compares are more costly than say an indexed approach (discussed in another answer), and if you plan on often running something to the effect of "get me all users from country X" where you know the index well then you can avoid the string compare.

Even if the latter case exists, it really only pays to have another table if the latter case dominates the usage, the data is going to be changing (you plan on adding new countries later, updating the names of the countries), if you want to change the language of the name of the country to change based on some user locale, etc.

Avoiding joins will save you some execution time, but a second table could also be right for you.

Mark E
+1 Great detail, great answer -- I still think normalization wins out, but I see lots of room for this working as well.
Doug Neiner
Doug, I don't disagree, but having written my share of databases for user profiles I can't recall the last time I wanted to get users by country =). Perhaps an enum would be the most efficient solution?
Mark E
LOL... you are right. I am discussing in theory. It is much different in practice.
Doug Neiner
I didn't even know you could do an enum with that many possible values! I usually do an enum with like 3-4 values, doing one with 200+ won't hurt performance?
jasondavis
jasondavis, I can't say I have any experience with enums this large, but they amount to restricted string sets, so my expectation is that the performance is better than the join alternative in the 90% case and comparable (or better) when you're using it as a query input.
Mark E
+2  A: 

To improve speed create a table with MEMORY storage engine , once the database starts fill the table with your values,it is extra fast.

Also access to your array is very fast, because you are not searching the array, you know the position (with int keys) and just seek there.

amir beygi