views:

223

answers:

5

In my MySQL database I've got the GeoNames dataset, containing all countries, states and cities.

I am using this to create a cascading menu so the user can select where they are from: country -> state -> county -> city.

The main problem is that the query searches through all 7 million rows in that table each time I want to get a list of child rows, and that takes a while: 10-15 seconds.

I wonder how I could speed this up: caching? Table views? Reorganizing the table structure somehow?

And most importantly, how do I do these things? Are there good tutorials you could link me to?

I appreciate all help and feedback on smart ways of handling this issue!

UPDATE: Here is my table structure:

CREATE TABLE `geonames_copy` (
  `geoname_id` mediumint(9) NOT NULL,
  `parent_id` mediumint(9) DEFAULT NULL,
  `name` varchar(200) DEFAULT NULL,
  `ascii_name` varchar(200) DEFAULT NULL,
  `alternate_names` varchar(4000) DEFAULT NULL,
  `latitude` decimal(10,7) DEFAULT NULL,
  `longitude` decimal(10,7) DEFAULT NULL,
  `feature_class` char(1) DEFAULT NULL,
  `feature_code` varchar(10) DEFAULT NULL,
  `country_code` varchar(2) DEFAULT NULL,
  `cc2` varchar(60) DEFAULT NULL,
  `admin1_code` varchar(20) DEFAULT NULL,
  `admin2_code` varchar(80) DEFAULT NULL,
  `admin3_code` varchar(20) DEFAULT NULL,
  `admin4_code` varchar(20) DEFAULT NULL,
  `population` bigint(20) DEFAULT NULL,
  `elevation` int(11) DEFAULT NULL,
  `gtopo30` smallint(6) DEFAULT NULL,
  `time_zone` varchar(40) DEFAULT NULL,
  `modification_date` date DEFAULT NULL,
  PRIMARY KEY (`geoname_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

And here is the SQL query:

            $query = "SELECT geoname_id, name
                    FROM geonames
                    WHERE parent_id = '$geoname_id'
                    AND (feature_class = 'A')";

Should I just create an index on two columns, parent_id and feature_class?

One question: isn't it better to create an index with Solr instead of using MySQL? One benefit is that I'm already using Solr, and another is that it supports full-text search. So maybe it's better, so that I don't have to use both Solr and MySQL (two things to be good at)?

A: 

Post your SQL for a better reply, but in general:

  • Make indexes on fields that you do joins/wheres on.
  • Do not use "SELECT *" -- only select the fields you need.
  • Hydrate as arrays instead of objects.

Also, if the menu never changes, cache the HTML in a file. You could even only cache the country/state HTML, then fetch cities via AJAX if they change often.

Coronatus
Yes, it was smart to have the countries -> states in the HTML. But the cities will still take a long time to fetch. Is there a way of speeding this up? What are indexes about? Could you tell me some more about them?
never_had_a_name
The MySQL manual is your friend. So is posting your code/SQL so we aren't shooting blind.
Coronatus
Please read my update. I've posted the SQL structure.
never_had_a_name
A: 

I believe stuff like this is usually done with AJAX. At the beginning, you only load the country names, and after one is selected, you dynamically load the state names in that country, then repeat for each subdivision after that.
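For the very first request, that can be a single small query; a sketch against the geonames_copy table from the update (the feature_code filter is an assumption about how GeoNames marks country records, so check it against your data):

-- Load only the top of the hierarchy: GeoNames country rows use
-- feature_class 'A' with feature codes such as PCL, PCLI, PCLD.
SELECT geoname_id, name
FROM geonames_copy
WHERE feature_class = 'A'
  AND feature_code LIKE 'PCL%';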

Lotus Notes
Yes, that is how I intended to do it. But the main problem is still that each SELECT query will take a long time because it has to go through 7 million rows in that table. I wonder how I could speed this up.
never_had_a_name
A: 

This is a good scenario for partitioning a table, and even having sub-partitions. You could partition the table by country and then sub-partition by state. This will significantly reduce the amount of data your query has to search through, as huge segments of data can be removed from the execution plan.

Here is a good place to start for information on MySQL partitioning.
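A rough sketch of the country-level split (not a drop-in migration; MySQL requires every unique key to include the partitioning columns, so the primary key has to include country_code, and the column list is abbreviated here):

-- Hypothetical partitioned copy of the table from the update.
CREATE TABLE geonames_part (
  geoname_id    MEDIUMINT NOT NULL,
  parent_id     MEDIUMINT DEFAULT NULL,
  name          VARCHAR(200) DEFAULT NULL,
  feature_class CHAR(1) DEFAULT NULL,
  feature_code  VARCHAR(10) DEFAULT NULL,
  country_code  VARCHAR(2) NOT NULL,
  -- ...remaining columns as in the original table...
  PRIMARY KEY (geoname_id, country_code)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
PARTITION BY KEY (country_code)
PARTITIONS 50;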

Along with the partitioning (and even if you choose not to partition), you'll want to create indexes on the columns you're searching on, as this will further enhance the performance of the queries.

Here is the MySQL documentation on HOW to create indexes, but really the tough part about making indexes is knowing what to index. Typically you'll target the columns that show up in the WHERE clauses of your queries or the columns you JOIN on. This is pretty general, and you don't (and in many cases shouldn't) have to index every column in your WHERE clauses, but it is a good place to start. Based on the limited data given in the question, you will most likely want a composite index on country and region in order to speed up the selection of the cities.

You'll want to use EXPLAIN in order to determine when an index is necessary and whether or not it is actually being used by the query. Do a search on SO for "MySQL indexing" and you'll find more than enough information on the when, where, and how of indexing tables.
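For the table posted in the update, a minimal sketch of that would be a composite index on the WHERE-clause columns, followed by EXPLAIN to confirm the optimizer actually uses it (the parent id below is just a placeholder):

ALTER TABLE geonames_copy
  ADD INDEX idx_parent_feature (parent_id, feature_class);

EXPLAIN
SELECT geoname_id, name
FROM geonames_copy
WHERE parent_id = 12345        -- placeholder parent id
  AND feature_class = 'A';
-- In the EXPLAIN output, "key" should show idx_parent_feature and
-- "rows" should drop from millions to a handful; if "key" is NULL,
-- the index is not being used.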

If you haven't already, it will help to normalize your data. For example, if your table currently looks something like:

usa;fl;miami;....
usa;fl;orlando;....

It should be changed to something like:

COUNTRY Table:
--------------
COUNTRY_KEY            1
THREE_LETTER           'usa'
COUNTRY_NAME           'united states'
..OTHER COLUMNS....

REGION Table:
--------------
COUNTRY_KEY            1
REGION_KEY             10
REGION_CODE            'fl'
REGION_NAME            'florida'
..OTHER COLUMNS....

CITY Table:
--------------
REGION_KEY             10
CITY_KEY               20
CITY_NAME              'miami'
LAT                    123.12
LONG                   123.12
..OTHER COLUMNS....
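In DDL form, that layout could look roughly like this (names and types are illustrative, not taken from GeoNames):

-- Illustrative schema only; adjust names, types and engine to taste.
CREATE TABLE country (
  country_key   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  three_letter  CHAR(3) NOT NULL,
  country_name  VARCHAR(200) NOT NULL,
  PRIMARY KEY (country_key)
);

CREATE TABLE region (
  region_key    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  country_key   INT UNSIGNED NOT NULL,
  region_code   VARCHAR(20) NOT NULL,
  region_name   VARCHAR(200) NOT NULL,
  PRIMARY KEY (region_key),
  KEY idx_region_country (country_key)
);

CREATE TABLE city (
  city_key      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  region_key    INT UNSIGNED NOT NULL,
  city_name     VARCHAR(200) NOT NULL,
  lat           DECIMAL(10,7) DEFAULT NULL,
  lng           DECIMAL(10,7) DEFAULT NULL,  -- LONG is a reserved word in MySQL, hence lng
  PRIMARY KEY (city_key),
  KEY idx_city_region (region_key)
);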

From the standpoint of the UI, you'll want to write it in a manner where you're only populating the data necessary, then generating the other data-entry points from the matching criteria. So on initial load, you'll populate the country input with:

SELECT country_key, three_letter 
FROM COUNTRY 
ORDER BY three_letter;

When the user selects the country they are interested in, you then select all the regions with that country key:

SELECT region_key, region_code 
FROM REGION WHERE country_key = :input_country_key 
ORDER BY region_code;

And so on until you retrieve the user's data.
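The last step of the cascade follows the same pattern (using the illustrative city table above):

-- Cities for the region the user just picked.
SELECT city_key, city_name
FROM city
WHERE region_key = :input_region_key
ORDER BY city_name;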

Hope this helps.

RC
There is a reason why I can't normalize my data according to your suggestion: some entries won't follow the base structure. Some entries (cities) have descendants (areas, suburbs) and so on, so it's not a static structure but a dynamic one. That's why GeoNames keeps them all in one table, with each entry linked to a parent entry by parent_id. Please read my update; I've posted the SQL structure. I will take a look at indexes... sounds like a solution.
never_had_a_name
+1  A: 

As mentioned, more info would be helpful (SQL, database structure).

The AJAX suggestion is a good one, though you could also do this without AJAX.

Do NOT execute a select at any point that selects all of the data. This will be extremely slow.

First, populate only the list of countries. Allow the user to make a selection from this list. After the user selects a country, either via AJAX or by refreshing the entire page, populate the list of states for that country only - something like (select state from geonames where country = @country). When the user selects a state, populate the list of counties for that country and state - something like (select county from geonames where country = @country and state = @state). Continue in this manner for the city.
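With the single geonames table from the update, each of those steps collapses into the same child lookup by parent_id; roughly (the feature_class values are an assumption about how the data is coded, with 'A' for administrative areas and 'P' for populated places):

-- States/counties under whatever the user picked at the previous step.
SELECT geoname_id, name
FROM geonames_copy
WHERE parent_id = @selected_geoname_id
  AND feature_class = 'A';

-- Final step: cities under the selected county.
SELECT geoname_id, name
FROM geonames_copy
WHERE parent_id = @selected_geoname_id
  AND feature_class = 'P';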

I'm not very familiar with MySQL, but in SQL Server I would create an index on (Country, State, County, City) to speed up this set of queries. I'm not sure whether MySQL would be able to accelerate the entire set of queries with this one index or not.

Of course, I'm making some assumptions about how your data is structured here, so this info may or may not be relevant.

Krazzy
Looking at your table structure, an index on (parent_id, feature_class) should do the trick. You may want to verify that the index is being used by viewing the query execution plan, if there is a way to do so in MySQL. There are trade-offs involved, but I've also noticed on occasion that tacking the fields I am looking up (if they are not very large) onto the end of the index can result in a faster query, as all information can be retrieved directly from the index with no lookup back into the table. Test and determine which works best for you.
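Against the posted table, that covering variant would look something like this (geoname_id is included explicitly because MyISAM secondary indexes do not carry the primary key):

-- With the selected columns appended, MySQL can answer the query
-- from the index alone ("Using index" in EXPLAIN), with no lookup
-- back into the table.
ALTER TABLE geonames_copy
  ADD INDEX idx_parent_feature_cover (parent_id, feature_class, name, geoname_id);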
Krazzy
A: 
ALTER TABLE geonames_copy ADD INDEX (parent_id, feature_class);

should do the trick. An index on just parent_id will probably work fine as well.

Keith Randall
What is the difference between your SQL statement and this one: CREATE INDEX index_name ON table_name (columnname1, columnname2)?
never_had_a_name
I believe they are equivalent syntaxes (syntaxi? syntaxen?). CREATE INDEX requires you to give a name to your index, which isn't really necessary.
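For reference, these two statements build the same index; pick one (the name in the second form is arbitrary):

ALTER TABLE geonames_copy ADD INDEX (parent_id, feature_class);

CREATE INDEX idx_parent_feature ON geonames_copy (parent_id, feature_class);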
Keith Randall