I am working on a project that obtains values from many measurement stations (e.g. 50000) located all over the world. I have 2 databases, one storing information on the measurement stations, the other one storing values obtained from these stations (e.g. several million). A super-simplified version of the database structure could look like this:

database measurement_stations

    table measurement_station
    id      : primary key
    name    : colloquial station name
    country : foreign key into table country

    table country
    id      : primary key
    name    : name of the country

database measurement_values

    table measurement_value
    id      : primary key
    station : id of the station the value came from
    value   : measured value

I need a list of the names of all countries from the first database for which values exist in the second database. I am using MySQL with InnoDB, so cross-database foreign keys are supported.

I am lost on the SELECT statement, more specifically, the where clause.

Selecting the IDs of the stations for which values exist seems easy:

SELECT DISTINCT station FROM measurement_values.measurement_value

This takes a couple of minutes the first time, but is really fast in subsequent calls, even after database server restarts; I assume that's normal.

I think the COUNT trick mentioned in http://stackoverflow.com/questions/3752809/problem-with-query-data-in-a-table and http://stackoverflow.com/questions/3750651/mysql-complex-where-clause could help, but I can't seem to get it right.

SELECT country.name FROM measurement_stations WHERE country.id = measurement_station.id
AND (id is in the result of the previous SELECT statement)

Can anyone help me?

A: 

This should do it:

select distinct m.country, ct.name
from measurement_stations.measurement_station m
inner join measurement_values.measurement_value mv on mv.station = m.id
inner join measurement_stations.country ct on ct.id = m.country
codingguy3000
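For comparison, the "id is in the result of a subquery" idea from the question could be sketched roughly as below. This is only an illustration, assuming MySQL's single-dot `database.table` qualification and the table and column names from the simplified schema in the question; it is not necessarily faster than the join.

```sql
-- Countries that have at least one station with at least one measured value.
-- Assumes the simplified schema from the question (MySQL syntax).
SELECT ct.name
FROM measurement_stations.country AS ct
WHERE EXISTS (
    SELECT 1
    FROM measurement_stations.measurement_station AS m
    INNER JOIN measurement_values.measurement_value AS mv
            ON mv.station = m.id
    WHERE m.country = ct.id
);
```

Since each country is tested once, no DISTINCT is needed here. Which form performs better depends on the MySQL version and the available indexes, so it is worth comparing both with EXPLAIN.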
that works, excellent. thank you very much for the quick help! :-)
ssc
first time I run the query: 184 rows in set (27 min 10.41 sec)
ssc
second time I run it: 184 rows in set (1 min 20.92 sec)
ssc
Relational database management systems (SQL Server, MySQL, Oracle, etc.) cache data, and in some cases query results, so a query usually runs faster the second time. This can hide a problem that shows up in a production environment, where the query may run against a cold cache. You should look at adding indexes to fix the performance issue with the initial query run.
codingguy3000
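A minimal sketch of what "adding indexes" might look like for this query, assuming the simplified schema from the question. The index names are made up for illustration, and the real tables may already have some of these indexes (when a foreign key constraint is declared on a column, InnoDB normally keeps an index on it already).

```sql
-- Hypothetical index names; columns come from the simplified schema in the question.
-- Index the join column on the large values table.
CREATE INDEX idx_measurement_value_station
    ON measurement_values.measurement_value (station);

-- Index the join column from stations to countries.
CREATE INDEX idx_measurement_station_country
    ON measurement_stations.measurement_station (country);

-- Inspect the execution plan to see whether the indexes are actually used.
EXPLAIN
SELECT DISTINCT m.country, ct.name
FROM measurement_stations.measurement_station AS m
INNER JOIN measurement_values.measurement_value AS mv ON mv.station = m.id
INNER JOIN measurement_stations.country AS ct ON ct.id = m.country;
```

Building an index on a table with several million rows can take a while, so it is probably best tried on a copy of the data first.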