views:

104

answers:

4

I have two tables on different servers, and I'd like some help finding an efficient way to combine and match the datasets. Here's an example:

From server 1, which holds our stories, I perform a query like:

query = """SELECT author_id, title, text
           FROM stories
           ORDER BY timestamp_created DESC
           LIMIT 10
           """
results = DB.getAll(query)

for i in range(len(results)):
    #Build a string of author_ids, e.g. '1314,4134,2624,2342'

But, I'd like to fetch some info about each author_id from server 2:

query = """SELECT id, avatar_url
           FROM members
           WHERE id IN (%s)
           """
values = (uid_list)
results = DB.getAll(query, values)

Now I need some way to combine these two queries so I have a dict that has the story as well as avatar_url and member_id.

If this data were on one server, it would be a simple join that would look like:

SELECT *
FROM members, stories
WHERE members.id = stories.author_id

But since we store the data on multiple servers, this is not possible.

What is the most efficient way to do this? I understand the merging probably has to happen in my application code ... any efficient sample code that minimizes the number of dict loops would be greatly appreciated!

Thanks.

A: 

You will have to bring the data together somehow.

  • There are things like server links (though that is probably not the correct term in mysql context) that might allow querying accross different DBs. This opens up another set of problems (security!)

  • The easier solution is to bring the data together in one DB.

  • The last (least desirable) solution is to join in code as Padmarag suggests.

lexu
A: 

The only option looks to be Database Link, but is unfortunately unavailable in MySQL.
You'll have to do the merging in your application code. Better to keep the data in same database.

Padmarag
Hi -- I agree, I have to do it in my application code. I'm looking for the most efficient way to do it in my application. Any sample code would be greatly appreciated.
ensnare
Doing it in your code means you need to compare to resultsets entry by entry writing the combined data into a third data structure (e.g. dictionary). This is very domain specific, you're mostly on your own I'm afraid.
lexu
+2  A: 

If memory isn't a problem, you could use a dictionary.

results1_dict = dict((row[0], list(row[1:])) for row in results1)
results2_dict = dict((row[0], list(row[1:])) for row in results2)

for key, value in results2_dict:
    if key in results1_dict:
        results1_dict[key].extend(value)
    else:
        results1_dict[key] = value

This isn't particularly efficient (n2), but it is relatively simple and you can tweak it to do precisely what you need.

Colorado
A: 

Is it possible to setup replication of the needed tables from one server to a database on the other? That way you could have all your data on one server.

Also, see FEDERATED storage engine, available since mysql 5.0.3.

ceteras