I have a project that requires us to maintain several MySQL databases on multiple computers. They will have identical schemas.

Periodically, each of those databases must send their contents to a master server, which will aggregate all of the incoming data. The contents should be dumped to a file that can be carried via flash drive to an internet-enabled computer to send.

Keys will be namespaced, so there shouldn't be any conflicts there, but I'm not totally sure of an elegant way to design this. I'm thinking of timestamping every row and running the query "SELECT * FROM [table] WHERE timestamp > last_backup_time" on each table, then dumping the result to a file and bulk-loading it at the master server.
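Concretely, something like this is what I have in mind (table and column names are just placeholders):

-- hypothetical table; the ON UPDATE clause keeps the timestamp
-- current when a row is modified, so incremental SELECTs catch updates
CREATE TABLE readings (
    id         BIGINT NOT NULL PRIMARY KEY,  -- namespaced key
    payload    VARCHAR(255),
    updated_at TIMESTAMP NOT NULL
               DEFAULT CURRENT_TIMESTAMP
               ON UPDATE CURRENT_TIMESTAMP
);

-- incremental export; the cutoff would come from a bookmark
-- recorded after each successful dump
SELECT * FROM readings WHERE updated_at > '2009-01-01 00:00:00';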

The distributed computers will NOT have internet access. We're in a very rural part of a 3rd-world country.

Any suggestions?

A: 

Your

SELECT * FROM [table] WHERE timestamp > last_backup_time

will miss DELETEd rows.

What you probably want to do is use MySQL replication via USB stick. That is, enable the binlog on your source servers and make sure the binlog is not thrown away automatically. Copy the binlog files to a USB stick, then run PURGE MASTER LOGS TO ... to erase them on the source server.
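A sketch of that setup, with made-up file names and paths:

# /etc/my.cnf on each source server: enable the binlog and
# disable automatic expiry so logs survive until purged by hand
[mysqld]
server-id        = 1          # pick a distinct id per server
log-bin          = mysql-bin
expire_logs_days = 0

Then the copy-and-purge step might look like this:

# copy the finished binlog files to the stick ...
cp /var/lib/mysql/mysql-bin.000001 /var/lib/mysql/mysql-bin.000002 /media/usbstick/

# ... then drop everything older than the current log on the source
mysql -u root -p -e "PURGE MASTER LOGS TO 'mysql-bin.000003';"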

On the aggregation server, turn the binlog into an executable SQL script using the mysqlbinlog command, then import that script.
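For example (file names made up):

# turn the copied binlogs into one SQL script ...
mysqlbinlog /media/usbstick/mysql-bin.000001 \
            /media/usbstick/mysql-bin.000002 > import.sql

# ... and replay it on the aggregation server
mysql -u root -p < import.sql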

The aggregation server must have a copy of each source server's database, but it can keep each copy under a different schema name as long as your SQL uses only unqualified table names (i.e. never uses schema.table syntax to refer to a table). Importing the mysqlbinlog-generated script (with a proper USE command prefixed) will then mirror the source server's changes on the aggregation server.
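Assuming the source server's data is kept under a schema named site_a on the aggregation server, the import could look like this:

# prefix the USE so the replayed statements land in site_a
( echo "USE site_a;"; mysqlbinlog /media/usbstick/mysql-bin.000001 ) \
    | mysql -u root -p

Newer versions of mysqlbinlog also offer a --rewrite-db option that maps a source schema name to a different one at replay time, which avoids the prefix trick if your MySQL version has it.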

Aggregation across all databases can then be done using fully qualified table names (i.e. using schema.table syntax in JOINs or INSERT ... SELECT statements).
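For instance (schema and table names hypothetical):

-- combine the per-site copies into one reporting table
INSERT INTO reporting.all_readings
SELECT * FROM site_a.readings
UNION ALL
SELECT * FROM site_b.readings;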

Isotopp
That's really slick. I'll look into this. Thanks!