I would like to design a database that is accessed over a very slow network link. Luckily, the database itself is pretty static, so I'm going to use aggressive caching of the results. From time to time, other inserts and updates may happen on tables while the client is running, so I would like to design a low-bandwidth system where the client knows exactly when something has been updated, to avoid even bothering to check the remote database.

My idea was to create a special table with two columns: one for the name of a table, and another for its version number. This table would never be modified directly by application queries; it would be updated by a stored procedure. Whenever any table is modified, the stored procedure would increment that table's version number in the special table.
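
A minimal sketch of that idea, in Python with sqlite3 purely for illustration (sqlite has no stored procedures, so triggers play that role here; on the real server you would use your database's own stored-procedure or trigger syntax, and the table and column names below are just placeholders):

    import sqlite3

    conn = sqlite3.connect("server.db")

    conn.executescript("""
    -- The "special" table: one row per application table, holding its version.
    CREATE TABLE IF NOT EXISTS table_versions (
        table_name TEXT PRIMARY KEY,
        version    INTEGER NOT NULL DEFAULT 0
    );

    -- An example application table.
    CREATE TABLE IF NOT EXISTS customers (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    INSERT OR IGNORE INTO table_versions (table_name, version) VALUES ('customers', 0);

    -- Bump the version whenever the table changes (one trigger per operation).
    CREATE TRIGGER IF NOT EXISTS customers_ins AFTER INSERT ON customers BEGIN
        UPDATE table_versions SET version = version + 1 WHERE table_name = 'customers';
    END;
    CREATE TRIGGER IF NOT EXISTS customers_upd AFTER UPDATE ON customers BEGIN
        UPDATE table_versions SET version = version + 1 WHERE table_name = 'customers';
    END;
    CREATE TRIGGER IF NOT EXISTS customers_del AFTER DELETE ON customers BEGIN
        UPDATE table_versions SET version = version + 1 WHERE table_name = 'customers';
    END;
    """)
    conn.commit()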

The client can then store the results of its queries in a local database (say, sqlite) along with the version number of each table. Then, next time, it runs a select on the special table and checks whether any of the tables have changed.
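
On the client side, the check could look roughly like this sketch, continuing the example above (query_remote is a placeholder for whatever driver reaches the remote database over the slow link):

    import sqlite3

    local = sqlite3.connect("cache.db")
    local.execute("""CREATE TABLE IF NOT EXISTS cached_versions (
        table_name TEXT PRIMARY KEY,
        version    INTEGER NOT NULL
    )""")

    def query_remote(sql, params=()):
        """Placeholder: run a query against the remote database over the slow link."""
        raise NotImplementedError

    def stale_tables():
        """Return (table, remote_version) for tables whose cached version is out of date."""
        remote = dict(query_remote("SELECT table_name, version FROM table_versions"))
        cached = dict(local.execute("SELECT table_name, version FROM cached_versions"))
        return [(t, v) for t, v in remote.items() if cached.get(t) != v]

    def refresh(table, version):
        """Re-fetch one table and remember which version the cached copy is at."""
        rows = query_remote("SELECT * FROM " + table)
        # ... replace the locally cached copy of `table` with `rows` here ...
        local.execute(
            "INSERT OR REPLACE INTO cached_versions (table_name, version) VALUES (?, ?)",
            (table, version),
        )
        local.commit()

A full check-and-refresh cycle is then just "for t, v in stale_tables(): refresh(t, v)", so when nothing has changed the only traffic over the slow link is the one small select on the special table.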

How does this sound? Are there other strategies to minimize redundant database bandwidth and cache the database aggressively? The thing is going to be not only cross-platform, but clients in different programming languages (C, C++, Obj-C, Python, etc.) will access it, so I'm trying to find the simplest thing that works in all cases.

Ideally I would like to design the tables to be incremental (append-only, so deletes are actually inserts), so I could just query the highest ID of each table and compare it to the local one. However, I don't know where to look for examples of this. Reading CouchDB's technical page makes my head explode.
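
For what it's worth, if the tables really are append-only, the check collapses to a single MAX(id) comparison per table; a hedged sketch reusing the placeholder names from above:

    def sync_append_only(table):
        """Fetch only rows the local cache has not seen yet."""
        (local_max,) = local.execute("SELECT COALESCE(MAX(id), 0) FROM " + table).fetchone()
        new_rows = query_remote("SELECT * FROM " + table + " WHERE id > ?", (local_max,))
        for row in new_rows:
            # In an append-only design a "delete" arrives as a new row marking an
            # earlier record as dead (a tombstone), so inserting new rows is enough.
            placeholders = ",".join("?" * len(row))
            local.execute("INSERT INTO " + table + " VALUES (" + placeholders + ")", row)
        local.commit()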

A: 

I think it could work, at first glance. You could also group the data in your tables so that your frequently changing data is not in the same tables as your near-constant data...

But a database is also used to run complex queries with many joins. You would lose that possibility in all your server code, and that would be a heavy cost! Or you could implement that checking yourself, but it would be very complex, and perhaps somewhat slow as well...


Wouldn't it be easier to keep a local copy of your database, synchronized with the distant one? You could still use complex queries locally.

And synchronization between the databases may already exist with your database vendor. At least it is a common problem these days, so you can look for good existing solutions :-)


Otherwise, you could do your own implementation of the synchronization.

You could have a dedicated Version table with the last version number per table, plus a column in each table recording the data version of each row. To synchronize, you would (see the sketch after this list):

  • query the distant Version table, compare it to your local Version table, and determine which tables to query, and the range of versions of interest for each of them.
  • for each such table, query all rows whose version number is in that range, and insert or update them in the local database.
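
A rough sketch of that loop, assuming a hypothetical row_version column on each table and reusing the table_versions / cached_versions tables and query_remote placeholder from the sketches in the question:

    def synchronize(tables):
        """Pull only rows whose row_version is newer than what the local copy has."""
        remote_versions = dict(query_remote("SELECT table_name, version FROM table_versions"))
        local_versions = dict(local.execute("SELECT table_name, version FROM cached_versions"))

        for table in tables:
            low = local_versions.get(table, 0)
            high = remote_versions.get(table, 0)
            if high <= low:
                continue  # nothing changed in this table since the last sync
            changed = query_remote(
                "SELECT * FROM " + table + " WHERE row_version > ? AND row_version <= ?",
                (low, high),
            )
            for row in changed:
                # Insert-or-update by primary key; the exact upsert syntax depends on the engine.
                placeholders = ",".join("?" * len(row))
                local.execute("INSERT OR REPLACE INTO " + table + " VALUES (" + placeholders + ")", row)
            local.execute(
                "INSERT OR REPLACE INTO cached_versions (table_name, version) VALUES (?, ?)",
                (table, high),
            )
        local.commit()
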
KLE
+1  A: 

If you expect to have lots of clients accessing this database, written in lots of different languages, perhaps you want to write a thin server layer on top of the database that your clients connect to. This server could handle the caching, and perhaps gzip the data it sends over the wire. A client could then just send a message asking for the latest data since time X, and the server would return either just the needed data, or a message saying "no changes since time X".
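
A minimal sketch of such a layer, using only Python's standard library (the URL scheme, the latest_change counter, and fetch_changes_since are invented for illustration; a real server would also need authentication, error handling, and so on):

    import gzip
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    latest_change = 0          # hypothetical: bumped whenever the database changes

    def fetch_changes_since(since):
        """Placeholder: query the (fast, local) database for rows changed after `since`."""
        return []

    class CacheHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            qs = parse_qs(urlparse(self.path).query)
            since = int(qs.get("since", ["0"])[0])

            if since >= latest_change:
                self.send_response(304)        # "no changes since time X"
                self.end_headers()
                return

            payload = gzip.compress(json.dumps(fetch_changes_since(since)).encode("utf-8"))
            self.send_response(200)
            self.send_header("Content-Encoding", "gzip")
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    if __name__ == "__main__":
        HTTPServer(("", 8000), CacheHandler).serve_forever()

A client in any of the languages mentioned then only needs an HTTP GET (something like http://server:8000/changes?since=42 in this sketch) and a gzip decoder, both of which are readily available in C, C++, Obj-C, and Python.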

Peter Recore
+1 good answer, if there is a common language for this server module. But for caching to be efficient, the cache needs to be on the server itself, so the distant communication is with the database. Can the zipping occur on that communication?
KLE
The server layer would run on the same local network as the database. It would then zip the data before sending it over the slow network link to the clients. The server module could be written in any language, since it would talk to the clients via some network protocol, either mid-level like TCP/IP or higher-level like HTTP or XML-RPC.
Peter Recore
A: 

From your description, it sounds unlikely that the database will be accessed from more than 3 or 4 sites. This is the kind of situation database replication (even MySQL supports this) is designed for.

You can have a replication slave at every site the database needs to be accessed from, and write your client software to simply query the local replica. Trust me, databases are much easier to work with when you don't have to re-implement low-level features like replication.
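
With that setup the client code stays completely ordinary; a sketch assuming a MySQL replica running on the same machine and the mysql-connector-python package (host, credentials, and query are placeholders):

    import mysql.connector

    # The replica is local, so this query never crosses the slow link;
    # replication keeps the local copy up to date in the background.
    conn = mysql.connector.connect(
        host="127.0.0.1", user="app", password="secret", database="appdb"
    )
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM customers")
    for row in cur.fetchall():
        print(row)
    conn.close()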

Seun Osewa