Strategy for keeping semi-static data fresh

views:

147

answers:

+4 Q:

Strategy for keeping semi-static data fresh

I'm working on a winform app that will be used by groups in the US and overseas that replaces an existing app built with older technologies.

Performance of the old app overseas is pretty slow due to excessive calls back to the database server in the US, so I'd like to cache as much stuff on the client as possible.

What is the best way to cache data for things that don't change very often - but may? How do I keep the data from getting stale without pulling it entirely on each load?

+2 A:

A quick solution would be to create a checksum of some sort for the data, and the client only asks for that at first. If that is the same, then no more data is required, that part is up to date. If the data changes, the sum changes, and the client gets the new sum along with the new data.

Transferring only that every time sounds like it should be less than all of your data.

Iuvat 2009-07-23 15:47:52

That will not work. If the checksums change the data is different, but if the data changes the checksum may not change. The poster needs to know when the data changes, not the checksum.

Dour High Arch 2009-07-26 14:47:08

You might keep a SQL table function or view or something else lightweight which might have a watermark or other indicator for a given set of objects which might report a revision ID or a 0 for unchanged, 1 for changed, and then refresh only those objects which need updating.

Darth Continent 2009-07-23 15:49:14

+1 A:

If your data rarely changes, you can use an sql trigger to set the equivalent of an "updated" flag - and then simply piggyback that flag on any other client-server communication. If you add insert/update/delete triggers to all semi-static tables, and one watchdog table, you should be good to go.

Eamon Nerbonne 2009-07-23 15:57:01

I think the best approach would be to create a separate table to keep track of data. The table would just contain a list of table names, and the last time that table was modified. Then you would create a trigger that would update the fields whenever an insert/update/delete happened on that table. When the client wanted to check for data, you would retrieve the data and get the date from that table and store it. The next time the client needs the data, just look for the table name in that table, get the date and compare it to the one you had. If the one in the database is greater, it's time to fetch from the server.

The only downside to this approach is that for any change in a table, you will go to the database and retrieve all the data again, even if the change doesn't affect your specific SQL query. This shouldn't really matter too much if you data does not change often.

Manuel 2009-07-23 16:56:19

If the application itself can tell whether there's been an update to the data, request delay scaling may be the simplest bet. When the application starts, request the data, then request again after your minimum refresh delay. If the data has not changed, double the refresh delay between requests. Continue doubling the delay when no changes are detected (possibly with a maximum constraint) and you're set to go. This is a common pattern to reduce server load, and is very easy to implement client side without changes to your schema or web service.

2009-07-24 13:23:49

+1 A:

Not sure how the data sets might look but how about calculating a Hash like SHA256 over the data set and storing that hash on the client and the server. Then before a fetch from the server, fetch only the hash and compare with local hash on the client. If the hash match then use the locally stored data set.

Hannes de Jager 2009-07-24 13:43:09

sorry just saw luvat above mentioned a similar solution. I may just add: You can have a table on your server that stores the sql queries that you use along with a hash for that also. So when you do a request, first calculate a hash over the query string and then ask the server for the data hash associated with the query string hash

Hannes de Jager 2009-07-24 13:47:46

You must maintain a timestamp of whenever the database or table you want to cache changes. If there are related tables, or ones which change on a similar timeframe, you can group them into families which share the same timestamp.

Whenever you want data you must first ask for the timestamp. If it hasn't changed, use the cached data. If it has changed, ask for the fresh data, update your cache, and ask for the timestamp of all ancestral data recursively.

A checksum doesn't work because they can collide and you'll have to ask for the fresh data anyway to check for collisions. An old/changed flag won't work because you won't know when the data was changed, it could have been before your previous timestamp. A version ID that always increases after every change and never decreases will also work.

If you're using a RESTful web service the GET HEAD code will return you only the HTTP header, which can include a timestamp. If your web service does not include timestamps you will have to define new ones that return them.

Dour High Arch 2009-07-25 00:18:59

ansaurus

tags:

views:

answers:

Strategy for keeping semi-static data fresh

related questions