Currently my program updates itself by downloading the latest .tar.gz file containing the source code, and extracting it over the current directory where the program lives. There are 2 "modes" of update - one for users running the Python source, and one if the user is running the program as a Windows exe.

Over time my program's file size grows with each release, due to new images, libraries, documentation and code. However, sometimes only the code changes from one release to the next, so the user ends up re-downloading all the images, documentation etc. over and over, when there are only small code changes.

I was thinking that a more efficient approach would be to use a patch/diff based system where the program incrementally updates itself from one version to another by only downloading small change sets.

However, how should I do this? If the user is running version 0.38, and there is a 0.42 available, do they download 0.38->0.39, 0.39->0.40, 0.40->0.41 and 0.41->0.42? How would I handle differences in binary files (images, in my case)?

I'd also have to maintain some repository containing all the patches, which isn't too bad. I'd just generate the diffs with each new release. But I guess it would be harder to do this for executables than for pure Python code?

Any input is appreciated. Many thanks.

+1  A: 

Your update manager can know which version the current app is, and which version is the most recent one and apply only the relevant patches.

Suppose the user runs 0.38, and currently there is 0.42 available. The update for 0.42 contains patches for 0.39, 0.40, 0.41 and 0.42 (and probably further back in the history). The update manager downloads the 0.42 update, knows it's at 0.38 and applies all the relevant patches. If it currently runs 0.41, it only applies the latest patch, and so on.
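To make the selection step concrete, here is a minimal sketch of how an update manager might pick the relevant patches. It assumes versions are dotted numbers like "0.38" and that each patch is named after the version it produces; `parse_version` and `patches_to_apply` are hypothetical names, not part of any existing updater.

```python
def parse_version(text):
    """Turn '0.38' into (0, 38) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def patches_to_apply(current, available):
    """Return the versions newer than `current`, oldest first.

    `available` lists the versions the downloaded update carries patches for
    (an assumed layout: one patch per version, applied in order).
    """
    current_key = parse_version(current)
    newer = [v for v in available if parse_version(v) > current_key]
    return sorted(newer, key=parse_version)
```

With an update carrying patches for 0.38 through 0.42, `patches_to_apply("0.38", ...)` would yield 0.39, 0.40, 0.41 and 0.42 in that order, while a client already at 0.41 would get only 0.42.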

Eli Bendersky
+2  A: 

I suggest that rather than inventing your own update management system, you take a look at open source options, such as Google Updater (which was open sourced over a year ago as Omaha). I imagine the Windows focus is OK since you do specifically refer to Windows, but if you also need Mac support, similar functionality is offered by Update Engine (for Linux you probably want to work with the specific distribution's package management system rather than using any add-on one).

As you'll see in the Omaha overview, the focus is not specifically on determining and applying "deltas" rather than full updates, but on automating the process for the user's convenience (and security, when updates address potential security issues). As for the differences, I would suggest behaving similarly to version control systems like Subversion (indeed, you can no doubt reuse much of svn's code) -- only text files are differenced, while binary files' "differences" are all-or-nothing. For most binary file formats there's just too little gain -- if any -- in trying to send less than the whole new file when it has changed at all; for images in particular, and more generally compressed files of all kinds, it's typical that a tiny change in the underlying content can produce huge changes in the resulting file.
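As a rough illustration of the all-or-nothing, file-by-file idea, the sketch below compares a manifest of file hashes for the installed copy against one for the new release and reports which files to fetch whole. It is a simplification that treats every file (text or binary) as all-or-nothing; the manifest format and the names `build_manifest` and `plan_update` are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's relative path (POSIX style) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def plan_update(local, remote):
    """Return (files to download whole, files to delete) given two manifests."""
    to_fetch = sorted(name for name, digest in remote.items()
                      if local.get(name) != digest)
    to_delete = sorted(name for name in local if name not in remote)
    return to_fetch, to_delete
```

Under this scheme an unchanged image is never downloaded again, because its hash matches; only changed or new files are transferred.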

If you think some or all of your binary files might actually benefit from the approach of using differences and incremental patches, rather than all or nothing file-by-file replacement, I would suggest you first experiment with a specialized utility such as jojodiff to verify -- and if that is indeed the case (perhaps only for some files, while others might as well be replaced entirely), you might package the patch part of it with your updater (and run it as a subprocess from Python, etc).
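One way to run that experiment from Python is sketched below: invoke an external diff tool as a subprocess and compare the size of the patch it writes with the size of the new file. The command line is deliberately left to the caller, since the exact arguments depend on the tool you try; `patch_ratio` is a hypothetical helper name.

```python
import subprocess
from pathlib import Path


def patch_ratio(diff_argv, new_file, patch_file):
    """Run an external diff tool and return patch size / new file size.

    `diff_argv` is the full command line as a list, built by the caller for
    whichever tool is being evaluated; it is expected to write `patch_file`.
    A ratio near 1.0 suggests the file might as well be replaced entirely.
    """
    subprocess.run(diff_argv, check=True)  # raises CalledProcessError on failure
    new_size = Path(new_file).stat().st_size
    if new_size == 0:
        raise ValueError("new file is empty; ratio is undefined")
    return Path(patch_file).stat().st_size / new_size
```

Running this over a few real releases, file by file, gives a quick picture of which files (if any) are worth patching.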

As for maintaining deltas on your server, a mixed approach should work: i.e., you'd try to keep all the (quadratically many) updates (from A → A+1, A → A+2, A+1 → A+2, etc) but "cut off" each branch (in favor of a total-replacement approach) when the advantage of doing things incrementally becomes too small to warrant the cost of taking up storage on your server and processing time at the client. Of course, there's nothing but heuristics (aka try/experiment and see) for determining the threshold for "too small" ;-).
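The cut-off could be expressed as a simple size comparison, as in the sketch below. The 0.5 threshold is an arbitrary placeholder to be tuned by experiment, and `choose_strategy` is a hypothetical name.

```python
def choose_strategy(patch_sizes, full_size, max_ratio=0.5):
    """Return 'patches' if the needed patches are small enough, else 'full'.

    `patch_sizes` are the byte sizes of the patches the client would need;
    `full_size` is the size of the complete download. `max_ratio` is an
    assumed heuristic threshold, not a recommended value.
    """
    total = sum(patch_sizes)
    return "patches" if total <= max_ratio * full_size else "full"
```

The same check can be run on the server at release time to decide which deltas are worth keeping at all.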

Alex Martelli
Wouldn't it be a simple matter of keeping the program in a Mercurial repository?
jsbueno
@jsbueno, sure, you _could_ reuse a VCS (hg, but also svn, git, bazaar, ...) at a higher abstraction level (rather than just copying and repurposing bits and pieces), but a full client install side by side with your app might be a large footprint compared with the app itself (depending on the app's size). Distributed VCSs don't have their usual advantages over classic ones like svn here, since you surely do want to use your server as the "master" tree and don't care to allow local branching, multiple heads, etc.
Alex Martelli