When deploying a large Java webapp (>100 MB .war) I currently use the following deployment process:

  • The application .war file is expanded locally on the development machine.
  • The expanded application is rsync:ed from the development machine to the live environment.
  • The app server in the live environment is restarted after the rsync. This step is not strictly needed, but I've found that restarting the application server on deployment avoids "java.lang.OutOfMemoryError: PermGen space" due to frequent class loading.

Good things about this approach:

  • The rsync minimizes the amount of data sent from the development machine to the live environment. Uploading the entire .war file takes over ten minutes, whereas an rsync takes a couple of seconds.

Bad things about this approach:

  • While the rsync is running the application context is restarted since the files are updated. Ideally the restart should happen after the rsync is complete, not when it is still running.
  • The app server restart causes roughly two minutes of downtime.

I'd like to find a deployment process with the following properties:

  • Minimal downtime during deployment process.
  • Minimal time spent uploading the data.
  • If the deployment process is app server specific, then the app server must be open-source.

Question:

  • Given the stated requirements, what is the optimal deployment process?
+6  A: 

Rsync tends to be ineffective on compressed files because its delta-transfer algorithm looks for changes within files, and a small change in an uncompressed file can drastically alter the resulting compressed version. For this reason, it might make good sense to rsync an uncompressed war file rather than a compressed version, if network bandwidth proves to be a bottleneck.

What's wrong with using the Tomcat manager application to do your deployments? If you don't want to upload the entire war file directly to the Tomcat manager app from a remote location, you could rsync it (uncompressed for reasons mentioned above) to a placeholder location on the production box, repackage it to a war, and then hand it to the manager locally. There exists a nice ant task that ships with Tomcat allowing you to script deployments using the Tomcat manager app.
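To make the last step concrete, here is a minimal Python sketch of handing a WAR that already sits on the production box to the manager app. The host, port, context path, WAR location and credentials are placeholders. The /manager/text/deploy path is, as far as I know, what Tomcat 7 and later use (older versions exposed /manager/deploy), so check it against your Tomcat version.

```python
import base64
import urllib.parse
import urllib.request


def manager_deploy_url(base_url: str, context_path: str, war_path: str) -> str:
    """Build a manager URL that deploys a WAR from a path local to the server."""
    query = urllib.parse.urlencode(
        {"path": context_path, "war": f"file:{war_path}", "update": "true"},
        safe="/:",  # keep slashes and the "file:" colon readable
    )
    return f"{base_url}/manager/text/deploy?{query}"


def deploy(base_url: str, context_path: str, war_path: str,
           user: str, password: str) -> str:
    """Send the deploy request with HTTP Basic auth and return the manager's reply."""
    request = urllib.request.Request(
        manager_deploy_url(base_url, context_path, war_path))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return response.read().decode()
```

A call would look like `deploy("http://localhost:8080", "/myapp", "/srv/staging/myapp.war", "deployer", "secret")`, where every argument is hypothetical. The account needs whatever role your Tomcat version requires for scripted manager access.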

There is an additional flaw in your approach that you haven't mentioned: While your application is partially deployed (during an rsync operation), your application could be in an inconsistent state where changed interfaces may be out of sync, new/updated dependencies may be unavailable, etc. Also, depending on how long your rsync job takes, your application may actually restart multiple times. Are you aware that you can and should turn off the listening-for-changed-files-and-restarting behavior in Tomcat? It is actually not recommended for production systems. You can always do a manual or ant scripted restart of your application using the Tomcat manager app.

Your application will be unavailable to users during a restart, of course. But if you're so concerned about availability, you surely have redundant web servers behind a load balancer. When deploying an updated war file, you could temporarily have the load balancer send all requests to other web servers until the deployment is over. Rinse and repeat for your other web servers.
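The "rinse and repeat" loop could look roughly like this in Python. The four callables are hypothetical hooks that you would implement for your own load balancer and app server; nothing here talks to a real product.

```python
def rolling_deploy(servers, disable, deploy, is_healthy, enable):
    """Update servers one at a time so the others keep serving traffic."""
    updated = []
    for server in servers:
        disable(server)   # stop routing new requests to this server
        deploy(server)    # push the new war and restart the app server
        if not is_healthy(server):
            # Stop here: the remaining servers still run the old version.
            raise RuntimeError(
                f"{server} failed its health check and was left out of rotation")
        enable(server)    # put the server back behind the load balancer
        updated.append(server)
    return updated
```

Aborting on the first failed health check is a design choice of this sketch: it keeps a bad build away from the servers that have not been touched yet.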

Asaph
It is my understanding that rsync:ing a zip representation of two similar directories won't give me the same speed benefits as rsync:ing the two directories. Please correct me if I'm mistaken.
knorv
The thing is: a tiny local change in an uncompressed file can lead to very large differences in the compressed file, i.e. rsync will have to transfer more data - if the network bandwidth is the bottleneck, and there are usually small differences in many files, this could lead to an overall slower result.
Michael Borgwardt
@knorv: You might actually be right about that. Although rsync uses a delta-transfer algorithm (http://samba.anu.edu.au/ftp/rsync/rsync.html), compression tends to alter the entire structure of the file which makes rsync's delta-transfer algorithm somewhat ineffective (http://zsync.moria.org.uk/paper200501/ch01s03.html). If you do choose to uncompress files before rsyncing, at least use the -z option which tells rsync to compress data before transferring.
Asaph
@Michael Borgwardt: I just researched it further and came to that conclusion too. See my comment to @knorv.
Asaph
Asaph: Sorry, but I still think you're wrong. As I've understood it the delta-transfer algorithm is known to have problems with compressed data. See for example this discussion on this exact topic http://lists.samba.org/archive/rsync/2003-August/007010.html
knorv
My last comment was a response to "It seems to me that rsyncing a zip representation of 2 directories would be more efficient than the directories uncompressed". Had not seen the other comments.
knorv
@knorv: Are you using the -z option with rsync? I think that will help you.
Asaph
@knorv, @Michael Borgwardt: I've updated my answer to reflect the new things I just learned about rsync and compressed files :)
Asaph
Asaph: Please elaborate. I fail to see how -z would help me if I'm following your advice on rsync:ing the entire .war file.
knorv
@knorv: I changed my answer. You and Michael Borgwardt convinced me that rsyncing the uncompressed war might be a good idea (if network bandwidth is a bottleneck in the deployment). So given that, you should use rsync -z on the uncompressed files so that rsync will send fewer bytes across the network.
Asaph
Asaph: I'm relying on the built-in compression in ssh (rsync over ssh), so the -z option is not needed.
knorv
@knorv: I think that the way to minimize WAR file (re-)deployment times would be to use `jar -0 ...` to generate the WAR file. This tells it to use no compression in the ZIP file, and that will allow rsync to produce smaller deltas.
Stephen C
Stephen C: That's smart! Didn't think about that option. Consider posting that in a separate answer.
knorv
@knorv - done ... and I've another idea as well
Stephen C
+1 for solving the downtime by using the network. Yes, it means getting the new version to production will take longer, but it is the only real way to go if minimizing downtime is important. You can even start up the new version as a separate tomcat process on a different port on the same host - then flip the network traffic to go to that port instead, and shut down the old version once its connections are gone. Of course, that doesn't help you in case the process crashes or the box dies.
Zac Thompson
+5  A: 

Can't you make a local copy of the current web application on the web server, rsync to that directory and then perhaps even using symbolic links, in one "go", point Tomcat to a new deployment without much downtime?
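A minimal sketch of the "one go" switch in Python, assuming a POSIX filesystem and a layout such as a link named myapp pointing at a versioned directory (all names are hypothetical). The new link is created under a temporary name and renamed over the old one, so the link never disappears in between.

```python
import os


def switch_symlink(link_path: str, new_target: str) -> None:
    """Repoint link_path at new_target with a single rename."""
    link_path = os.path.abspath(link_path)  # does not resolve the link itself
    tmp_link = f"{link_path}.tmp-{os.getpid()}"
    os.symlink(new_target, tmp_link)  # build the new link beside the old one
    os.replace(tmp_link, link_path)   # rename replaces the old link in one step
```

Whether the container picks up the new target without a restart is a separate question that this sketch does not answer. If the container scans the directory for new applications, the short-lived temporary name may be noticed, which is one more reason to turn automatic deployment off.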

jarnbjo
+6  A: 

It has been noted that rsync does not work well when pushing changes to a WAR file. The reason for this is that WAR files are essentially ZIP files, and by default are created with compressed member files. Small changes to the member files (before compression) result in large scale differences in the ZIP file, rendering rsync's delta-transfer algorithm ineffective.

One possible solution is to use jar -0 ... to create the original WAR file. The -0 option tells the jar command to not compress the member files when creating the WAR file. Then, when rsync compares the old and new versions of the WAR file, the delta-transfer algorithm should be able to create small diffs. Then arrange that rsync sends the diffs (or original files) in compressed form; e.g. use rsync -z ... or a compressed data stream / transport underneath.

EDIT: Depending on how the WAR file is structured, it may also be necessary to use jar -0 ... to create component JAR files. This would apply to JAR files that are frequently subject to change (or that are simply rebuilt), rather than to stable 3rd party JAR files.

In theory, this procedure should give a significant improvement over sending regular WAR files. In practice I have not tried this, so I cannot promise that it will work.

The downside is that the deployed WAR file will be significantly bigger. This may result in longer webapp startup times, though I suspect that the effect would be marginal.
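For illustration only, the effect of jar -0 can be reproduced with Python's zipfile module by writing every member with ZIP_STORED. This swaps in zipfile for the jar tool, and the function name and paths are made up. Because members are stored byte for byte, an unchanged file looks the same inside the old and the new archive, which gives the delta-transfer algorithm something to match.

```python
import os
import zipfile


def build_stored_war(exploded_dir: str, war_path: str) -> None:
    """Pack exploded_dir into war_path without compressing any member."""
    with zipfile.ZipFile(war_path, "w", compression=zipfile.ZIP_STORED) as war:
        for root, dirs, files in os.walk(exploded_dir):
            dirs.sort()  # stable member order keeps successive builds comparable
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store each file under its path relative to the webapp root.
                war.write(full_path, os.path.relpath(full_path, exploded_dir))
```

Like the jar -0 idea itself, this is untested against a real deployment; it only shows what "no compression" means at the archive level.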


A different approach entirely would be to look at your WAR file to see if you can identify library JARs that are likely to (almost) never change. Take these JARs out of the WAR file, and deploy them separately into the Tomcat server's common/lib directory; e.g. using rsync.

Stephen C
One *HUGE* problem with moving libraries into a shared directory is if they hold references to objects within the web-app. If that's the case, then they *will* prevent the JVM from reclaiming the space used by the web-app, leading to permgen exhaustion.
kdgregory
But if the shared library does not have statics that hold references to webapp objects, the second approach is OK, right?
Stephen C
Of course. But how do you know? For example, the JDK's Introspector class caches class definitions, which means that if you use it from a web-app, you have to explicitly flush the cache on redeploy. But what if your shared marshalling library uses Introspector under the covers?
kdgregory
"But how do you know?". By manually or automatically inspecting the code. (It would be feasible to write a utility that checked the classes in a JAR file for potentially troublesome statics.)
Stephen C
+1  A: 

I'm not sure if this answers your question, but I'll share the deployment process I have used or encountered in the few projects I've done.

Similar to you, I don't recall ever doing a full war redeployment or update. Most of the time, my updates are restricted to a few jsp files, maybe a library, and some class files. I can determine which artifacts are affected, and we usually package those updates in a zip file along with an update script. I then run the update script, which does the following:

  • Backup the files that will be overwritten, maybe to a folder with today's date and time.
  • Unpackage my files
  • Stop the application server
  • Move the files over
  • Start the application server
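The steps above can be sketched in Python as follows. The function name, the directory layout and the stop_server/start_server callables are all hypothetical, and the update zip is assumed to come from a trusted build, since member paths are not sanitised here.

```python
import os
import shutil
import tempfile
import time
import zipfile


def apply_update(update_zip, webapp_dir, backup_root, stop_server, start_server):
    """Apply a zip of changed files to webapp_dir and return the backup folder."""
    backup_dir = os.path.join(backup_root, time.strftime("%Y%m%d-%H%M%S"))
    staging_dir = tempfile.mkdtemp(prefix="update-")
    try:
        with zipfile.ZipFile(update_zip) as archive:
            members = [n for n in archive.namelist() if not n.endswith("/")]
            # Back up every live file that is about to be overwritten.
            for member in members:
                live_file = os.path.join(webapp_dir, member)
                if os.path.isfile(live_file):
                    saved = os.path.join(backup_dir, member)
                    os.makedirs(os.path.dirname(saved), exist_ok=True)
                    shutil.copy2(live_file, saved)
            # Unpack beside the live application, not into it.
            archive.extractall(staging_dir)
        stop_server()
        try:
            for member in members:
                target = os.path.join(webapp_dir, member)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(os.path.join(staging_dir, member), target)
        finally:
            start_server()  # bring the server back even if a copy failed
    finally:
        shutil.rmtree(staging_dir, ignore_errors=True)
    return backup_dir
```

Unpacking into a staging folder first keeps the window in which the server is stopped down to the file copies themselves.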

If downtime is a concern, and it usually is, my projects are usually HA, even if they don't share state and instead use a router that provides sticky session routing.

Another thing I'm curious about: why the need to rsync? You should be able to know what the required changes are by determining them in your staging/development environment, rather than performing delta checks against live. In most cases you would have to tune your rsync to ignore files anyway, such as property files that define the resources a production server uses (database connection, SMTP server, etc.).

I hope this is helpful.

Kent Lai
+4  A: 

My advice is to use rsync with exploded versions but deploy a war file.

  1. Create a temporary folder in the live environment to hold the exploded version of the webapp.
  2. Rsync the exploded version into it.
  3. After a successful rsync, create a war file in the temporary folder on the live machine.
  4. Replace the old war in the server's deploy directory with the new one from the temporary folder.

Replacing the old war with the new one is recommended in the JBoss container (which is based on Tomcat) because it's an atomic and fast operation, and it ensures that the entire application is in place by the time the deployer starts.
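Steps 3 and 4 might look like this in Python. It is a sketch under two assumptions: the staging folder and the deploy directory are on the same filesystem (otherwise the final rename is neither atomic nor possible), and all names are placeholders.

```python
import os
import shutil


def publish_war(exploded_dir: str, staging_dir: str,
                deploy_dir: str, war_name: str) -> str:
    """Zip exploded_dir in staging_dir, then rename the result into deploy_dir."""
    base_name = os.path.join(os.path.abspath(staging_dir), war_name)
    # make_archive appends ".zip" to base_name and returns the archive's path.
    archive = shutil.make_archive(
        base_name, "zip", root_dir=os.path.abspath(exploded_dir))
    final_path = os.path.join(os.path.abspath(deploy_dir), war_name)
    os.replace(archive, final_path)  # the deployer never sees a half-written war
    return final_path
```

The point of building the archive outside the deploy directory is that the deployer only ever sees a complete file under the final name.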

cetnar
This should avoid what would be my biggest concern with the OP's practice, which is a non-atomic update.
kdgregory
Yeah, exploded versions and hot deployment is good for development mode, but in production it's better to use wars.
cetnar
+1 for the atomic deployment
Pascal Thivent
+2  A: 

"Hot Deploy a Java EAR to Minimize or Eliminate Downtime of an Application on a Server" or "How to “hot” deploy war dependency in Jboss using Jboss Tools Eclipse plugin" might have some options for you.

Deploying to a cluster with no downtime is interesting too.

JavaRebel has hot-code deployment too.

elhoim
JavaRebel is now called JRebel
Thierry-Dimitri Roy
+1  A: 

Your approach of rsyncing the extracted war is pretty good, and so is the restart, since I believe that a production server should not have hot deployment enabled. So the only downside is the downtime when you need to restart the server, right?

I assume all state of your application is held in the database, so you have no problem with some users working on one app server instance while other users are on another. If so:

Run two app servers: start up the second app server (which listens on other TCP ports) and deploy your application there. After deployment, update the Apache httpd configuration (mod_jk or mod_proxy) to point to the second app server, then gracefully restart the Apache httpd process. This way you will have no downtime, and new users and requests are automatically redirected to the new app server.

If you can make use of the app server's clustering and session replication support, it will be even smoother for users who are currently logged in, as the second app server will resync as soon as it starts. Then, when there are no more accesses to the first server, shut it down.
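Before pointing Apache at the second app server, it helps to wait until that instance actually answers. A small Python sketch of such a check follows; the URL, the number of attempts and the delay are assumptions, and the probe is passed in so that a different health check can be swapped in.

```python
import http.client
import time
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP error statuses all end up here.
        return False


def wait_until_ready(probe, attempts: int = 60, delay: float = 2.0,
                     sleep=time.sleep) -> bool:
    """Call probe() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the app server time to finish starting
    return False
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:8081/myapp/"))` with a hypothetical port and context path. Only when it returns True would you update the httpd configuration and do the graceful restart.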

mhaller
+1  A: 

If static files are a big part of your big WAR (100 MB is pretty big), then putting them outside the WAR and deploying them on a web server (e.g. Apache) in front of your application server might speed things up. On top of that, Apache usually does a better job at serving static files than a servlet engine does (even if most of them have made significant progress in that area).

So, instead of producing a big fat WAR, put it on diet and produce:

  • a big fat ZIP with static files for Apache
  • a less fat WAR for the servlet engine.

Optionally, go further in the process of making the WAR thinner: if possible, deploy Grails and other JARs that don't change frequently (which is likely the case of most of them) at the application server level.

If you succeed in producing a lighter WAR, I wouldn't bother rsyncing directories rather than archives.

Strengths of this approach:

  1. The static files can be hot "deployed" on Apache (e.g. use a symbolic link pointing to the current directory, unzip the new files, update the symlink and voilà).
  2. The WAR will be thinner and it will take less time to deploy it.

Weaknesses of this approach:

  1. There is one more server (the web server), so this adds (a bit) more complexity.
  2. You'll need to change the build scripts (not a big deal IMO).
  3. You'll need to change the rsync logic.
Pascal Thivent
+2  A: 

In any environment where downtime is a consideration, you are surely running some sort of cluster of servers to increase reliability via redundancy. I'd take a host out of the cluster, update it, and then throw it back into the cluster. If you have an update that cannot run in a mixed environment (incompatible schema change required on the db, for example), you are going to have to take the whole site down, at least for a moment. The trick is to bring up replacement processes before dropping the originals.

Using tomcat as an example - you can use CATALINA_BASE to define a directory where all of tomcat's working directories will be found, separate from the executable code. Every time I deploy software, I deploy to a new base directory so that I can have new code resident on disk next to old code. I can then start up another instance of tomcat which points to the new base directory, get everything started up and running, then swap the old process (port number) with the new one in the load balancer.

If I am concerned about preserving session data across the switch, I can set up my system such that every host has a partner to which it replicates session data. I can drop one of those hosts, update it, bring it back up so that it picks the session data back up, and then switch the two hosts. If I've got multiple pairs in the cluster, I can drop half of all pairs, then do a mass switch, or I can do them a pair at a time, depending upon the requirements of the release, requirements of the enterprise, etc. Personally, however, I prefer to just allow end-users to suffer the very occasional loss of an active session rather than deal with trying to upgrade with sessions intact.

It's all a tradeoff between IT infrastructure, release process complexity, and developer effort. If your cluster is big enough and your desire strong enough, it is easy enough to design a system that can be swapped out with no downtime at all for most updates. Large schema changes often force actual downtime, since updated software usually cannot accommodate the old schema, and you probably cannot get away with copying the data to a new db instance, doing the schema update, and then switching the servers to the new db, since you will have missed any data written to the old after the new db was cloned from it. Of course, if you have resources, you can task developers with modifying the new app to use new table names for all tables that are updated, and you can put triggers in place on the live db which will correctly update the new tables with data as it is written to the old tables by the prior version (or maybe use views to emulate one schema from the other). Bring up your new app servers and swap them into the cluster. There are a ton of games you can play in order to minimize downtime if you have the development resources to build them.

Perhaps the most useful mechanism for reducing downtime during software upgrades is to make sure that your app can function in a read-only mode. That will deliver some necessary functionality to your users but leave you with the ability to make system-wide changes that require database modifications and such. Place your app into read-only mode, then clone the data, update schema, bring up new app servers against new db, then switch the load balancer to use the new app servers. Your only downtime is the time required to switch into read-only mode and the time required to modify the config of your load balancer (most of which can handle it without any downtime whatsoever).

+1  A: 

What is your PermGen space set to? I would expect it to grow as well, but shouldn't it go down after the old classes are collected (or does the ClassLoader still sit around)?

Thinking out loud, you could rsync to a separate version- or date-named directory. If the container supports symbolic links, could you SIGSTOP the root process, switch over the context's filesystem root via symbolic link, and then SIGCONT?

Xepoch
+1  A: 

As for the early context restarts: all containers have configuration options to disable auto-redeploy on class file or static resource changes. You probably can't disable auto-redeploy on web.xml changes, so that file should be the last one to update. So if you disable auto-redeploy and update web.xml last, you'll see the context restart only after the whole update.
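A sketch of the "web.xml last" ordering in Python, for the case where the update is a plain file copy on the server. The function name and paths are made up.

```python
import os
import shutil


def copy_with_web_xml_last(source_dir: str, target_dir: str) -> list:
    """Copy a webapp tree so that WEB-INF/web.xml is the final file written."""
    relative_paths = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            relative_paths.append(
                os.path.relpath(os.path.join(root, name), source_dir))
    web_xml = os.path.join("WEB-INF", "web.xml")
    # False sorts before True, so web.xml moves to the end of the list.
    relative_paths.sort(key=lambda path: (path == web_xml, path))
    for relative in relative_paths:
        destination = os.path.join(target_dir, relative)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.copy2(os.path.join(source_dir, relative), destination)
    return relative_paths
```

With rsync the same effect could probably be had with two passes, one that excludes WEB-INF/web.xml and a second one that transfers only that file.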

toomasr
+3  A: 

This is dependent on your application architecture.

One of my applications sits behind a load-balancing proxy, where I perform a staggered deployment - effectively eradicating downtime.

Matt
+1. This is the solution we use. With a little bit of intelligence, you can ensure that the cluster of servers running a mix of version N and version N-1 will function correctly. Then just take one of your servers offline, upgrade it, and bring it back online. Run for a while to ensure there's no problem then do the same for each of half the other servers. Run like that for a couple of days so you have a backout position, then convert the rest.
paxdiablo
A: 

Are you sure you need a war of that size?

Maybe your war contains some static content that could (should) be put elsewhere and deployed separately.

naaka
+1  A: 

We upload the new version of the webapp to a separate directory, then either move to swap it out with the running one, or use symlinks. For example, we have a symlink in the tomcat webapps directory named "myapp", which points to the current webapp named "myapp-1.23". We upload the new webapp to "myapp-1.24". When all is ready, stop the server, remove the symlink and make a new one pointing to the new version, then start the server again.

We disable auto-reload on production servers for performance, but even so, having files within the webapp changing in a non-atomic manner can cause issues, as static files or even JSP pages could change in ways that cause broken links or worse.

In practice, the webapps are actually located on a shared storage device, so clustered, load-balanced, and failover servers all have the same code available.

The main drawback for your situation is that the upload will take longer, since your method allows rsync to only transfer modified or added files. You could copy the old webapp folder to the new one first, and rsync to that, if it makes a significant difference, and if it's really an issue.
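The copy-first variant could be sketched like this in Python. seed_release would run on the server and rsync_command only builds the command line to run on the development machine; the host and directory names are hypothetical.

```python
import shutil


def seed_release(previous_dir: str, new_dir: str) -> None:
    """Start the new release directory as a copy of the one currently live."""
    shutil.copytree(previous_dir, new_dir, symlinks=True)


def rsync_command(local_dir: str, remote_host: str, remote_dir: str) -> list:
    """Build an rsync call that updates the seeded copy in place."""
    return [
        "rsync",
        "-az",       # archive mode, compress data in transit
        "--delete",  # remove files that no longer exist in the new build
        local_dir.rstrip("/") + "/",  # trailing slash: sync the contents
        f"{remote_host}:{remote_dir.rstrip('/')}/",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Since the seeded directory already holds the old files, rsync should only need to send what changed, while the live directory stays untouched until the symlink is switched.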

Kief
A: 

Not a "best practice" but something I just thought of.

How about deploying the webapp through a DVCS such as git?

This way you can let git figure out which files to transfer to the server. You also have a nice way to back out of it if it turns out to be busted, just do a revert!

John Nilsson