Hi SO gurus!

I am currently on a short research project. The company I work at has a very heavy release process that is getting worse as time progresses. We are encountering more and more issues with each release, which is starting to severely impact our delivery schedules and the quality of each release. We provide a large SaaS product that is deployed to the Internet on a very large web farm. Our deployment process is currently handled by a dedicated team, with minimal developer involvement. We are primarily a .NET shop, although we have a couple of Java components as well.

I am researching how we could improve our QA and deployment process to reduce waste and bring more of the process under the wing of our dev teams. I am interested in hearing about how your company deploys its products (preferably SaaS, but not limited to such products) to production, as well as their journey through testing on the way there. I am curious what has worked and what hasn't, and I'm sure many of you have stories to tell.

EDIT (Additional RFC):

As I have continued my research, I came across the concept of "Continuous Deployment", apparently pioneered by the team behind the IMVU 3D online community. It sounds like an intriguing concept, if perhaps a little complex. I am curious whether anyone here on SO has experience with continuous deployment, particularly with a large, complex project that has many parts to it. You don't necessarily have to deploy continuously to production...for our short-term needs, we would only look at continuous deployment to internal dev/qa/perftest environments. If anyone has implemented continuous deployment, I am also curious to hear how you managed database schema and data changes/rollbacks.

Thanks!

+6  A: 

We deploy a financial services SaaS solution to the Amazon AWS cloud environment. Our solution is 100% Java, so many of the tools won't apply to you, but the concepts should.

First of all, we reduce the number of surprises when it comes time to do a release by running a continuous integration process. Any time source code is checked in by any developer, the entire solution is automatically built and all unit tests automatically run. Failure notifications are emailed to the developer in question and to the team lead.
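For readers who haven't seen the moving parts, here is a minimal sketch of that commit-triggered loop, assuming a hypothetical SVN URL, build command, and mail server. In practice a CI server such as CruiseControl, TeamCity, or Hudson does all of this for you:

    # Minimal commit-triggered CI loop (sketch; a real CI server replaces this).
    import smtplib
    import subprocess
    import time
    from email.message import EmailMessage

    REPO = "https://svn.example.com/product/trunk"  # hypothetical repository
    TEAM_LEAD = "teamlead@example.com"              # hypothetical address

    def head_info():
        """Return (revision, author) of the latest commit via `svn log`."""
        out = subprocess.run(["svn", "log", "-l", "1", "--quiet", REPO],
                             check=True, capture_output=True, text=True).stdout
        # --quiet log lines look like: "r1234 | jdoe | 2009-08-01 ..."
        line = next(l for l in out.splitlines() if l.startswith("r"))
        rev, author = [field.strip() for field in line.split("|")[:2]]
        return rev, author

    def build_and_test():
        """Build the entire solution and run all unit tests; return (ok, log)."""
        result = subprocess.run(
            ["msbuild", "Build.proj", "/t:BuildAndTest"],  # hypothetical target
            capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def notify(author, rev, log):
        """Email the offending developer and the team lead on a broken build."""
        msg = EmailMessage()
        msg["Subject"] = "Build broken at " + rev
        msg["From"] = "ci@example.com"
        msg["To"] = author + "@example.com, " + TEAM_LEAD
        msg.set_content(log[-5000:])  # tail of the build log
        with smtplib.SMTP("mail.example.com") as smtp:
            smtp.send_message(msg)

    last_rev = None
    while True:
        rev, author = head_info()
        if rev != last_rev:
            ok, log = build_and_test()
            if not ok:
                notify(author, rev, log)
            last_rev = rev
        time.sleep(60)  # poll once a minute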

AWS is built around the concept of virtual machines. We leverage this by creating a virtual machine image (Amazon calls them AMIs) that contains the specific versions of the OS and applications (Java, DB, etc.) that we want. The build process creates all deployable artifacts and then copies them to a running instance based on that AMI. It's the exact same process for all environments (TEST, DEMO, PROD), except for a single configuration project that encapsulates the differences between environments (e.g. connection strings).

QA tests the result in the TEST environment. Once they approve the version, we repeat the build process targeting PROD (using the exact same version control revisions; remember, the only difference between environments is that one config project).

At that point we create a virtual machine (instance) running on the base AMI with the PROD code base and call it STAGING. STAGING goes through a series of acceptance tests, both automated and manual. Here's the really nice part: we then burn this environment (base AMI plus the new version of our application) into a new AMI (virtual machine image). Then we create new running instances of our app servers based on this new image, update the load balancer to point to these new instances, and just kill the old instances. That's the beauty of using virtual machines (well, one of the beauties).
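A hedged sketch of that bake-and-swap step, written against the modern boto3 API (which postdates this answer; the original setup would have used the tooling of the day). The load balancer name, instance type, and IDs are hypothetical:

    # Sketch of "burn STAGING into an AMI, then swing the load balancer".
    # boto3 is a modern assumption; names and IDs below are hypothetical.
    import boto3

    ec2 = boto3.client("ec2")
    elb = boto3.client("elb")  # classic load balancer

    LB_NAME = "prod-lb"  # hypothetical

    def bake_ami(staging_instance_id, version):
        """Burn the STAGING box (base AMI + new app version) into a new AMI."""
        resp = ec2.create_image(InstanceId=staging_instance_id,
                                Name="app-" + version)
        ami_id = resp["ImageId"]
        ec2.get_waiter("image_available").wait(ImageIds=[ami_id])
        return ami_id

    def swap_fleet(ami_id, count, old_instance_ids):
        """Launch app servers from the new AMI, repoint the LB, kill the old."""
        run = ec2.run_instances(ImageId=ami_id, InstanceType="m1.large",
                                MinCount=count, MaxCount=count)
        new_ids = [i["InstanceId"] for i in run["Instances"]]
        ec2.get_waiter("instance_running").wait(InstanceIds=new_ids)

        elb.register_instances_with_load_balancer(
            LoadBalancerName=LB_NAME,
            Instances=[{"InstanceId": i} for i in new_ids])
        elb.deregister_instances_from_load_balancer(
            LoadBalancerName=LB_NAME,
            Instances=[{"InstanceId": i} for i in old_instance_ids])
        ec2.terminate_instances(InstanceIds=old_instance_ids)
        return new_ids

The launch-and-register half of swap_fleet is also all that the peak-capacity adjustment described next requires: start extra instances from the production AMI, register them with the load balancer, and later deregister and terminate them.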

Whenever we need to adjust capacity (peak hours/days), we create new application server instances from the same production AMI and register them with the load balancer. When the peak is over, we just kill the extra instances and remove them from the load-balancing rotation.

Eric J.
Thank you for the detailed answer. It sounds like the key that helped you was using virtual machines. One of the products we are evaluating is VMware Lab Manager. From what I gathered about AMIs, it serves the same basic purpose...template-based virtual machines. I guess the main difference is that it would be hosted by us on our own hardware. It's good to hear of a scenario where virtualization has been used successfully (and in the cloud, to boot!)
jrista
I asked jamiedp this question too, but how large is your application? Would you classify it as small, medium, large, or huge? This is in regard to the size of the code base, the amount of configuration, and the average number of users. Thanks!
jrista
@jrista: I guess it depends how you define those sizes, but large sounds about right. The application is fairly complex, makes extensive use of enterprise technologies, and supports a fairly high transaction volume.
Eric J.
Size as kind of an amalgamation of code base size, amount of configuration, number of users/average load, etc. A rough estimate of the overall size of the project. I guess it is kind of hard to pin down, which is why I left it somewhat arbitrary (small/med/large/huge). Thanks for the information, btw. We are intrigued by your success with virtualization, as that was one of the key things we were looking into. We've decided to use VMware Lab Manager as a key component in this new initiative. Now we just have to refine the actual build and deploy process.
jrista
You may want to check out this initiative that grew out of a UCSB research project. It provides a (mostly) Amazon-compatible, free cloud environment. We want to look into it for managing dev/test environments but haven't had time yet. http://www.eucalyptus.com/
Eric J.
A: 

You don't say what is causing the issues with your releases. We were having issues because the wrong files were ending up in our build. Our answer to this was to build a tool that would give us control and visibility over all of the files in our build. Here is a webcast of the tool we built.

JBrooks
We're not exactly sure what is specifically causing problems. We know part of it has to do with how our build scripts work, and the fact that our build scripts are tightly coupled to the specific folder structure and branching structure of our source control. Another problem we think we have is TOO MUCH process, which requires a **lot** of people to be involved in each release. We know we need to lean out the process...but we don't have a baseline "good deployment process" to compare against.
jrista
The webcast at the link you provided seems to be broken. It starts playing for a few seconds, then stops.
jrista
A: 

Disclaimer: I may sell what I've written, but for now, it's free (and not officially released anyway).

We use a system that I've written.

It operates by integrating with your source control and CI server. It allows me to commit the code to SVN, wait for a build on the build server, then, through the app's interface, configure it for a specific build and commit it back into source control; the servers then update themselves. The beautiful thing is that you can update asynchronously, so all servers will get the new build at around the same time.

It's a bit more complicated than that, and requires a certain way of being set up, but it is really quite beautiful (in my very biased opinion) when it's in action. Shoot me an email if you want a free 'alpha' version to play with. Any beta of it will be free as well.

-- Edit:

General process:

  1. Commit to trunk
  2. Wait for build from CI server, build is 'naked' (not configured for any server)
  3. Deploy build, using the tool ("dashy"), to testing server
  4. Test
  5. Be happy with testing
  6. Deploy build (same build) to live servers

The "Deploy" phases consists of committing to SVN, and then the program, also installed on your web servers, gets the build from SVN.

Effectively, I add a new "base" level item into your standard SVN structure. You normally have:

/trunk
/branches
/tags

With dashy, I add /releases

/trunk
/branches
/tags
/releases
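
A rough sketch of what the server-side half might look like, given the description above: an agent on each web server polls the /releases path and pulls any new configured build. This is a reading of the workflow, not dashy's actual code; the URL and paths are hypothetical.

    # Sketch of a per-server update agent watching /releases (hypothetical paths).
    import subprocess
    import time

    RELEASE_URL = "https://svn.example.com/product/releases/current"  # assumed layout
    DEPLOY_DIR = r"C:\inetpub\app"  # working copy on this web server

    def changed_rev(target):
        """Read 'Last Changed Rev' from `svn info` for a URL or working copy."""
        out = subprocess.run(["svn", "info", target], check=True,
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if line.startswith("Last Changed Rev:"):
                return line.split(":", 1)[1].strip()

    while True:
        if changed_rev(RELEASE_URL) != changed_rev(DEPLOY_DIR):
            # Every node runs this same loop, so they all pick the new build
            # up at around the same time (the asynchronous update above).
            subprocess.run(["svn", "update", DEPLOY_DIR], check=True)
        time.sleep(30)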
Noon Silk
So, is this a pre-emptive build system that intercepts check-ins and builds before the code is committed (like TeamCity)? I am not sure I fully understand what it does...
jrista
Updated with a bit more information.
Noon Silk
Thanks. Sounds interesting. I am curious if you have ever heard of TeamCity? That is another CI product that I am evaluating. The intriguing thing about TC is that it preemptively intercepts source control check-ins, builds and verifies, and only commits the check-in to source control if the build and verification succeed. Otherwise the offender is notified in a closed loop, and you don't run into the problem of bad code hitting any branch. I am not sure what kinds of deployment scenarios it supports, but deployment can be handled by the build script anyway. http://www.jetbrains.com/teamcity/
jrista
From a cursory view, TeamCity would sit in the same place as CruiseControl does in the general dashy process (i.e. as the build server). In a few days I'll have some links to dashy in my profile, and a far more in-depth outline of how it works (plus the alpha version for testing :)
Noon Silk
One difference between TeamCity and other CI tools like CruiseControl, TFS Build, etc. is that TeamCity builds before the code ever reaches the source control system. That prevents unbuildable code from actually being checked in and breaking everyone else who syncs from that point onward until the build is fixed. That's its greatest strength and greatest selling point, IMO. If you are building a CI server, it's a feature I highly recommend looking into, as I think it's the way of the future for automated builds.
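In general terms, the gated-commit idea looks something like the sketch below: the build server applies the developer's change to a clean checkout and only commits it if the build is green. This is a conceptual outline, not TeamCity's implementation; the repository URL and build command are hypothetical.

    # Conceptual gated commit: build first, commit only on success.
    import subprocess
    import tempfile

    REPO = "https://svn.example.com/product/trunk"  # hypothetical

    def gated_commit(patch_file, message):
        """Apply a developer's patch to a clean checkout; commit only if green."""
        with tempfile.TemporaryDirectory() as wc:
            subprocess.run(["svn", "checkout", REPO, wc], check=True)
            subprocess.run(["svn", "patch", patch_file, wc], check=True)
            build = subprocess.run(["msbuild", "Solution.sln"], cwd=wc)
            if build.returncode != 0:
                return False  # notify the offender; trunk stays clean
            subprocess.run(["svn", "commit", "-m", message, wc], check=True)
            return True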
jrista
jrista: Not a bad feature, to be sure, but as an aside, I'm not building a CI server; dashy will happily sit on top of TeamCity or CruiseControl or any other 'build' server. It works *with* them.
Noon Silk
Ah, I understand now. You're basically putting together a continuous deployment service that plugs into other CI products. Sounds interesting. Have you ever heard of "Continuous Deployment" as it is done by the IMVU online community team? Does your product support something similar?
jrista
So it's public now: http://www.mirios.com.au/dashy/ (apologies for the delay :)
Noon Silk
+1  A: 

We have a SaaS solution and basically use the same process (continuous integration server, deployment scripts for test-staging-production) as Eric above, but we deploy to our own infrastructure, using custom scripts based on PsTools (http://technet.microsoft.com/en-us/sysinternals/bb896649.aspx) to do all the copying to the farm nodes.

For each deployment we evaluate whether it's possible to allow different nodes to have different versions of the app (i.e. no data integrity risks), or whether we have to bring the app offline for a few seconds to sync all the nodes (it usually takes about 20 seconds for the app to be back online, since it is just copying the apps from one master node); but the whole key is to have a "one-key" deployment process set up.
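A hedged sketch of what such a "one-key" fan-out can look like with PsExec from the PsTools suite; node names, shares, and the per-node script are hypothetical (the real scripts here are NAnt-driven, as the comments below describe):

    # Sketch of a "one-key" farm deployment fanned out via PsExec.
    # Node names, shares, and the per-node script are hypothetical.
    import subprocess

    NODES = ["web01", "web02", "web03"]       # farm nodes
    PACKAGE = r"\\fileserver\builds\app.zip"  # packed build on a shared drive

    def deploy(node, offline=False):
        """Run the per-node deployment script remotely via psexec."""
        remote_cmd = r"C:\deploy\install.cmd " + PACKAGE
        if offline:
            remote_cmd += " /offline"  # bring the app down while syncing
        subprocess.run(["psexec", r"\\" + node, "cmd", "/c", remote_cmd],
                       check=True)

    # If mixed versions are safe for this release, roll node by node with no
    # downtime; otherwise take the app offline briefly and sync every node.
    for node in NODES:
        deploy(node)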

Jaime
Out of curiosity, how large is your application? Would you classify it as small, medium, large, or huge? Thanks. :)
jrista
The question in my last comment is in regard to the size of the code base, the amount of configuration, and the average number of users. Thanks.
jrista
The whole suite is about 0.4M lines of code and is divided into several components (front end, web services, DAL, etc.) that we could deploy independently, although we usually don't. As for configuration, all the config changes are managed by our deployment scripts, and are basically changes to connection strings and service references that are prepared on the main node and tweaked on each node. As for number of users, we have about 10,000 registered users, and during the day we have about 300 active sessions at any one time; we always try to deploy at the low-peak time, when we have 10~15 sessions.
Jaime
Thanks for the info. I guess I would classify you guys as medium, based on the numbers you gave me. (We have a code base of tens of millions of lines of code, spanning the very legacy, the legacy, and the new, with thousands of concurrent users...we fall into the large/huge arena.) I am kind of curious exactly how you use PsTools to deploy. Could you clarify that a bit?
jrista
Sure: the CC server generates the build from the appropriate prod branch and executes a NAnt script that packs the apps and copies the zip onto a shared network drive that is visible from the prod nodes. It then executes a deployment NAnt script on the main production node using psexec. After that, the main prod node copies and unpacks the app locally to a network share that is local to the prod network, modifies config files, and executes a NAnt script on each node that copies the app locally, adjusts configs, and deploys to IIS (with or without bringing the app offline). Hope this helps.
Jaime