What kind of safeguards do you use to avoid accidentally making unintended changes to your production environment?

+7 Q:

Because we don't have a good staging environment we often have to debug issues on our production systems. We have web, application, and database servers.

What kind of safeguards do you use to avoid accidentally making unintended changes to your production environment when doing this?

EDIT:

The application is a very complex B2B vertical web application. There is a lot of data involved. Some tables have close to 100 million records.

EDIT:

The staging environment we have in place does not have the capacity to mirror production. There are also hundreds of gigabytes of data files involved besides the actual database data.

EDIT:

We do use source control for the code but not for the stored procedures. There are some old stored procedures in source control but nobody keeps that updated anymore.

The main concerns are the database and data on the file system.

BTW, I am a consultant at this company, not an actual employee.

+5 A:

The most direct answer is: "Don't do that."

Geoffrey Chetwood 2009-06-05 15:18:52

..Scotty Don't!

TheTXI 2009-06-05 15:19:55

Yes, I agree, but the company won't invest the resources for a good staging environment.

Dana Holt 2009-06-05 15:20:19

@TwistedAphid: Then it is your job to make sure they do. They will spend the money one way or the other. Might as well be before the disaster.

Geoffrey Chetwood 2009-06-05 15:22:21

@TwistedAphid - somehow you need to make damn sure they understand the risks, preferably in writing.

Otávio Décio 2009-06-05 15:22:37

@Rich B, ocdecio - We have been trying to tell them the risk, and recently we even had some data loss because of this problem. They paid like $20K to a data recovery company to recover a SQL server database file. We just can't seem to make them see the value of prevention.

Dana Holt 2009-06-05 15:26:39

You work for a very stupid company. I suggest finding another one that won't blow their foot off at every opportunity.

TheTXI 2009-06-05 15:27:55

@TwistedAphid: Time to seek new employment.

Geoffrey Chetwood 2009-06-05 15:28:26

If you are a consultant, it might be time to shut your mouth and collect your checks every time it burns down.

Geoffrey Chetwood 2009-06-05 15:45:48

@Rich B - Maybe you are right. I just hate to see this kind of thing happen. It has really affected the morale of the developers here. In the end they bear a lot of the pain of having to scramble to repair damage when it happens. At least I get paid by the hour. :)

Dana Holt 2009-06-05 16:11:25

$20,000 is a decent amount (2-3TB) of Dell SAN storage. That's a pretty hefty price to pay for a one-off kind of failure recovery.

Chris Kaminski 2009-06-05 18:14:09

@Darth: I am sorry, I am not following. What is your point?

Geoffrey Chetwood 2009-06-05 19:18:56

+5 A:

Source control. Nothing like a rollback when things go irreparably wrong. Also, a diff can help you replicate the changes to other production systems.

Jimmy 2009-06-05 15:19:11

If they don't have that already then TwistedAphid needs to find a new job ASAP.

David Thornley 2009-06-05 15:34:03

well, it was more of a reminder that you can keep your production files versioned, than a "use source control" generally

Jimmy 2009-06-05 15:57:50

+1 for keeping production data files in source control. An often overlooked best practice.

DarkSquid 2009-06-05 16:04:50

Doesn't help you if your changes were wrong and borked umpteen million records. And I'm *NOT* source controlling a 200GB SQL export. :-)

Chris Kaminski 2009-06-05 18:23:05

+2 A:

Only allow certain accounts write access, so you have to log in differently to make a change.

On the web server, have two directory structures that mirror each other: a production one where only one ID can write, and a staging one where everyone can write.

On the database server, have one production DB where only one ID can write, and a staging DB where everyone can write. The staging DB can have the nightly backup restored to it.

HOWEVER, if the staging system shares hardware with production, a bad query or some other resource hog in staging will pull resources from production, and the machine could hang.

KM 2009-06-05 15:19:28

+2 A:

Read-Only/Guest accounts. Seriously. It's the same reason you don't always login as root or Administrator.
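An illustrative sketch of the read-only idea, not part of the original answer. It uses Python's built-in `sqlite3` module as a stand-in for a production database server (an assumption; the thread is about SQL Server, where this would be done with a login that has only read permissions). The table name `orders` and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an SQLite file so that any write attempt raises an error."""
    # mode=ro in the URI asks SQLite for a read-only connection.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo_read_only() -> str:
    # Build a throwaway database that stands in for production.
    path = os.path.join(tempfile.mkdtemp(), "prod.db")
    rw = sqlite3.connect(path)
    rw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    rw.execute("INSERT INTO orders (status) VALUES ('open')")
    rw.commit()
    rw.close()

    ro = open_read_only(path)
    try:
        # Reads work as usual.
        rows = ro.execute("SELECT status FROM orders").fetchall()
        try:
            # Writes are rejected by the read-only connection.
            ro.execute("UPDATE orders SET status = 'closed'")
        except sqlite3.OperationalError as exc:
            return f"read {len(rows)} row(s); write blocked: {exc}"
        return "write unexpectedly succeeded"
    finally:
        ro.close()
```

As the comments below point out, this only guards against accidental writes; an expensive read-only query can still hurt production.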

Alex Beardsley 2009-06-05 15:20:22

+1: can't screw up production if you can't change anything...

RSolberg 2009-06-05 15:25:32

I bet I could run some SQL with the read-only account to bring your production environment to its knees. ;)

tom 2009-06-05 15:36:51

This will not safeguard anything.

Geoffrey Chetwood 2009-06-05 15:55:36

I allways log in as root becuase I nevar maek misteaks.

Pesto 2009-06-05 15:59:02

@Rich B Care to elaborate?

Alex Beardsley 2009-06-05 18:14:39

@Nalandial: I'll do it for him: if they've got the permissions necessary to debug a production environment, then they have the permissions necessary to inadvertently destroy that production environment.

Pesto 2009-06-05 20:30:56

+2 A:

New production releases go through our systems guys. The programmers and developers can only request that their new system go live, and approval is needed as well. We show that each change has been tested by including a snapshot of everything that was tested for this release in the production request.

We keep the previous production releases for fallback in case of issues.

If things do break (which they shouldn't do often with a proper testing procedure and managed releases) then we can either roll back, or hotfix. Often when things are broken in live and the fix is small, we can hotfix, then move the fix to test to do a proper test.

Regardless, sometimes things get by...

JeeBee 2009-06-05 15:21:56

If you really have no other choice, and this is likely to be a chronic situation, consider adding some way for the application data (files or database) to flag a set of data as 'please god do not actually change production state with this data'. Combined with data dumps at critical points in a process whenever this flag is set, this may let you exercise most of the production logic without the data actually being acted upon.
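A minimal sketch of what such a flag could look like, not part of the original comment. Every name here (`Order`, `dry_run`, `OrderProcessor`) is hypothetical, and the 10% surcharge is an invented placeholder for the real business logic.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    amount_cents: int
    dry_run: bool = False  # hypothetical "do not change production state" flag


@dataclass
class OrderProcessor:
    shipped: list = field(default_factory=list)  # stands in for real side effects
    dumps: list = field(default_factory=list)    # data dumps taken for flagged orders

    def process(self, order: Order) -> int:
        # The same business logic runs for flagged and unflagged orders.
        total = order.amount_cents + order.amount_cents // 10
        if order.dry_run:
            # Dump the state at the critical point instead of acting on it.
            self.dumps.append(
                json.dumps({"order_id": order.order_id, "total": total})
            )
            return total
        # Only unflagged orders reach the side effect.
        self.shipped.append(order.order_id)
        return total
```

The point of the design is that the flag is checked as late as possible, so a flagged test order still exercises the calculation path and only the final state change is skipped.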

jerryjvl 2009-06-05 15:23:29

+1 A:

This is a tough thing, and it goes with the territory of "no staging environment."

For many reasons, it's best to have a dedicated duplicate of PROD you can use to stage deploys to, and to debug on, but I know that sometimes when you're starting out that doesn't work out as quickly or thoroughly as we'd want.

One thing I've seen work is the use of VMs: aside from the debug environment, you can create a mini-PROD in a VM and use that to debug. This may not be practical given the type of app you're developing, so additional detail in that area would be helpful.

As for avoiding changes to PROD during debugging: is there a reason you'd need to change anything to facilitate debugging? If so, that might be worth looking into solving another way.

DarkSquid 2009-06-05 15:25:09

Sometimes, to recreate issues, we have to put in test orders (very complex system) and then muck around with the database to debug.

Dana Holt 2009-06-05 15:29:07

That's a perfect time to use a VM (or two: one for your app and one for the DB). If need be you can capture the session/input from PROD and replay against your VM.

DarkSquid 2009-06-05 15:47:16

+2 A:

For Web and Application Servers, I would try to copy the environment to a new location (but on the same environment) and have the affected people reproduce behavior on the copy. This will at least give you a level of separation from accidentally screwing with 100% of your clients.

For Database Servers, I would configure user accounts on the production system to give them read only rights.

Joseph 2009-06-05 15:26:42

+1 A:

Version control is immensely helpful for controlling changes to production environments - just make your production environment a working copy of the appropriate directory or directories from the repository. When you roll out an update, your source control system makes sure that ALL the changed files get copied. When an update breaks something, you can roll the production working copy back to the last revision which wasn't broken. Also, you can check your production WC out from a tag instead of from the trunk; that way you can decide which repository revisions to apply to the production environment by adjusting the tag.

If you're not familiar with the concepts of version control systems, I'd advise you to do some research. They're conceptually complex but incredibly useful and powerful. The Wikipedia article is a good place to start: http://en.wikipedia.org/wiki/Revision_control

The Digital Gabeg 2009-06-05 15:34:32

Great idea. Except, here's the scenario: hey, we noticed that this change in SP_doSomething caused all these 20,000 transactions to be wrong in random fashion, but we have 20,000,000 transactions in the system, so we can't roll back. THAT'S what version control can't protect you from.

Chris Kaminski 2009-06-05 18:20:40

+1 A:

I'm sorry, you have to have a staging environment. There's no getting around this. If it means you have to cull the size of your datasets, then that's what you have to do. Use VMware and VMware converter to import the production systems during down-periods, if you have them (this is a many-hour process, so maybe not practical).

There is a certain class of problems you can't solve without full access to production DBs (or a copy); performance is one of these. But you really should build a staging environment, even if it's on someone's desktop machine with a stripped-down dataset.

That aside, I've had to live my life with a few of these in the past, and really, there's nothing you can do except lots of backups. Every change you make should be preceded by incremental backups. That way, if you fubar'd something, the amount you've lost is not substantial. SQL Server can take differential backups, which limit the amount of disk space used for backups. Oracle can as well.
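A rough sketch of the "back up before every change" habit, not part of the original answer. It uses Python's built-in `sqlite3` module and its `Connection.backup` method as a stand-in (an assumption), and it takes a full copy rather than a differential one; on SQL Server or Oracle the snapshot step would be that product's own differential or incremental backup. The `orders` table and both helper names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database standing in for production."""
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    live.execute("INSERT INTO orders (status) VALUES ('open')")
    live.commit()
    return live


def backup_then_apply(
    live: sqlite3.Connection, statement: str, params: tuple = ()
) -> sqlite3.Connection:
    """Snapshot the database first, then run the change in a transaction."""
    snapshot = sqlite3.connect(":memory:")
    live.backup(snapshot)  # full copy of the current state
    with live:  # commits on success, rolls back if the statement raises
        live.execute(statement, params)
    return snapshot
```

If the change turns out to be wrong, the returned snapshot still holds the earlier state and can be copied back with `snapshot.backup(live)`.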

Chris Kaminski 2009-06-05 18:19:03
