Recently we added a couple of web service machines, and they couldn't send email. The exceptions were being swallowed and logged, but nobody (including us in IS) noticed for about a month.

Needless to say, many purchase orders, and retractions of purchase orders, were never sent out over the past month.

While this isn't really any one person's fault, is there any GOOD way to break this to someone non-technical who is higher up in the company than you?

Thanks in advance for any advice, I'm freaking out just a bit. :)


Edit: Reading this over, I'm more asking for tips on how to break the news. I understand there isn't a GOOD way, just maybe successful tips that have worked for you in the past.


Resolution, in case anybody was wondering: the new web service machines' IPs weren't added to our mail server's list of trusted IPs. :)

+3  A: 

Gee, bad news for ya - but it is someone's fault.

The folks who built the server and installed the apps and signed off on putting them into production use without testing them. :-)

Pretty much the only way to break this to the management is to acknowledge the MAJOR FUBAR and show them the plan for making sure this kind of situation doesn't happen again.

Good luck. :-)

Ron

Ron Savage
just love that *I* get to break the news, even though I'm not responsible for either that software or the hardware.
Dano
Actually, getting to present it is key - focus on what "your" organization did wrong and how you will fix it - downplay what everyone else did wrong, just list it in the sequence of "what happened". You don't want to appear to be pointing fingers. :-)
Ron Savage
+1 for "it is someone's fault"
William Brendel
+2  A: 

Being honest and direct is the best, rather than trying to cover up certain aspects of what happened.

Don't blame anyone, simply accept that a problem happened, propose a solution, and execute on that solution. Communicate this plan to your superiors and be clear about why you are taking the steps you are taking to solve the problem.

The time to find responsible parties and assign blame comes after. Solving the issues that affect collecting money from customers comes first.

Once the immediate problem is solved, then find a way to ensure that whatever caused this problem cannot happen again. Have a plan.

matt b
+26  A: 

Put emphasis on the fact that the problem was discovered and fixed swiftly by your team. Have detailed metrics on the number of failures, which customers were affected, etc. ready and in hand. Have a contingency plan ready to describe that will prevent similar issues from happening in the future. Engender a sense of camaraderie with the higher-up because you are all on the same team and it's a team problem. If you convey a sense of urgency and give the impression that you appreciate the impact to the bottom line as much as he does, he will likely appreciate that.

Lowly techs often make the mistake of going to upper management with their tail tucked between their legs, like a child who shamefully shows his parents the lamp he broke and waits for a spanking. You are an adult and a professional - leap into action and coordinate the right people to be in place to make the right decisions to fix it. In a case like this, that inevitably means bringing in upper management, but do so with an intention of solution seeking, not fear.

Rex M
+1 for the details and metrics to show that you do care
routeNpingme
It took them A MONTH to NOTICE!!! This is in no way "fixed swiftly". No amount of sugar can coat this fiasco.
Huntrods
@huntrods not about sugar-coating, it's about giving proper recognition to the positive aspects of the situation. It doesn't matter if it wasn't noticed for a day or a year, if it was fixed within the hour once someone did realize it. That's worth kudos.
Rex M
Nice answer, Rex.
Al pacino
Just have a plan in place to notice that kind of thing faster in the future...
Knobloch
Taking the steps to assure the customer that it will not happen again is critical. Don't just tell them what you will do, show them.
Chris Ballance
+1 for "with their tail tucked between their legs, like a child who shamefully shows his parents the lamp he broke and waits for a spanking." :)
Sophia
+3  A: 

Raise the issue as soon as possible.

Come with a clear plan/lists of steps of how to mitigate the problem:

  • how to fix the issue, so further processing works fine
  • is it possible to determine which transactions are affected
  • what is necessary to ensure this does not happen again - automated tests for deployment, preproduction stage for new servers, anything else?
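
As a rough illustration of the "automated tests for deployment" point, here is a minimal sketch of a post-deployment check that sends one test message through the mail relay and reports failure loudly instead of swallowing it. Everything named here is an assumption for illustration: the function name `smoke_test_mail`, the host and addresses in the usage note, plain SMTP on port 25 with no authentication, and the `smtp_factory` parameter (there only so the check can be exercised without a real mail server).

```python
import smtplib
from email.message import EmailMessage


def smoke_test_mail(relay_host, sender, recipient, smtp_factory=smtplib.SMTP):
    """Try to send one test message through the relay; return (ok, detail).

    Hypothetical helper: assumes an unauthenticated relay on port 25.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Deployment smoke test"
    msg.set_content("Test message sent by the post-deployment mail check.")

    try:
        with smtp_factory(relay_host, 25, timeout=10) as smtp:
            # send_message returns a dict of recipients the server refused.
            refused = smtp.send_message(msg)
    except (smtplib.SMTPException, OSError) as exc:
        # Covers relay-denied responses as well as connection failures.
        return False, f"{type(exc).__name__}: {exc}"

    if refused:
        return False, f"recipients refused: {refused}"
    return True, "accepted by relay"
```

Run from each new machine before it goes into production, something like `smoke_test_mail("mail.example.com", "web01@example.com", "ops@example.com")` should come back with `ok` set to `False` if the relay does not trust that machine, and the deployment script can then stop with a non-zero exit code rather than just writing to a log.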

Be proactive in resolving the situation. As long as it's not a direct fault of yours, you might even benefit from the whole snafu.

Franci Penov
Along those lines, why is the OP discussing it here instead of being in his manager's office the minute he figured out the problem with a solution in hand?
Chris Lively
Yeah, I know. With such a serious problem, he shouldn't be wasting time here.
Franci Penov
It was fixed before I left work. Just trying to figure out how to tell other departments the next morning.
Dano
+2  A: 

Point out

  • What happened
  • Why it happened
  • What you think the fallout was (i.e., missed purchase order retractions)
  • What you've already done to fix it
  • What you still need to do (if more fixing is needed)
  • What management needs to do, say, spend (if needed)
  • What can be done to prevent similar incidents in the future

Be proactive about reporting it and spin the negative into a positive ("we've learned the following valuable lessons").

Avoid pointing the finger wherever possible unless asked, and try to spin that in a positive light too. Techs make mistakes; they are human after all. If they can learn from the mistakes made they're probably worth keeping around.

Adam Hawes
+5  A: 

You bring shame to your department. You know what you must do.

http://en.wikipedia.org/wiki/Seppuku

Ted Dziuba
A: 

Whatever you do, make sure you have agreed on it beforehand with your immediate superior, at least. Even if you are the IS director.

le dorfier
A: 

Lie or cover it up :-). If you can shift the blame onto a new intern, I'll award you 10 kittens!

Karl