What is your worst WTF moment, and what can we learn from that mistake?
Typing in 'rm -rf *' in the wrong command window, you know, the one logged in as root that happened to be cd'd into /bin. Thankfully rm deleted itself before any catastrophic damage was done.
When trying to root out record locks on the live system of one of the biggest warehouses in the UK at 1:30am, don't accidentally kill the procman.
Joined an open source project. Got the latest code. Made a change. Checked it in. Broke the build. Spent an hour trying to figure out how to undo my changes and get the build working again.
It's one thing to screw up when nobody else can see you. It's another to do it where everybody has a notification tray app that pops up a nice red X when you screw up.
Lessons learned: Know your tools before using them. Follow this pattern when working on a project with multiple contributors: Get latest, make changes, test, get latest, run all tests, check in.
Similar to MikeReedell, I did an rm -rf, but instead of typing in the pathname, I was just wildly bashing tab, relying on bash completion. Which was missing. And soon, so was /usr/.
Hit my credit card with 182 x £150 transactions. I then sent refund requests and called them to ask what was going to happen, and they couldn't give me a better answer than "Wait and see".
I was using a third-party COM object for sending email from a Classic ASP page.
It was a pretty simple process.
- Loop over a list of users from a database.
- Send each user an email.
The problem was that the COM object didn't reset itself after each call to the sendEmail() method. I didn't know it but, you had to explicitly clear it.
That meant that the first email went to Alice.
The second email went to Alice & Bob.
The third email went to Alice, Bob, & Charlie.
I was, luckily, using the BCC field so no email addresses were exposed but I still ended up spamming about 100 people before I got IIS shut down.
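The accumulating-recipients problem above can be sketched in Python rather than Classic ASP. This is only an illustration under assumptions: the sender address, subject, and body are placeholders, and the real COM object's API is not shown in the post. The point is to start from a fresh message object on every iteration so nothing carries over from the previous send.

```python
from email.message import EmailMessage


def build_messages(users, sender="noreply@example.com"):
    """Build one message per user, starting from a fresh object each time."""
    messages = []
    for address in users:
        # A new EmailMessage per iteration means no recipients carry over
        # from the previous send, unlike a reused, never-cleared mailer object.
        msg = EmailMessage()
        msg["From"] = sender          # placeholder sender (assumption)
        msg["Bcc"] = address
        msg["Subject"] = "Hello"      # placeholder subject (assumption)
        msg.set_content("Body text goes here.")
        messages.append(msg)
    return messages
```

Pointing a loop like this at a local test mail server first would show whether each message really has exactly one recipient.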
Errant PHP script that generated 5,000 emails before Apache took a poop, and until our ISP blacklisted us. It's been about nine years, and it taught me to enable sendmail on my local box to test mail generating scripts from then on. Or else, you know. Fail.
Trying to clean up old Emacs auto-save files (end in ~) by typing
prompt>rm *~<enter>
Instead, I hit
prompt>rm *<enter>
prompt>~
I deleted a symlink to a directory containing very important data.
Or at least I thought it was a symlink...
Doh.
Connected through remote desktop to a production Windows box in another city to make a change. When I was done making my change, I did a 'shutdown' instead of a 'logoff'! Since it was at a remote location and not a local box, I had to own up to my stupid mistake and call to get the machine turned back on by someone at that site. I now pay extra attention to 'logoff' and 'shutdown'.
During testing of my prototype, I needed more data than what I had available. I connected the prototype to the production data feed, watched it for a bit to make sure nothing was going wrong, then let it accumulate data for a while.
I checked on it two hours or so later and found that my program had quit. Not a big deal, but I couldn't connect to the production feed again. Oops, it's down! During my program's long processing, the data backed up on the production server side and eventually crashed the feed. They had to manually reload the data from hard-to-access tapes.
My main saving grace was that I was the one who reported it first. The person who usually monitored the feed was out, and his replacement hadn't noticed it was down for over an hour. I got a little talking to, but nothing too serious.
We did however add a buffer process so that if my system bogged down again, it wouldn't take everyone else down with us.
I accidentally deleted a database of over 100,000 customers....
I learned at that exact moment, one and only one lesson. BACKUP BACKUP BACKUP.
While removing old entries from our development server crontab, I accidentally copied the pared-down dev crontab onto our production server. Our flagship app uses about a dozen cron jobs to poll a database every 5 minutes for job processing. Within an hour the cron error emails going out to our dev team shut down the corporate email server.
Oops!
I rewrote a whole module that was working perfectly but that looked "messy" to me. I had managed to convince my boss that it was the Right Thing To Do, and the rewrite took me 3 weeks.
I still remember the pearls of sweat running from my armpits as my boss, looking over my shoulder, was commenting on bug after bug in my new shiny super-clean module...
I'm no longer "rewriting from scratch" without a really good reason.
Moved instead of copied a svn repository for use in a new project. The old project was in maintenance mode and so I didn't realize my error until several weeks later (and after we'd long since overwritten the backups). Luckily, I was able to export the first revision of the new project, but we'd lost all the revision history from the old project. :(
When I was just starting out, I didn't understand how to get an identity back off a seeded primary key in the database. Instead I did a full select on the table and looped through until I found an unused key.
Ewwwwwwwww. Thankfully that never made it into production.
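For comparison, here is a minimal sketch of asking the database for the generated key instead of scanning for a free one. It uses Python's sqlite3 module purely as a stand-in; the table and column names are made up for the example, and the original database is not named in the post. In SQL Server the analogous call would be SCOPE_IDENTITY().

```python
import sqlite3


def insert_customer(conn, name):
    """Insert a row and return the key the database generated for it."""
    cur = conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
    # lastrowid is the key assigned to the row this cursor just inserted,
    # so there is no need to scan the table looking for a free id.
    return cur.lastrowid


# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
first = insert_customer(conn, "Alice")
second = insert_customer(conn, "Bob")
```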
I shipped my administrator password for an FTP site inside an open source project I was working on. ;)
I had just imported a bunch of old data into a new system. It had taken about 5 hours, and the system was due to go live two hours later; this was around 4 in the morning.
For some reason I tried to delete something:
DELETE from important_table; where id=4
Yeah, I didn't notice the semicolon either. And no, there was no safety net.
This is always one of my questions when I interview somebody... it can be very informative about a candidate's honesty, problem management, and ability to learn from mistakes.
So some beauties:
Implementing password aging as a risk "0" change (i.e. it did not show on change reports) on the financial systems of a Fortune 100 company. Who would have guessed that the many, many Oracle and application accounts wouldn't be able to specify a new password when their old ones immediately expired? 6 hours of downtime.
Never ever use the console of one server to connect through to another. A newbie, first day out of college and on the job, was asked to reboot a test server. No biggie. Connects to console, and reboots. Little did he know someone else had used the console to connect to the production server. Brought down a factory, but at least this company wasn't in the top 100 of the Fortune list :-)
So your manufacturing lines are run by some old HPUX machines. You have some vague knowledge of Unix, and you are annoyed that people keep changing settings on the server. I know, make the files read only, from root, with no backups:
chmod -R -w *
... ehh, no.
A Unix root account can be a bit like a loaded gun sometimes...
JB
Good old rm -r *
While at school I wanted to delete my Graphics folder (and any folders it contained) and so from my user folder (bad idea) I typed:
rm -r * Graphics
After an abnormally long time I was informed "Graphics: no such file or folder"
When running an UPDATE/FROM statement on a large million+ record table, don't forget to comment out the SELECT line you just used to test the FROM clause (i.e. I updated the entire table rather than the 10 records I was looking for...)
I deleted a swap file to free up space for Doom (or some other Id title) on my dad's 386. He was not pleased.
When I architected my "second system" I hit the "syndrome" real bad. To "simplify" things I gave all the business objects .Save() and .Update() methods..... then I passed those objects into the UI.
Let's just say that WinForm events that trigger one another plus embedded round trips to the database made for an architecture that is bettered by a bowl of soup.
I almost hit a zero bug count for a complete upgrade for a particular client. One of the last things I did was ensure that all exceptions were caught and recorded. Only issue was that I added exception handling into the exception logging code. This had a bug in it and that was that if there was no existing log file it would throw an exception which it would then try to log. Stackoverflow!
I blue-screened the client's laptop. :(
Still after that was fixed it was zero-bug (or rather 1-bug) so my blushes were somewhat spared/limited.
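The recursive-logging trap described above can be avoided by giving the logger a last-resort path that never calls back into itself. A minimal Python sketch, assuming a plain text log file (the default path is a made-up name, and the original code's language and layout are not shown in the post):

```python
import sys


def log_exception(exc, path="app.log"):
    """Record an exception without ever raising from the logger itself."""
    try:
        # Mode "a" creates the file if it does not exist yet.
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(f"{type(exc).__name__}: {exc}\n")
        return True
    except OSError as log_error:
        # Last resort: report to stderr and stop. Do NOT call log_exception
        # again here, or a broken log file turns into unbounded recursion.
        print(f"logging failed: {log_error}", file=sys.stderr)
        return False
```

Returning a flag rather than raising keeps a logging failure from masking the exception that was being logged in the first place.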
Edited command.com and changed ". . ." to "..." since I thought it was a grammar mistake. DOS 5 did not like that and would not boot.
This isn't my bug but it still made me lol. The Dev manager was responsible for developing one component for a really important release. The week before the release the Dev Manager went on holiday and people tried to use his component. It worked for about 10 minutes and then fell over, the office was in panic. The very best devs on the team were assigned to find out the problem.
Eventually one of my colleagues burst into laughter and I swiveled my chair to see the following C# code (or thereabouts) on his screen.
public string GetNewGuid()
{
SqlConnection connection = new SqlConnection(Resources.ConnectionString);
connection.Open();
SqlCommand command = connection.CreateCommand();
command.CommandText = "select new_id()";
SqlDataReader reader = command.ExecuteReader();
reader.Read();
string guid = reader.GetString(0);
connection.Close();
return guid;
}
The problem (aside from the REAL WTF) was due to the fact that he didn't dispose the DataReader. After about 10 minutes of the app executing a ridiculous number of round trips to the database the database refused to give out any new readers (or the app ran out of memory, I forget) and the whole thing fell over.
This method was replaced by:
public string GetNewGuid()
{
return Guid.NewGuid().ToString();
}
Told wife the dress made her look fat. :)
No, really... I changed some value in sql 6.5 that told it how much memory to use. But the box was for physical memory, and I put in an amount more than the machine actually had installed. "You must restart the service for this change to take effect" which I did. At 1am, on a production server, and I didn't check the backups, which had been failing btw... anyway, the service wouldn't restart because it couldn't allocate the memory, and I couldn't change the setting because I couldn't connect to the server object because it wasn't running because it couldn't start because the value was wrong and I couldn't change it because, well, you can see the loop here...
$249 to mss and an hour on the phone with a Guru and we found the registry setting. I finally got to bed at about 4am...
Programming a data synchronization system through FTP, I didn't think about what would happen if the FTP client wasn't able to CD into a directory. Well, it wasn't able to, so it stayed in the root, and after finding a lot of files that didn't match the synchronized system, the script started to delete everything on the server... everything.
I realized half an hour later when the script was around the WINDOWS\System32 folder....
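A sketch of the missing check, written against Python's ftplib as an assumption (the post does not say what the original script was written in). The function name and the `keep` set are hypothetical. If the directory change fails, nothing is deleted at all.

```python
import ftplib


def delete_stale_files(ftp, remote_dir, keep):
    """Delete files in remote_dir that are not in `keep`; abort if cwd fails."""
    try:
        ftp.cwd(remote_dir)
    except ftplib.all_errors as err:
        # Without this check the session stays in the previous directory
        # (possibly the root) and the deletes below hit the wrong files.
        raise RuntimeError(
            f"could not enter {remote_dir!r}, nothing deleted"
        ) from err
    deleted = []
    for name in ftp.nlst():
        if name not in keep:
            ftp.delete(name)
            deleted.append(name)
    return deleted
```

Here `ftp` is expected to be a connected `ftplib.FTP` instance, or anything with the same `cwd`, `nlst`, and `delete` methods, which also makes the function easy to try against a fake before pointing it at a real server.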
I set up a new VMware instance in the evening and went home, leaving it running. What I didn't notice was that I had used the IP of the nameserver as the IP of the VMware instance. Suddenly all hosts in our building tried to connect to the VMware instance for DNS lookups.
Our whole network was practically down.
Since this was a VMware instance, our admins were not able to track down the MAC address. So they had to unplug every single computer in our office (~500) until the problem was gone.
The next day I found a letter on my desk: "Who dares to switch this computer on will die".
I once forgot that the id column in a table was not auto_increment, and was in fact a foreign key to a single row in another table. I made the mistake of saying, "Oh, I'll just manually update this row in the live database" and ended up 'fixing' about 250 rows. Luckily it only caused about 10 minutes of downtime for the users and no one flipped, but I think I'll just ask the DBAs to do it next time.
Did an UPDATE without a WHERE clause.
Lesson learned: begin all UPDATE queries with "BEGIN TRAN"
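The "BEGIN TRAN" habit can be taken one step further by checking the affected row count before committing. A minimal sketch using Python's sqlite3 module as a stand-in for whatever database was involved; the table, the helper name `guarded_update`, and the expected-count parameter are all assumptions made for illustration.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE in a transaction; roll back unless the row count matches."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()  # undo the statement, nothing is persisted
        return False
    conn.commit()
    return True


# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)", [("new",)] * 3)
conn.commit()

# Forgot the WHERE clause: 3 rows would change, 1 was expected, so roll back.
ok = guarded_update(conn, "UPDATE orders SET status = ?", ("void",), 1)
statuses = [row[0] for row in conn.execute("SELECT status FROM orders")]
```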
I was working on an iPhone version of a site. I had just woken up and didn't check what directory I ftp'ed to. Before I knew it, everyone visiting the site was seeing the iphone interface (which had administration links and the works, also not pretty). I didn't have a backup of the index.php file and panicked. Luckily, the site uses Wordpress as the frontend and was easy to fix. Now I double check my paths and never use index.php when testing things.
I started a project without a specification. When I asked if I could work out what was wanted I was told not to speak to the client.
It's a meal ticket for life.
Back in 1995
I had multiple windows open... what's this stuff doing in D:\winnt? Ah, just old crap... I'll delete it... except it wasn't the local machine... it was our main server at our Internet Service, 25 miles away.
After that ALL of my DOS prompts included the name of the server.
I once wrote a symbolic assembler in itself. I actually punched the source card deck and brought it into the machine room before realizing that I had no way to translate it the first time. Not a great public embarrassment but I did feel awfully stupid standing there trying to figure out which binary executable to load.
-Al.
A bug in the style of Office Space
A while ago I had the first live purchase on a software shopping cart module that I coded in ASP.NET/C# for a popular CMS. We had tested and retested all aspects of the cart (except the live purchase, which we avoided because of the credit card bills).
So the purchase goes through for $32.95 in all database records. For all intents and purposes, everything looks correct. The problem being that the Authorize.Net receipt shows another story. In reality, the customer paid $2.00.
OMG Hacks! was my first thought. I retrace all the code to ensure nothing can influence the price besides the items in the cart. The item price is correct. The quantities are correct. The totals and subtotals are correct.
Finally I trace the total to the last stage of the purchase process, wherein I format the System.Decimal type to a string for insertion into the authorization transaction via HTTPS.
I see:
x_amount.ToString("D2")
And now I see the source of the "$2.00". I rack my brain, trying to remember why on earth I would think this would work. I run a test and the string returned is "D2". I cry and wail. Evidently Authorize.Net thinks "D2" is close enough to the requisite "2.00" to charge $2.00. Finally I remember that I saw this as a forum post suggestion. (Where was Stackoverflow.com then?)
When originally coding, I had planned to use the "C" (currency) formatting. This does everything correctly except for pre-pending a "$" to the string. The Authorize.NET API docs say they want it in decimal format without "$" or any other monetary symbol. So I went to the collective wisdom of other .NET developers for a quick workaround. I didn't like the idea of formatting as currency and then stripping the first character, so I saw an off-hand post about "D2" formatting causing essentially the same format but without the monetary symbol. I believed it and did not verify its output. Gah!
Not to mention that this had been extensively tested in test mode. But for some reason we thought nothing of having Authorize.Net return transactions in test mode that had a purchase price of $2.00. Myself (and unnamed others) thought it was just a quirk of the test mode...
The morals of the story are:
- It's not a valid approach until it works for the end user.
- Make sure a "found solution" does what it says on the tin.
- $30.95 is a bit spendy for a bug report, but it got the job done.
- If you are going to "mess up the mundane details", make sure it isn't going to cost your customers all but $2.00 of each sale.
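The "does what it says on the tin" moral can be made mechanical by round-tripping the formatted string before it is sent. This Python sketch is not the original C# code, and the function name is made up. (In .NET, as far as I know, the fixed-point specifier is "F2", while "D" is meant for integral types.)

```python
from decimal import Decimal, ROUND_HALF_UP


def format_amount(amount):
    """Format a Decimal as a plain two-decimal string, e.g. for a payment API."""
    quantized = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    text = f"{quantized:.2f}"
    # Verify the output instead of trusting the format string: round-trip it.
    if Decimal(text) != quantized:
        raise ValueError(f"formatted amount {text!r} does not match {quantized}")
    return text
```

A unit test asserting that 32.95 formats to "32.95" would have caught the "D2" problem long before the first live purchase.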
Mopping the computer room floor, and hitting the (uncovered) emergency shutdown button with the handle of the mop. I did not get points for mopping the floor.
The lesson here? If your emergency button is not protected, it's going to get tested.
Ooh! Embarrassing confessional time. The first one that comes to mind was shortly after I switched to OS X from years on Windows and had basically forgotten anything I knew about unix.
I was working on a personal project and decided I was at a point where I should backup my stuff. So I opened up the command line:
gzip *.py
Oh man! It zipped every file individually! Right, I have to tar them first. Okay
rm *.gz
Wait! Why is my directory empty?! Oh no....
Yeah, I also forgot that gz doesn't copy and zip, it zips in-place.
I got lucky, though, and still had most of the files open in my editor.
This was what convinced me to finally install a version control system on my home machine and use it for my own stuff.
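For what it's worth, the backup that was intended (one archive, originals untouched) looks roughly like this in Python. The shell equivalent would be something like tar czf backup.tar.gz *.py. The function name and default pattern are assumptions for the sketch.

```python
import tarfile
from pathlib import Path


def backup_sources(src_dir, archive_path, pattern="*.py"):
    """Copy matching files into one .tar.gz, leaving the originals in place."""
    src_dir = Path(src_dir)
    names = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(src_dir.glob(pattern)):
            # arcname keeps only the file name inside the archive.
            archive.add(str(path), arcname=path.name)
            names.append(path.name)
    return names
```

Because the archive name ends in .gz, a later rm *.gz would still remove the backup, but the source files themselves are never modified.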
This wasn't my biggest WTF moment but it was the biggest I have heard of to date.
A decent sized grocery store chain that is based here in Grand Rapids had a developer that wrote a system for promotions. The system would take in SKU's and you could set rules like buy SKU "XYZ" and get SKU "ASDF" half off. This system was in production for a while and one of the "features" was that you could leave the SKU blank and it would apply to all SKU's.
Someone on the business end didn't realize this and they accidentally set up buy one get one half off rule but without any SKU's.
So basically anyone who checked out at any of the stores could buy a pack of gum and get the rest of their cart half off!!!!
It made the news but they never really mention what really happened behind the scenes....
I added a new feature to update status on table with over a million records. The SQL update query was "UPDATE table ... WHERE id = id" instead of "WHERE id = :id". Thankfully this was infrequently used and the database server only crashed a half-dozen times before the problem was found and fixed.
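The difference between "WHERE id = id" and "WHERE id = :id" is easy to demonstrate. A small sketch with Python's sqlite3 module and a made-up `items` table (the original database and schema are not given): the first form compares the column with itself and matches every row, the second binds a parameter and touches one.

```python
import sqlite3


def set_status(conn, row_id, status):
    """Update exactly one row by binding :id, and report how many rows changed."""
    cur = conn.execute(
        "UPDATE items SET status = :status WHERE id = :id",
        {"status": status, "id": row_id},
    )
    conn.commit()
    return cur.rowcount


# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (status) VALUES (?)", [("open",)] * 5)
conn.commit()

# "WHERE id = id" compares the column with itself, so it matches every row.
matches_all = conn.execute(
    "SELECT COUNT(*) FROM items WHERE id = id"
).fetchone()[0]
changed = set_status(conn, 2, "done")
```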
This one is more of a dumbass IT moment that was catalyzed by development, but here it is anyway.
I was putting together a simple console app to interact with a SAM filesystem on one of our Solaris servers and help with file management/restoration. I needed to update some of the library files in /lib, so my first instinct was to backup the files in another directory in case I needed to go back after overwriting them. Made a copy of the old files, put in the new ones, didn't fix the problem I was looking to fix. So I go to restore from my copy, start by deleting the current installed libraries without thinking, then tried to 'cp' the backed up files...problem was that 'cp' is DYNAMICALLY linked against those same files...so it threw up.
I had deleted the dependencies for pretty much every fundamental utility on the Unix box...on a production server with a few hundred connected users, all hourly 'data-manufacturing' employees who need the server to do their jobs...
Panic.
Luckily we had another server running the same version of Solaris, so I hopped over to that one, wrote a quick and dirty 'cp_oops' C app in about 1.5 minutes, compiled it to STATICALLY link, pushed it onto an existing share to the broken server from the non-broken one, ran back and copied the libraries back before anything threw up in a noticeable way (to the production staff at least :p)
Some badly coded linux applications require insane permissions to be set on every directory they use. One in particular, a perl-based music streaming server that shall remain unnamed, required execute permissions on all files in my shared music directory.
As I frequently add new music, typing the command 'chmod -R 777 ./' in the data directory became routine. Worse yet, as several user accounts used this directory, the command was executed as root.
Being a very l33t fast-typing keyboardist, and as most of my peers have probably experienced, there is a quantum effect where keys change places for a split second, which causes the characters to be input out of order. So, one night fate would have it that the order of characters was 'chmod -R 777 /.'. For the uninitiated, this means that everyone is granted full access to every single file in the entire filesystem!
Fortunately, I quickly discovered my error and managed to abort the operation after a few seconds. It still took me a couple of days to clean up the mess though.
I don't do it this way anymore. And I'm glad my job title does not include "UNIX admin".
Wrote a plug-in for MS's IIS web server that returned an XML-formatted dump of one of our application's databases, without taking into account the amount of data involved. Turns out IIS wasn't happy trying to return 10-15 MB in a single request, and would routinely drop the connection. Worked much better when we fixed the plug-in to send data in 64K chunks; even better when we came up with querying semantics more sophisticated than 'gimme all yer data'. :-)
My worst was made years ago when importing a bunch of stored procedures to a live server.
I didn't notice I had said to drop the existing data on import. (Why is that a default setting?)
The backups failed because they were spread over two rather large files we couldn't get to download from the backup location.
The site, a statewide system, was down for 3 days. And I made the mistake on Easter Sunday.
rm -rf is not something you want to get in the habit of using often. Leave off the -f unless you have a good reason to include it. Especially after you start a new job, it stinks to have to inquire--during your first week--about what automatic backups are in place.
Poured a glass of red wine into a laptop. The leaden slosh of horror I experienced has traumatized me to this day.
If you ever have the manic urge to totally deadify a laptop, I can heartily recommend it.
Our product was used by police forces to input data about people that are arrested and what they are charged with. It would also store digital mugshots and fingerprints, and electronically submit the fingerprints to the FBI. While testing, we would routinely use our own fingerprints for fake bookings that got inserted into the test database. Except for the time that I "temporarily" switched the test machines over to the production database and forgot to switch them back...
Cleaning up our production database was easy, but it took a court order signed by the superintendent of the Boston Police Department to remove my colleague's fingerprints from the FBI database -- she had booked herself under the name "Elroy Jetson".
When saving reports to disk, I took the date and time and computed the MD5 hash and used this as the filename. I think I thought the MD5 would make the name more unique! They were displayed in alphabetical order, which meant every time a new report was created it popped into a seemingly random place in the list.
I have absolutely no idea what was going through my mind that day. Once my colleagues spotted it, it was swiftly removed and the standard response to any question became "MD5 it". Doh!
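What the MD5 was accidentally destroying is the natural sort order of a timestamp. A minimal sketch of a filename that sorts alphabetically in creation order; the prefix and the .txt extension are assumptions, since the post doesn't say what the reports looked like.

```python
from datetime import datetime


def report_filename(created_at, prefix="report"):
    """Name a report so that alphabetical order equals chronological order."""
    # Year-first, zero-padded fields sort correctly as plain strings.
    return f"{prefix}-{created_at:%Y%m%d-%H%M%S}.txt"
```

If two reports can be created within the same second, a counter or sub-second field appended after the timestamp keeps names unique without breaking the ordering.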
rm -rf .* deletes more than you would expect.... I'm just sayin'
Removing safety checks in a sales application for a tradeshow and deploying it immediately is a bad idea :)
A few years ago, I was responsible for slapping together an ad-hoc sales program for a company that was selling its products at NAB. The system was to be run on laptops connected to USB card-reader out front, all connected to a "server" in the back part of the display. We ran a bunch of tests to make sure the card readers worked, and that we could properly charge credit cards and everything looked great.
The first morning we started off with pretty brisk sales, and it looked like the system was performing as expected. Then at about 11am, a guy gets shown into the back room and says that he went to use his credit card at another booth and it was declined; after calling his company they said he had reached his limit and the only other purchases today were listed as being from us.
This is what had happened: the salespeople were complaining that they had to press {ENTER} every time after they swiped the card to verify the amount and send the credit charge through. Figuring that everything was ok, I circumvented the dialog and had the app just send the charge directly. What I didn't realise was that the USB card readers could actually send the "swipe" message several times in a row and now the program was merrily charging people far more than they expected.
I spent the remainder of that morning crawling through the hundreds of credit card transactions voiding all the duplicate/triplicate/.... charges we had made. Never, never again :)
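If the confirmation step really has to go, one alternative is to filter out repeated swipe events before charging. This is a different technique from the dialog described above, sketched in Python under assumptions: the event shape (timestamp, card id) and the five-second window are invented for the example, not taken from the actual card readers.

```python
def dedupe_swipes(events, window_seconds=5.0):
    """Drop repeated swipe events for the same card within a short window.

    `events` is an iterable of (timestamp_seconds, card_id) pairs in time order.
    """
    last_seen = {}
    accepted = []
    for timestamp, card_id in events:
        previous = last_seen.get(card_id)
        last_seen[card_id] = timestamp
        if previous is not None and timestamp - previous < window_seconds:
            continue  # treat as a reader glitch, not a new purchase
        accepted.append((timestamp, card_id))
    return accepted
```

A time window can also swallow a genuine second purchase, so it is a trade-off rather than a replacement for having someone confirm the amount.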
Didn't happen to me personally, but a little while ago someone in my company renamed a cronjob file to cornjob, which caused mass confusion. For several days. Almost had a client drop us because of it.
I was logged into my new dedicated server box and configuring some firewall rules over ssh. The first thing I did was set it to not accept any connections from anyone. Then I saved it to test that before going through and adding the various ports I wanted to allow.
Needless to say, the first rule worked...
deploying another developer's code straight into production without proper code review or testing:
this guy wrote a simple java servlet to handle 404s and general 500 errors. it was supposed to just kick the user off into a simple "this page cannot be found" or somesuch error page.
the problem was, his servlet made a database connection ( which may or may not have even been used; i dont remember ).
so the first time there's a database error ( probably something to do with a temporary lack of database connections in the pool ) - the error servlet gets called up, tries to access the database, which throws an error to the error servlet, which tries to access the database...and so on into infinity.
over the next 24 hours, our site gets close to 5 million hits, and all of our servers grind to a halt.
lessons learned:
- don't push unproven or worse yet, unseen code straight to production because a suit says so.
- do 404 pages the right (easy) way.
On a DG-UX system, pressing TAB while typing the first letters of a command (probably 'shar') as root - of course - and getting 'shutdown' unexpectedly. But it was fine, because I had typed '--help' as the parameter. Except that its shutdown didn't seem to care about that parameter, so it shut down.
Tools down around the office, me watching it boot back up (it took about half an hour) with management laughing on the other side of the server room window.
I learned quickly about sudo - and about checking man pages on any flavour of UNIX I wasn't intimately familiar with.
I was working on a database project for a client and had written all my DDL for creating tables and such in scripts for deployment from environment to environment. The first thing the script did was drop the tables before recreating them. I did NOT however code a prompt for what database to hit, it just worked in your current session. Well as you have probably already guessed, I was in the wrong environment when I executed it! And this was after they had hired temps the week before to enter production data. Luckily we had just created a backup that morning. This was quite early in my career and I learned a valuable lesson from it!
Big press gathering.
Have coded for four months straight.
About to demo interactive game for kids on HIFI you wouldn't believe, 3 4x3 meter screens, as well as 30 client computers in an auditorium.
Starts show.
50 seconds into the (beautifully synced across the 3 screens) intro, power goes down.
5 sweaty minutes later, power back up. Ok.
Another 2 minutes later, past the intro, players begin hitting the database with answers to questions.
For the real WTF, everything slows down to a crawl, audio/video out of sync, client screens dead, all waiting for.... The g¤#! d"!#¤d Access database the client insisted on!!!
I was prepping some data for a manual email marketing blast to some 2,800 users. I forgot to edit out the loop to send to all customers during my initial test run. So, when the loop over 2,800 customers ran, it sent every message to my test address (my email address).
To make matters worse, my browser crashed due to the POST, and when I brought it back up, re-triggered the action, sending a total of 4600 emails to my inbox.
I'm the admin of my email server, so thankfully I was able to do some cleanup on the box, but not before it took me over-quota and nearly killed Outlook.
Started up a web proxy with a single allow requests to port 80 rule, oblivious to the fact that I had given no deny rules whatsoever.
Jump forward 24 hours to phone calls from the client site asking why the internet speed had slowed to a halt. Cue searching and discovery of Russian/Chinese IP addresses running everything through my poor client site proxy.
Always check your web.config or app.config files before uploading them or checking them into source control. You don't want to leave your passwords and localized settings in there. I've done this more than once.
First day on the job I was asked to set up a new machine for development. So I log onto the machine, sudo to root and start by covering all the basics... like the firewall. Of course, I had previously only configured such basics from the console, so it took me a few moments to figure out why I lost my connection right after the
ipchains -F
ipchains -A input -j DENY
Lucky for me the machine was one floor down, not half a country away.
A recent one for me (just last Friday): I wrote a "clean" script for a Mac project that I was working on that went something like:
rm -R ../..
It wasn't working when I ran it from the directory that contained the script, because I had the directory open in Filer, so I cd'd to my home directory and typed
/home/ken> sudo /project/foo/utils/scripts/clean
As soon as I hit enter, I knew that I had just done something very bad. It deleted all of the files owned by root on the whole hard drive.
Moral of the story: When you write a "clean" script, make sure that it is as specific as possible.
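One way to make a clean script "as specific as possible" is to resolve its target relative to the script's own location, not the caller's working directory, and to refuse anything outside the project. A Python sketch with hypothetical names; the original was a shell script whose layout is only partly shown.

```python
from pathlib import Path


def safe_clean_target(script_path, relative_target, project_root):
    """Resolve a clean target relative to the script, not the caller's cwd."""
    target = (Path(script_path).resolve().parent / relative_target).resolve()
    root = Path(project_root).resolve()
    # Refuse anything that is not strictly inside the project root.
    if root not in target.parents:
        raise ValueError(f"refusing to clean {target}: outside {root}")
    return target
```

Only the path returned by a check like this would then be handed to the actual delete step.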
Hard coded my Google username and password into a python project (used the GData API), then forgot I had done so and committed the code to SVN.
And of course SVN commits cannot be undone.
root]# mv / .
I typed this by mistake while logged in as root. Every file on the server was moved to the /root directory. Had to wipe the drives and reformat. Wish I had known about the Coroner's Toolkit.
I once left in some testing code that caused a thread to sleep for 10 seconds during an important loop. Ground the system to a halt, and it took a lot longer to find than it should have. It's become the canonical example of screw-ups at my company.
My first job was working at an ISP back in the 90s. We all brought our computers in from home to play Doom during my first month of work. When it came time to leave, I disconnected my computer from the network... and apparently I took the network terminator with me (not realizing what a network terminator was at the time). Doh!
I got a new version of an operating system and was all giddy to install it. I went through my files and thought I had backed everything up. Turned out I backed up everything BUT my programming projects. Lost LOTS of work to say the least.
Once, our credit card processing system was down, so we just collected charges to be put through once it came up. When it was available, we wrote a script to collect the charges and input them into the credit card processing app. To make the coding and rounding easier, we treated the numbers in the script as cents (i.e. we multiplied by 100). Of course, we forgot to divide by 100 before submitting the charges. Even worse, we realized the problem after only a few charges went through, but the application would not let us remove the items or even void the transactions until it cleared its queue, so we had to wait for it to finish mischarging about a hundred people before we could void any of the transactions.
Lesson #1: Make sure that your data is presentable at all times.
Lesson #2: Make sure that your data doesn't ever look misleadingly correct.
Lesson #3: None of the credit card companies called us to ask why we were submitting so many transactions beyond people's credit limits.
Lesson #4: This is a good way to get a list of your customers with large credit limits.
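For what it's worth, here is a minimal Python sketch of one way to keep a cents-versus-dollars mix-up like this from reaching the processor: hold amounts as integer cents internally and convert back in exactly one place, at the submission boundary. The function names (`to_cents`, `to_amount`, `submit_charge`) are made up for illustration and are not from the original script.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_cents(amount: str) -> int:
    """Parse a decimal currency string such as '19.99' into integer cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)

def to_amount(cents: int) -> Decimal:
    """Convert integer cents back into a two-decimal currency amount."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))

def submit_charge(cents: int) -> str:
    """Hypothetical submission step: the only place cents become dollars again."""
    return f"{to_amount(cents):.2f}"

# Round trip: what goes in as 19.99 must come out as 19.99, not 1999.00.
print(submit_charge(to_cents("19.99")))
```

A round-trip check like the last line, run against a handful of known charges before the real batch, would have flagged the missing divide-by-100.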
On a day of particularly bad judgement... tired I guess...
I was in the lab making some last minute fixes prior to a customer trial, if I recall correctly. It was crunch time anyway, and I was behind the eight ball. It was a Windows dev machine with MinGW installed.
I did this: grep texttofind *.txt > output.txt
Oh, power of recursion!
It filled the C drive in seconds. "Out of system resources" message boxes were popping up all over the place. I was frantically trying to clear them so I could kill the grep command when an ill-conceived Windows message came up:
"Drive C is full do you wish to format? Ok / Cancel"
Before I realized what the message read, I clicked "Yes".
That was all she wrote, I was staring face to face with "non-system disk error".
Simultaneously both stunned and pissed, I tossed the dead machine aside, grabbed my personal machine from my cubicle, pulled a copy of the trunk from Subversion and I was back in action in a couple of hours. Thankfully, I did a commit earlier that morning.
All hail source control, couldn't live without it!!!
I once drove two hours to our datacenter to install a new production box. After getting back and spending the next day configuring the server and installing our software, the last thing I needed to do was change the password.
Somehow I managed to enter the new password twice with the same typo. After trying a couple of dozen permutations to no avail, I finally gave up and had to explain to the team why I would be spending the next day back at the datacenter.
When issuing a DELETE command to remove a record from a table in SQL, never forget to add the WHERE clause...
My experience was last year, fall semester. It wasn't so much as a WTF on my part, rather a WTF on the instructor's part. She made us comment every line of code (including whitespace) the entire semester. She'd dock points too if you didn't comment properly. It was somewhat overbearing.
I installed a software firewall on a dedicated server that I was administering. Problem was I forgot to allow an exception for remote desktop.
Formatting a floppy disk - but instead of "format a:" I typed "format c: " and didn't notice until I got past all the prompts ...
I was once dropped into a Classic ASP to ASP.NET conversion project. (It ran 1.5 months in total, and we had no contact with the client during that time; he had just given us the Classic ASP site code.) The login module had been completed by a fellow developer. After all the hard work, including working from home, I had finished the project a week and a half ahead of schedule. Then I checked the login pages of the old application and found that the other developer had hard-coded a single role at sign-up, where the original had several roles, each with its own pages. The project manager who had handed out the task sheet hadn't checked this either. The pages left out were equal in number to the ones I had already developed. So I went through all those Classic ASP pages again and, thanks to the fully OOP approach I had used in the project and to working from home, I was able to complete the project a day early. And thank God, the client accepted the project very happily, with better than expected performance and reliability.
Boy!
We're talking about 15 years ago, working on a 64-user DEC VAX running VMS. I'm debugging a program I wrote to scan through a bunch of files because it is hanging in an infinite loop. I made a small fix (moving the loop brackets), ran it, and then... the whole system crashed, not just my login but the whole multi-user system, and left about 50+ people twiddling their thumbs.
Two hours later, the system is back up, I log in, I get back to my program and run it... and the whole system crashes again.
Late afternoon, the system comes up and I am just logging in when my phone rings. It's the sys admin, shouting abuse and telling me not to run that $^&*ing program again!
Turns out I had made an infinite loop that continuously opened file handles. There were no quotas set on the DEC VAX I was using so when the system ran out of file handles it just crashed and burned taking down everyone else logged in as well (most of my division).
Both the sys-admin and I got a bollocking.
Pev
I managed to cause a subtle bug that lasted dozens of revisions, caused random errors in output, and took days to track down... by misspelling the name of a preprocessor #define. It was worse because I made the assumption that if the code was broken it would be immediately obvious, but for extremely subtle reasons the broken code would only cause problems in unbelievably rare circumstances, but just often enough to be a major issue.
This also makes the record of my shortest bugfix ever, a bugfix which added two characters.
You can't beat the old:
UPDATE mobile_phone_repairs SET booking_date = GETDATE();
and forgetting the WHERE clause.... 2 million+ records cabbaged in 2 minutes, with no backup, on the live server....
Doh...
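One guard against the missing-WHERE classic, sketched in Python with the standard sqlite3 module (the original was SQL Server, so treat this purely as an illustration of the idea): run the statement inside a transaction, look at how many rows it touched, and only commit if that matches what you expected. `guarded_update` is a made-up helper name.

```python
import sqlite3

def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE/DELETE and commit only if it touched exactly expected_rows rows."""
    cur = conn.execute(sql, params)  # sqlite3 opens a transaction implicitly here
    if cur.rowcount != expected_rows:
        conn.rollback()  # undo the statement before anyone sees it
        raise RuntimeError(
            f"refusing to commit: {cur.rowcount} rows affected, expected {expected_rows}"
        )
    conn.commit()
    return cur.rowcount
```

Called with `expected_rows=1` and a statement that forgot its WHERE clause, this rolls back and raises instead of rewriting the whole table. It is no substitute for a backup, but it turns a silent mistake into a loud one.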
Back in the DOS 6 days, I was playing on my Dad's work computer (he's an accountant) and I discovered the deltree C:*.* command. To complicate matters, he was using Stacker to gain some hard disk space. It took a long time to recover from that, and it was quite a while before I got to use the computer again.
two cock-ups from my younger days (mid 90s):
I meant to do this:
adminbox# rsh otherbox 'sync; reboot'
instead I did this:
adminbox# rsh otherbox sync; reboot
(huh? why did my xterm disappear?)
No lasting harm done, all I did was annoy my boss for the five minutes or so it took the thing to reboot.
But the next one was catastrophic.
The usenet news server had run out of disk space (too much pr0n). So I thought I'd remove some older articles to make room.
news# find . -mtime +60 |xargs rm
It worked well enough, and gave us plenty of space.
An hour or two later, I was still on that server. I was looking for a config file, I think, so I did something like
news# find . |grep conf
that didn't work, and I got distracted; when I returned to this task I still wanted to find that config file, so I did something like this:
news# cd /usr/local
news# !find
Unfortunately, I didn't type "!find" in the same window where I'd been looking for the config file earlier; I typed it in the window where I did that other find:
news# !find
find . -mtime +60 |xargs rm
and in a few seconds the news server binaries were wiped out, along with anything else in /usr/local.
Naturally, there were no backups, and I spent the rest of the day compiling and installing INN.
These mistakes cured me of the habit of remaining logged in as root.
Near the end of the dot com bubble, my company was doing research on a sector of the market. We had been given a database with company names in that sector, and we weren't sure if many of them were still viable companies. I wrote a quick and dirty app which looped through the database and tried the URLs to see if it got a valid response... the assumption being that a 404 would be a failed company. The app used the IE browser COM component and actually displayed the pages while it processed. I split the database into three sections and set it to run on three machines beginning at the close of business and running overnight. My cube was extremely proximate to the CEO and CFO.
Upon arrival the next day, I discovered that the database was not at all accurate. Apparently it was open to the public for update, and numerous spammers and porn companies had inserted records and URLs of their own. This, in itself, was not terrible. What was terrible is that many of the pages when loaded, spawned pop-up windows of extremely explicit details and while the program moved on to the next page, the pop-up windows were orphaned and visible for all to see.
I had some 'splaining to do.
I once worked on a project with a multi-architecture build, spread over several machines.
Occasionally, a build would fail on one machine, and leave files lying around. To fix this, I would log onto the offending machine and clear down the directory - conveniently named /tmp/build.
I found I could do this remotely in a nice simple commandline:
rsh -l user "cd /tmp/build;rm -rf *"
This seemed to work fine. But one day the build failed for a slightly different reason, and the /tmp/build directory wasn't there. Just to add to the fun, the user was root, and the default home directory was /.
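The underlying trap is that the shell carries on to the rm even when the cd fails (writing `cd /tmp/build && rm -rf *` would have stopped it). The same idea as a rough Python sketch, assuming you would rather script the cleanup than type it: refuse to delete anything unless the target directory really exists, and never operate relative to wherever you happen to be. `clear_directory` is a made-up name.

```python
import shutil
from pathlib import Path

def clear_directory(path: str) -> int:
    """Delete everything inside `path`, but only if `path` itself exists.

    Returns the number of top-level entries removed.
    """
    target = Path(path).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"{target} is not a directory; nothing deleted")
    if target == Path(target.anchor):
        raise ValueError("refusing to clear a filesystem root")
    removed = 0
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()        # plain file or symlink: remove just the link
        removed += 1
    return removed
```

With this, a missing /tmp/build produces an error message instead of an empty root filesystem.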
Using a workstation's MAC address as the "register number" on a (DOS) networked point-of-sale system. Hello, register number 08AC00007AEC0991!
When using sockets, never call readline on data that is one long line and doesn't end with a newline character.
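A small Python sketch of the defensive version, assuming a binary file-like stream (for a real socket, something like `sock.makefile("rb")` gives you one): cap how much a single readline may consume and treat "no newline" as an error rather than as a line. `MAX_LINE` and `read_line_bounded` are names invented here, and the limit is arbitrary.

```python
import io

MAX_LINE = 4096  # assumed limit; pick one that fits your protocol

def read_line_bounded(stream, limit=MAX_LINE):
    """Read one newline-terminated line of at most `limit` bytes from a binary stream."""
    line = stream.readline(limit)  # never returns more than `limit` bytes
    if line.endswith(b"\n"):
        return line
    if len(line) >= limit:
        raise ValueError("line longer than limit")
    raise EOFError("stream ended before a newline arrived")

# io.BytesIO stands in for the socket stream in this example.
stream = io.BytesIO(b"hello\nworld")
print(read_line_bounded(stream))
```

This only bounds the size of a line. A peer that sends nothing at all will still block the read, so a timeout on the socket itself is a separate concern.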
Working for a startup that has either "Go Public" or "Get Sold" or both as part of its business plan. The second that starts going around the company, bail.
In my first year as a developer, I accidentally pointed the production app to the testing DB. Nine days later I cleared out the testing db. One day after that, I realized what I'd done, but it was too late. Nine days of registrations and financial transactions were gone.
I was given a GPS system that was used on ships and given the task of getting a program to interface with it and collect co-ordinates. We didn't have a manual for the device, but once I got it powered on, I found a big help button. I thought that might at least get us started with how to use it.
After pressing the button, it beeped a couple of times, and then the screen started flashing: Sending S.O.S. signal
Gosh damn - I unplugged the power cable, hoping the thing would turn off, but it must have had an emergency battery inside, because it carried on going, and there was no stopping it.
I waited, very anxiously, expecting a Sea King helicopter to appear outside the office at any moment, wondering how I was going to explain what had happened.
Fortunately - either because I was indoors and the signal didn't get through or because the receiver of the signal realized an S.O.S. originating 100 miles inland probably wasn't a real shipping incident - no Sea King turned up. Phew.
Say you work for a company. Say your company is providing software for another company. Say you also provide them with a server to host said software. Now, say this company is far far away. Ok. Say when you telnet in to the remote company, you don't have a user account. Alright... now say you ask your manager "who should I log in as?"... if your manager says "just login as root, it's an empty server anyway" - Here be Dragons! Within 20 minutes we got a broadcast message saying "Ha ha ha! Got you!" followed by a sudden drop of our telnet sessions.
The best was when I was remotely connected to our build machine, and we would sometimes have issues where DNS would drop on our internal network so the easy solution was to bring up a command prompt and do:
ipconfig /release
ipconfig /flushdns
ipconfig /renew
As soon as I brought up the command prompt and typed ipconfig /release and hit enter, I was wondering why I lost my connection to the server...
Then it hit me, ahhh doy! Remote Desktop Connection! LOL
So I had to go down to the server room and directly connect to the server.
Working with about a thousand blogs and making an uninstallation script for one of them... Well, hrm.. The script took about 999 too many. My Boss noticed that he couldn't access one of the blogs and he asked me, I tried to come up with a good excuse but I couldn't...
He then asked me if I had fixed the backup system as I promised... but well, there were complications. -.-
I was young. My manager didn't like the numbering scheme of the backup tapes and told me to recreate the backup tapes with better numbers. He told me to use [some specific command syntax I've forgotten] and I used that verbatim, as instructed. It released the backup tapes for reuse instead of renumbering them. We caught some of them, but some were overwritten. For the next eight years I was afraid that this major metropolitan hospital would be audited and have no financial data. I think most of my major mistakes have come from trusting someone who said "do it this way" and not researching the method myself.
ok, even if you will not believe me when I'll say this, I DID NOT DO what follows.
years ago we were working on a very big web site, for a very big customer.
a colleague of mine, while working on html pages, for some very obscure reason decided to launch a search and replace command using his VERY powerful editor, directly on the production server's file system.
that was - needless to say - after a long session of updates on the pages contents.
well, the command was to REPLACE EVERY SINGLE SPACE ON EVERY SINGLE FILE WITH AN EMPTY STRING. every single little space. disappeared.
the site started to implode. the command caused a sort of zipping of everything, and it was not reversible.
after a few seconds, when he understood what was happening, he even tried to launch himself on the network cable, but it was too late!!!
so: don't work directly on production pages, backup often, and turn on brains!!!
I ran a batch script I'd found on some blog on our server to delete some files.
I just wanted it to delete some specific files (older than 7 days) recursively in a folder, but the script found some files with names like %temp%file and began to replace them with the Windows variable values.
It ended up in the C:\ dir and began to delete everything.
Luckily I was looking at the screen and hit Ctrl + C as soon as I could, but sadly the server was a bit fast and it was able to delete 2 databases in the meantime.
The worst part is that we only found out one month later that it had had time to delete the databases, and the backups were kept for just 7 days.
Lessons learned:
- Never run a script you've found on the internet whose code you don't understand.
- Batch scripts are evil
- Never delete ANY backups, EVER.
In my first device programming job, my co-worker and I were trying to solve a problem involving the force that needed to be set as a parameter to allow our $25,000 device component to extract itself from an injection point. Unfortunately we made the assumption that increasing the value of the force parameter in the function we were calling on the device would do the trick. What we didn't realize (not having read the manual) was that the force was in a range from a negative value to a positive value, with the positive value being downward force. We needed upward. Our device got jammed deeper in the vessel it was injecting into... and of course the next step in our process was a very forceful shaking of that vessel, thus destroying our injecting device. First month on the job and I'd already cost them a $25k part...
Wasn't a programming WTF but I drove almost 2 hours to install an important software update to one of our store computers. When I got there I realized that I had forgotten the 4 or 5 floppy disks that were required for the update. I ended up using a remote connection and very slowly downloading each individual disk for the installation. I never told anyone there about it...I was too embarrassed.
This one won't trigger nuclear war, but it's very easy to do at 3am.
I wanted to find all the system includes in a C/C++ file I'd been working on since 8am.
So naturally, "grep > precious_source_file.c".
I sat there for a while wondering why it was taking so long to parse the file when the penny dropped.
No backups since the previous night.
The moral of the story: Take a break before your IQ drops to 10.
My WTF moment was a deployment WTF.
It was the first ASP.NET 2.0 application on a server running 1.1. Someone else ran some tests on their machine and said that running ASP.NET 2.0 and 1.1 on the same machine was fine.
Their machine was XP running IIS 5. The production server was running IIS 6.
Application pools? What is that?
So I deployed the application to the production environment with the same app pool and it didn't run. I changed the framework version to 2.0 and it ran.
I went to another site on the server and it was dead. I got some error message about the .NET Framework version. I googled the problem, added a separate application pool and set it up properly.
Phone call from the boss who is on a business trip because she was giving a demo in front of people. It turns out I crashed the server during the demonstration. When she refreshed the page it was fixed.
I once wrote a VB6 DLL that read from a database and auto-generated an HTML page.
It gets worse.
The page was a bunch of values that the user needed to enter and save, so the DLL also auto-generated an embedded javascript function that iterated through the controls on the page and composed something with the values; this was sent back to the server and saved in the database.
It gets worse.
The thing that the javascript function composed and saved was itself another javascript function. This saved function was embedded in the page and called when the page was subsequently reloaded; the function iterated through the controls on the page and set the values of each control to what they had been when the user saved.
So: I had VB6 code generating javascript code which generated more javascript code.
It gets worse: I didn't know how to use any of the escape functions in VB6 or javascript at the time.
Worked on network test software at NASA; I wrote an app that could generate arbitrary packets, including poisonously malformed ones, to stress test network equipment. I had 2 NICs in my development computer: one connected to my private test network, the other connected to the Official Government Network, which is super-secure and managed by a third-party contractor who is paranoid about everything that goes on on their network. Guess which NIC I accidentally had selected when I sent 10,000 malformed ping packets from a sham IP address of "0.0.0.0"? It took my boss a week to get access turned back on for me; I'm told the network admins had come to the building's utility room and physically removed my ethernet cable from the patch panel. But I was just a summer intern, so really, what could they do to me?
Not scrubbing my input on a bash script I wrote, which resulted in an rm -rf /*. Unfortunately, I was running this script as root.
I was cleaning up in a source repository (StarTeam) and went to delete a bunch of files marked Missing. For some reason I was thinking, oh, they're missing from StarTeam because I never checked them in. Only later did I find out I had deleted 70-ish files, which weren't a joy to recover.
My boss, working in Windows command prompt and knowing that RD FOLDER will not do anything if FOLDER is not empty, did an RD /S C:\WINDOWS to get rid of empty folders inside the Windows folder.
Now he knows that RD /S FOLDER gets rid of every last little thing.
My old company had a really nifty script to create an exact empty replica of a production database, for sanity checking table upgrade scripts when out on a client-site.
However, I didn't realise that sometimes tables were created on 'physical' instead of 'logical' partitions. eg:
create table foo on myLogicalPartition; // ok
create table bar on "/path/to/real/production/table"; // not ok
Needless to say, I ran the script and wiped out 3 of the client's production tables. Luckily they had backups!!
I once wrote a script that would zip up a bunch of python sources into a single zip file (organized into separate directories). All these files came from a subversion repository, so to cut back the size of the redistributable, I copied all the relevant files to a temporary folder within my working copy and then deleted all the .svn directories within it.
I spent the rest of the day trying to figure out why I couldn't commit the changes I just made. As it turns out, the script was overzealous and deleted the .svn directories not only in the source files, but the directory the zip was built in and the .svns two levels above that.
I now realize that it's better to either move the files somewhere OUTSIDE the repository before building or to not copy the .svn files in the first place.
Next story: I was trying to delete a package from my site-packages directory. I had grown tired of typing in long file paths and got to where I would just drag and drop files to the command line window. Here's what I meant to do:
cd /usr/lib/python/site-packages
sudo rm -rf /usr/lib/python/site-packages/some_package
I'm still not sure what exactly I did, but somehow, this ended up happening:
sudo rm -rf /usr/lib/python/site-packages /usr/lib/python/site-packages/some_package
You don't realize how much python has done for the Linux world until a mistake like that.
Quite a while back a colleague and I were working on data conversions from one OS to another. This is back in the days when there were scads of competing non-standard computers (Eagle, Apricot, DEC Rainbow, Heathstar, etc.) and PC-DOS was just becoming more than a minor entry. I was standing behind the colleague, dictating actions, and he was at the keyboard entering them in. I was leading him through the cleanup phase of the 30 minute process of disc creation, changing directories constantly, and while still in the root, told him to type in del *. After he hit Enter, he looked up at me, and we both groaned at the same time.
I sure am glad Peter Norton's Undelete worked. Even if it was. One. File. At. A. Time.
It was the first computer I owned, paid for with my own money. Windows 2000 suddenly decided it would not power the computer down anymore, but would give me the "It is now safe to shut your computer off." This annoyed me so I started googling a solution, and someone recommended flashing my BIOS. Not even knowing what a BIOS was, I thought this was a brilliant idea. Unfortunately, my floppy drive died in the middle of the update on the computer I had owned for only 2 months. My next computer had a recoverable BIOS.
First day on a new job and I was tasked with helping convert a CVS repo over to SVN. While I was working I lost track of which server I was connected to in which terminal windows and wiped out the entire CVS repo.
Thank goodness for backups. Hell of a first day.
Glad to say I'm still employed there :)
I was in charge of a studio for a live TV breakfast show. We ran the closing credits etc... 5 minutes later the presenter stopped checking his email on the PC at the studio desk, looked around, then nonchalantly wandered into the control room to tell me he was still on air. I looked at the televisions around the office and he was, indeed, correct. I forgot to switch the transmission feed back to the network feed.
oops.
As a vendor, I was once working inside the data center of a private ATM (automated teller machine) network. One customer's PIN had to be reset as part of our maintenance work. I knew the encrypted PIN block of 1234 and wrote something like the following in SQL Query Analyzer:
update atm_card set pin = 'BA3452318689A190'
where card_id = 5
and somehow I selected the first line and pressed F5!! I didn't realize my mistake till the call center started getting calls from customers that their PINs were not working. There were around 10 calls in 5 minutes. When somebody from the call center approached me, I realized the mistake and temporarily delayed breaking the catastrophic news by saying that the PINs will work when the maintenance work was over.
I saved the day by looking for any backups the data center had taken that day; restoring the database with a separate name and running another update query referencing the external DB!
Lesson learnt: Always disconnect production servers and take a database backup before making any changes.
I worked for a large bank and added an error message to a piece of code that the application should never have been able to reach (theoretically)...
One Monday morning the unthinkable happened. The error message was proudly displayed on over 10,000 monitors across 1800 branches, and would return when you dismiss the message.
The message read: "If you can see this message the system is all F**KED UP and we might as well go home. Have a nice day."
Thank goodness this happened before source control systems were implemented at the bank.
One time I was helping out a new colleague with checking his code in to SVN. He had just built a new module he had been working on for the last 2 months and now he wanted it in Subversion.
So I checked in his module, removed the original files and checked the directory out again. Then the awful truth hit me: I had just checked in a symlink and removed the original directory!
Lucky for me, the network admin could recover the user's home directory from that night's backup, but all the changes he had made that day were gone.
Moral of the story: Use SVN from the beginning and double-check when you are deleting something :)
Conversation with a DB-Dev:
me: Hi, can you fix the procedure xxx and add several other parameters?
he: No problem ... done. Now I'll do it on the testenvironment.
Never try to log objects via reflection (especially if these objects can be Exceptions) and forget a break statement!
This would qualify as my FIRST WTF in my programming life, about 18 years ago at this point. I had just started a new job as a programmer working in a language called MUMPS. I'm learning the ins and outs of it. It stores data in global references, designated like ^A, ^B, etc. So I was using ^CTK, which matched my initials but also happened to be used for a system 'caretaker' process which governed the whole database. KILL ^CTK wasn't appreciated by the users or my new boss.
Wrote a little utility for a friend who was running a mailbox system at his home.
Because his machine crashed regularly he asked me to write a little watchdog that simply reboots the machine after an hour of hard disk inactivity. That was way back in the DOS days, and I was an assembler coding fanatic. So I started to write a little TSR program (does anyone here remember those?).
I hooked myself into all DOS interrupts and just forwarded the data to the original interrupt handler. To check if it works so far I flashed the VGA border color register.
Started my program - everything seemed to work well.
Typed DIR . /s to make some disk activity ... Screen border flashed for a moment, then silence. Dead silence. System hung.
I rebooted, but the system didn't come up anymore. After a long recovery session I was able to boot to DOS again. It turned out that I forgot to save the registers in the interrupt handlers around my border color flash code. That did all kinds of nasty things like turning read requests into writes and vice versa.
I messed up my hard drive so badly that I lost most of my content. Guess who hadn't made any backup of his non-toy project? Yep. That was me.
I lost 3 months of work that way.
Never again will I hook into critical interrupts on a production machine.
I erroneously put my home phone number instead of the company phone number in a licensing error message for a product which we released a "free" version of on CompuServe. I did not discover this error until I received a phone call at 2am requesting to purchase the product. Doh!
Built a new machine at home several years ago. Plugged it all in, and nothing worked - looked like the Motherboard was dead.
Spent a couple of hours removing and replacing stuff, including the PSU and the power cable. Called a friend for advice. Swore a lot. Convinced myself that I'd broken my shiny new toy.
I eventually thought I'd replace the 4-way extension lead it was plugged into. That's when I noticed that I'd switched it off at the wall.
Creating a member variable that I supposed would come in handy someday (YAGNI), in a persisted object that is; using copy-paste to get the initializer list right, and not testing it.
MyClass::MyClass( const T1& ac_NewMember )
: mc_NewMember( mc_NewMember )
{}
A year after, it was also me, writing a repair tool for customer databases... :(
Lesson learned:
- YAGNI!
- Don't use any member variable as an argument in the initializer list (order of initialization is quite awkward in C++, too)
I had a block oriented data file class once that was absolutely central to an important sensor processing application at a long-term client. It had an internal structure a lot like a simplified FAT filesystem: allocation table, subdirectories, files, etc.
I took pride in being somewhat performance oriented... maybe not to extremes, but at least I was making sure the design was such that any serious bottlenecks were avoided.
Back to this sensor processing application. Once the files started to get above about 50-100 meg, a simple data read was taking up to a few hundred milliseconds, and I was ignoring it for years as just an IO bottleneck.
It turns out the initial block offset lookup was not getting copied into the file read properly, resulting in the read function reading from the BEGINNING OF THE FILE every single read, until it got to the blocks it wanted and copied them into the waiting buffer. Every graph on the screen called this disk read function 4-8 times a second, so you can imagine the effect this had.
Due to file caching, most of the disk read was in memory and so it came back very quickly.
This bug existed for MANY years, and once I fixed it the entire application became about 10 times more responsive.
Oops.
(cross posted from a closed duplicate question)
During my first year at university I worked on a game project in a course. When the course ended we figured the game was good enough to be entered in a game competition for Swedish students. So we submitted it and made it to the finals and therefore went to Stockholm (the capital of Sweden) for a dinner and party. We didn't win but the game caught some people's eyes and we got the opportunity to upload the game to one of Sweden's largest game sites. The problem was that I screwed the released version up. I used a Swedish letter in a settings file and because of that the version that could be downloaded had no enemies in it! The worst part is that I used the Swedish letter because it formed a really bad joke compared to using the real letter. At least I learned a lesson never to release untested software. :-)
I was REALLY green, and I was working on a web application for network security alert analysis and response. Since I was new, I was tasked with a large amount of testing. One part of the testing was to analyze the captured data for the alert and send out a message to the offending party's ISP. Well, for one such intruder, I noted the offending IP address, cobbled together the warning message, looked up the whois record, and fired off the stern warning message. Oh, the kicker.....the IP address was somewhere in the range of 192.168.x.x and I sent the message to IANA. Someone responded. Humiliation followed.
We had two prototypes of the new hardware. I was working late trying to get them to boot. This was in the days before Flash chips: the EEPROMs had to be removed from the board and inserted into a programming device to erase and rewrite them, a process I had done several dozen times that night.
I thought I'd fixed the problem, and was sure it would boot. My fingers must have been tired, it seemed like it was harder to push the EEPROM into its socket than it had been, but whatever.
Powered it on... nothing. What could be wrong? I started poring over the changes I'd just made, until I smelled it. That horrible melting plastic smell.
I had put the EEPROM in backwards, shorting power to ground and ruining one of the only two prototypes. My colleagues did not allow me unsupervised access to the other one until the production boards came in.
Added the Debian testing repository to test out a single program then a while later typing "apt-get dist-upgrade" without removing the testing entry.
First time I saw a kernel SEGFAULT on boot up. It was pretty cool until I realized all the kernels did that.
That wasn't too bad. Linux can be fixed.
Then in my hurry to fix Linux I started reinstalling Debian, only to realize I had just destroyed the boot partition and wouldn't be able to boot into Windows and work on finishing my paper until Debian was finished installing. :)
When getting rid of the test data in the same directory as the live DB, I typed: sudo rm * .bak
I didn't realize that I had typed the extra space until I saw the message: rm: cannot remove '.bak': No such file or directory
Fortunately the DB was on a raw volume & I could simply link to it again, but it really was a WTF moment.
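To see why the stray space matters, here is a rough sketch that mimics how the shell splits `* .bak` into two separate patterns. It only lists names and deletes nothing. The helper `files_matching` and the sample file names are invented for the demonstration, and everything happens in a throwaway temporary directory.

```python
import glob
import os
import tempfile


def files_matching(pattern, directory):
    """Return the sorted file names in `directory` matching one glob pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)


with tempfile.TemporaryDirectory() as d:
    # Hypothetical directory contents: one live file, two test backups.
    for name in ("live.db", "test1.bak", "test2.bak"):
        open(os.path.join(d, name), "w").close()

    # What was meant: a single pattern, "*.bak".
    intended = files_matching("*.bak", d)

    # What was typed: two arguments, "*" and ".bak".
    typed = files_matching("*", d) + files_matching(".bak", d)

print("intended:", intended)
print("typed:   ", typed)
```

Listing the matches first (or running `ls` with the same arguments before `rm`) is a cheap way to catch this before anything is gone.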
My latest WTF was, how little time it actually took for that kind of stuff to appear here. As if http://thedailywtf.com was not enough. ;)
Building an "installer" that inadvertently disabled the update functionality ... permanently.
The application was for generally non-technical users (mortgage brokers) and they would never notice, it was also essentially impossible to tell who received that build of the installer. So we had in the vicinity of 500 users who'll never get another update unless they ask. DOH!
I wrote a script to generate a case-insensitive regular expression of a search once.
It basically generated idiocy like ([Ff][Oo][Oo][Bb][Aa][Rr]), but for entire sentences queried.
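For comparison, most regex engines have a flag for this. A minimal Python sketch, where `find_phrase` is just an illustrative name:

```python
import re


def find_phrase(phrase, text):
    """Case-insensitive search for a literal phrase.

    re.escape keeps characters such as '.' or '(' in the user's query
    from being interpreted as regex syntax.
    """
    return re.search(re.escape(phrase), text, flags=re.IGNORECASE)


match = find_phrase("foobar", "Look, a FooBar!")
print(match.group(0))
```

The flag replaces the whole generated `[Ff][Oo][Oo]...` expression, and the pattern stays readable for entire sentences.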
So there was the time I accidentally deleted the "bin" user on an early 1990s BSD system. Of course, I included the option to automatically remove the user's home directory.
If you haven't figured it out yet, here's what the passwd entry would have looked like: bin:*:3:3:Software:/bin:/dev/null
Another SQL goof. I was working at a mom-and-pop ISP. We kept our dialup user accounts in an SQL table. I needed to change the password of a single user.
UPDATE users SET password = 'foo';
Forgot the WHERE clause, naturally.
I informed my boss that tech support would be busy for a while.
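One way to guard against this is to run the statement inside a transaction and refuse to commit unless exactly one row changed. The sketch below uses SQLite from the Python standard library purely for illustration. The `users` table and `password` column come from the story, while the `username` column, the sample rows and the `set_password` helper are assumptions.

```python
import sqlite3


def set_password(conn, username, new_password):
    """Change one user's password; roll back if the row count is not 1."""
    cur = conn.execute(
        "UPDATE users SET password = ? WHERE username = ?",
        (new_password, username),
    )
    if cur.rowcount != 1:
        conn.rollback()
        raise RuntimeError(f"expected to change 1 row, changed {cur.rowcount}")
    conn.commit()


# Throwaway in-memory database with made-up accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b1")])
conn.commit()

set_password(conn, "alice", "foo")
rows = conn.execute("SELECT username, password FROM users ORDER BY username").fetchall()
print(rows)
```

The same habit works by hand in an interactive client: open a transaction, run the UPDATE, look at the reported row count, and only then commit.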
This wasn't me (really) but our team had an interesting bug:
Our product had a voice engine to notify users via phone (e.g. a library notifies you that a requested book is available). Unfortunately there was a problem where the software called an old woman several times in the late evening and would promptly hang up. The poor lady ultimately called the police because she thought she was being stalked.
The problem was probably a combination of thread-safety and time change (daylight savings). I don't know what the fix was but hopefully it at least involved (a) do not repeat a call within 24 hours and (b) check the duration of the call and auto-email sys admin if it is too short.
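The two safeguards (a) and (b) could look roughly like the sketch below. This is not the actual fix, which is unknown. The class name `CallGuard`, the 5-second threshold and the sample phone number are all assumptions, and a real system would need persistence and locking around the shared state. Timestamps are passed in as timezone-aware UTC values so a daylight-saving change cannot shrink or stretch the 24-hour gap.

```python
from datetime import datetime, timedelta, timezone

MIN_GAP = timedelta(hours=24)        # (a) no repeat call within 24 hours
MIN_DURATION = timedelta(seconds=5)  # (b) assumed threshold for "too short"


class CallGuard:
    def __init__(self):
        self._last_call = {}  # phone number -> start time of last attempt

    def may_call(self, number, now):
        """True if this number has not been called within MIN_GAP."""
        last = self._last_call.get(number)
        return last is None or now - last >= MIN_GAP

    def record_call(self, number, started, ended):
        """Remember the attempt; return True if it was suspiciously short."""
        self._last_call[number] = started
        return ended - started < MIN_DURATION


guard = CallGuard()
t0 = datetime(2024, 1, 1, 21, 0, tzinfo=timezone.utc)

first_allowed = guard.may_call("555-0100", t0)
too_short = guard.record_call("555-0100", t0, t0 + timedelta(seconds=2))
retry_allowed = guard.may_call("555-0100", t0 + timedelta(hours=1))
next_day_allowed = guard.may_call("555-0100", t0 + timedelta(hours=24))
print(first_allowed, too_short, retry_allowed, next_day_allowed)
```

When `record_call` returns True, that is the point where the sysadmin email from (b) would be sent.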
A table named WorkOrders had triggers on update and insert events. I didn't know that these triggers did important things like sending an email. After I ran 10,000 insert queries, our customer's system admin went crazy because the Exchange server was down.
I think my favourite was looking through some old C# code I'd written when I was learning and discovering this gem:
if (this != null)
{
// Some stuff
}
I must've been burned by a NullReferenceException at some point in the early days and really wanted to make sure that it didn't happen again.
Overkill OOP (OOOP?). Several years ago an external contract programmer was tasked to create a visual screen editor for us. He was a die-hard OOP fan from what I heard.
The end result? Down to the smallest bit, everything was a class. Yes, he actually had a class "CBit" in it! And since this was a Windows application, it relied on messaging to get things done. The absolute horror was revealed when we finally removed him from the project, got the source code from him and took over the development internally because we weren't happy with the project's progress.
Because the framework sent out each message to ALL fracking objects, and since each object checked whether it had anything to do with this message, the data export of this tool was slow as hell (not to mention the numerous bugs we had because this thing was so hard and painful to code in). Remember, every one of the tens of thousands of CBit objects in a typical project processed each message. The data export took about 90 minutes with a full project and required a computer with 1 GB RAM so it didn't thrash the swap file too much. This was back when a "good" computer setup had 256 MB of RAM.
Over a year later, some of our coders hacked in some caching and filtering mechanism and lo and behold, the data export took only 90 seconds instead of 90 minutes.
Though similar to this post regarding issuing an UPDATE without a WHERE clause, I've issued a DELETE on a production web membership database without a WHERE clause ... and the backup was out-of-date!! It took me 8 hours to restore the data using manual queries from a staging database that luckily had just been updated from the production database.
id10t ...
Not exactly programming, but probably a good lesson nonetheless.
I once configured our MS Exchange Server as an open relay. (I didn't know what an "Open Relay" was at that point, so I just ticked the box).
Ooops.
Went home on the Friday, everything was normal.
Came in on the Monday morning, email was down - clogged up with thousands of spam emails.
Lesson learned: never, ever, tick boxes on production systems when you don't know what they do!
I'd have to say that forgetting to switch the 110/220V selector on the back of the disk drive enclosure (many years ago, obviously) before flipping the power switch and watching the white smoke that drives all electronics leak out the back of the power supply.
We had a system in the field that used a combination of encoders and photo eyes to track packages moving down a conveyor. There was a database entry listing the position of each photo eye in terms of encoder counts. The original values were calculated from the layout drawings. Once installed the actual values in the field could be measured and the entries could be corrected.
About once a week we would download a new version of the code. It would always fail miserably and they would revert to the original version. The code would pass all of our in-house tests and simulations. We looked for weeks trying to figure out what was wrong. Finally, after over a month, we realized that we had never copied the working values of the database from the field. We were still using the initial values in our copy of the database. When we downloaded, we overwrote everything, including the database with the correct values!
I once worked on some test software for a computer manufacturing center. The computers had sequentially numbered, six digit, Base 36 serial numbers that currently started with 'H1' and we tracked systems by those numbers. I needed a dummy serial number to unit test the software, so I made up one that wouldn't be hit in the normal sequence but would be recognizable, so I used:
H0RSHT
Of course, one reference to it was left in an error check so after a few weeks of production, I got a call from the manufacturing floor because an error box had popped up that said:
Error: Can't find H0RSHT
When Windows 95 first came out, my parents got a new machine with it preinstalled. I was well-known in the family as the primary suspect for whenever the computer wasn't working (and also the primary repair person). I downloaded WinZip to C:\ and accidentally unzipped its files to the same directory. Obviously, this would not do and I was already at a command prompt, so I just figured I'd move them all manually to the directory I wanted. Here's the command I used:
C:\> move win*.* c:\winzip
Apparently, Microsoft decided to change the move command between DOS and Windows so that the command could be applied to directories, too! My whole brand new (as in 3-days brand new) Windows directory was moved to C:\Winzip. If I remember correctly (only 12 at the time, sorry for fuzzy memories), there was some sort of issue with simply moving the files back from whence they came via the command line. Naturally, ALL shortcuts to windows files were borked (so much for that new fancy drag-and-drop feature for moving directories since Windows Explorer couldn't be found and I had no idea how shortcuts, etc worked). After a $50 repair bill and a very heated lecture about safe experimentation/personal responsibility ("undocumented changes to commands aren't my fault!"), the system was back in working condition...as working as Win95 could be, anyway.
Back from the old dos days:
The app running on the factory floor.
Main menu, the default option is to pick a job to work on.
This brings up a list of jobs, the cursor is resting on the first job.
On the particular station in question processing a job consists of printing a piece of paper and marking the job as done by that station.
A supervisor had called up the station on an out of the way computer and a broom fell over onto the keyboard. Note the enter on the number pad.
Main menu: says to process a job. The list comes up, says to do the first job. It then recycles to the main menu.
It was a few hours before we figured out what was sucking all of the jobs out of that station as fast as they appeared.
I consider this as one of my 'divine intervention' moments (Ref: Samuel Jackson in Pulp Fiction)
I was setting up my machine for a live technical demonstration of a database migration. I was too early to finish my preparations. So I thought it was good time to get rid of all the files in a temporary folder. Here is what I did roughly:
C:\> del /s *.* c:\temp\demo
The console started showing the list of files the system was deleting. I was thinking "dah! I should have used /q or something similar to make it silent". After a few idle minutes, I paid a little attention to my console: it was actually deleting all the files and the folders under the C drive!! The last parameter c:\temp\demo was completely ignored by Windows, and it started happily deleting the stuff under the current path, which was C:\
I pressed all the control+break+c+printscreen and what not! But too late. I had already lost many things under 'program files'. My MSSQL Enterprise Manager would not open up. The del command was deleting files and folders in alphabetical order. I was pretty glad that C:\Windows is alphabetically at the end.
After plenty of un-deletes, re-installations, system restore, and simply copying installed files from other machines (!), it is still unbelievable to me, that the demonstration (to really big folks) went through just fine. I clicked buttons, moved my mouse, etc only if it is absolutely required. I didn't even dare to open notepad.
I think everybody who works with Linux's bind mounts and chroots has been burned by something like this at least once:
$ mount --bind /home /chroot/home
# sudo chroot /chroot
# [...]
# exit
$ sudo rm -r /chroot
^C^C^C^C
My first job out of college, I was a server admin at a boutique financial firm. The company ran all sorts of simulations on expensive SPARCServers that were stacked up in a small machine room -- slightly bigger than a storage closet.
The really powerful AC was installed with the vent blowing straight down from the ceiling to the machines. If you were working on the machines for more than a minute, your neck and shoulders were chilled more than your favorite after-work beverage!
I came one weekend for a long set of server patches/upgrades, and spent a lot of time at the console terminal (single user mode) -- after rubbing my shoulders and neck one too many times, I turned off the AC switch on the thermostat.
Two hours later, I was done, verified everything was running well, and then left to enjoy the rest of the weekend.
That evening, I received a page that the system was down. I came in to work and the first thing I heard were several thermal warning alarms from the RAID boxes.
Fortunately, most of the systems survived. But one CPU module on the SPARCServer had to be replaced. I think that module cost the equivalent of two of my paychecks then...
I once divided by zero. Since then I cannot even complete a single se
The first version of error trapping with a custom error page in an ASP.NET project produced an error on the error page... which redirected to the error page... and so on.
This would have been fine except that each time it emailed the webmaster about the error before redirecting. 500 emails in under a minute the first time this occurred.
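Two small guards would have contained this: a static fallback when the error page itself fails, and a rate limit on the notification emails. The sketch below is framework-agnostic Python, not ASP.NET code. The names `ErrorNotifier`, `render_error_page` and `FALLBACK_PAGE` are invented, and the "email" is just a callable so the example can run anywhere.

```python
import time

FALLBACK_PAGE = "Something went wrong."  # static text, nothing left to fail


class ErrorNotifier:
    """Pass messages to `send`, but at most once per `interval` seconds."""

    def __init__(self, send, interval=60.0, clock=time.monotonic):
        self._send = send
        self._interval = interval
        self._clock = clock
        self._last_sent = None

    def notify(self, message):
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._interval:
            return False  # suppressed: too soon after the previous message
        self._last_sent = now
        self._send(message)
        return True


def render_error_page(render, notifier, error):
    """Notify (rate limited), then render; never redirect back to ourselves."""
    notifier.notify(f"site error: {error}")
    try:
        return render(error)
    except Exception:
        # The error page is broken too: serve static text instead of looping.
        return FALLBACK_PAGE


# Demonstration with a fake clock and a renderer that always fails.
sent = []
ticks = iter([0.0, 1.0, 61.0])
notifier = ErrorNotifier(sent.append, interval=60.0, clock=lambda: next(ticks))


def broken_render(error):
    raise ValueError("error page is broken too")


page = render_error_page(broken_render, notifier, "boom")
page_again = render_error_page(broken_render, notifier, "boom again")
print(page, sent)
```

In the demonstration the second failure arrives one second after the first, so only one message is recorded in `sent`.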
Did a stress test with a stress testing tool on a live message board (ASP) while logged in as administrator. The stress test deleted over 1000 threads and some users.
And when I requested the last backup, the administrator told me that this database wasn't in the backup plan.
At least that was some real stress test to me ;).
This is a bit of an embarrassing WTF. I was connected over RDP to a remote Windows 2000 server, and I was taking note of the IP address as well as the network interface of the server in the control panel. So to open the properties window I went to right click, and clicked properties, or so I thought. Because of the slow connection my mouse cursor did not move as fast as it would have done in real life. So instead I clicked "Disable". Yes, the machine was now unreachable.
Bad day to be a pen-tester :/
The first time I installed Win95 I noticed a system.dat and a system.da0 in the Windows dir. Thought it would be nice to save this 1 meg of expensive HD space... so I started to reinstall Win95 the very same day.
I had one important and confidential document in Microsoft Word format (XP). I thought it was a good idea to protect it with a password. I used a strong password, 16 characters long...
Some days later, I needed to review the file, but to my surprise the damn password was not recognized!
I did more testing, and seemingly the fault was in the way Word XP stored passwords with more than 10 or 12 characters.
The moral of the story is: ALWAYS CHECK THE INTEGRITY OF YOUR FILES (backed up files, encrypted or whatever)
Soon after joining a cross-platform open source project, I thought it would be a good idea to do some clean up. It had mostly been developed by Unix dudes and I was working on the Windows port. When I built it with Visual Studio, there were thousands of warnings complaining about potential data loss and suggesting types be cast correctly.
Away I went, changing 100s of files correcting this 'problem'. It still worked afterwards for me so no problems! As so many files had been changed, I checked it in by batches of 10 files at a time.
End result: everyone else on the project was spammed by CVS commit e-mail messages, and the build was completely broken for every non-Windows OS!!
I developed a diagnostic system for machines that was to be used by several hundred people all over the world. The application also showed some pictures of the machines, which could be changed by the technical staff of the machine producer. While testing we didn't have all the images of the coffee machines and used some "bikini" pictures instead.
Needless to say, quality management was non-existent and the test database became the production DB.
Thanks to automatic software updating we only had to wait 20 mins until the first customers called and asked why their "coffee machine" looked like a D-cup brunette.
A few years back, I was modifying a Java application for a customer. I decided to be real professional, and whipped up a fancy installer as well, using InstallShield, I think.
I found, of course, that the installer could display a splash graphic while loading its resources. I thought that would be cool and, just temporarily put a picture in there which I had lying around in My Documents - displaying a very scantily clad female.
Needless to say, I forgot to replace the splash graphic before I shipped the installer off. Actually, the customer never complained, probably because then the person in charge was a middle-aged man who probably didn't mind at all. My luck.
The lesson should be obvious.
Years ago, my old man wanted me to reinstall Windows for him. He had Windows 2000 on drive C: (NTFS) and data such as family photos on drive D: (FAT).
I placed all his documents and personal files on drive D before booting the Windows installation CD. Booting from CD didn't work for some reason, so I created a boot disk and typed
"format C:/q" ENTER
y ENTER
Copied boot files and cd driver to C: so I was able to reboot from hard disk again and start the installation process from there.
So, I rebooted and to my surprise I saw the Windows 2000 screen come up.
"Hmm, the format process went fine, what happened?" were my thoughts.
And then it hit me. It really felt as if the ground underneath the chair disappeared as I realized that my boot disk couldn't see an NTFS drive, and as such it had mapped drive D: to drive C: and I had happily formatted all my dad's work on family photos and months of work on descratching his vinyl albums. Together with his documents and such.
We now understand why people make backups.
Not so much a huge mistake, but it took me a minute to figure out why this command wouldn't work:
sudo aptitude install sudo
Not quite a programming problem, but was caused by one so I'll include it.
Having worked for over 2 weeks on a particular bug in one of our modules that had been hanging around in various forms for over 6 months, I was particularly happy to finally find the cause and resolve it.
Being very pleased with myself I did what any programmer does at times like this.
Put my hands behind my head, stretched out my legs in front of me, beamed with happiness ....
and kicked the circuit breaker on the wall sockets below my desk, killing my PC, programmer2's PC, programmer3's PC ... you get the picture.
I no longer stretch my legs out under my desk.
Thread is probably lost in antiquity, but my biggest WTF happened at my very first programming job, of course. I was in charge of the installation. This is back when Win95 was new, and I knew jack about installations. I was using InstallShield, and having a blast installing, testing, wiping the machine, installing Win3.1, upgrading to Win95, installing, etc.
Anyway, I had that installation SOLID man! And we were going to a big testing thing in St. Louis, and I was going to have to install this on 50 computers there. Right before we got in the car to start driving to St. Louis, I decided to use InstallShield's "Package on Demand" feature or whatever it was called. Recompiled the setup, copied it onto the 10 floppies that it needed, and we took off. Without testing.
I then went and installed it on all 50 machines when we got there, one painful floppy at a time -- start one machine with floppy 1, then when it was doing start another on floppy1, while putting floppy2 in the first machine...this took HOURS.
Still didn't test it.
After I get them ALL done, my boss goes to a machine and says, quietly, but my guts wrenched, adrenaline was not shot into my veins so much as hosed into them..."Matt...why won't ICS run? I get an error about a missing library?"
Yeah -- all the needed files were on the disk, but there was nothing in the installation program that would actually PUT THEM ON THE MACHINES. They all had nice icons, but no working program.
Remember, this is in the days before good reliable iNet from hotels. Hell, none of us even had any laptops, nor a way to remote into our work machines. We had to go back to an old version of the program that had tons of bugs, but at least we knew where they were and how to work around them. I didn't get to sleep that night as I was up REinstalling the app on all of those machines. Again.
Moral of the story -- no matter how trivial you think a change is, GO BACK AND FARKING TEST IT.
I pressed the reset button on my computer for like 20 times in a row... Just for fun :-X then both of my HDDs died together with windows :)))
It's probably extremely telling about me that no wtf moments come to mind immediately.
No wait, that doesn't mean I don't make mistakes... I make many.
But I hate to admit it, even to myself, which is why it's so hard to even write this reply.
Probably the most recent one that comes to mind was at a demo this year. It was with a potential new customer, and we'd planned a demo where I would meet my boss with my laptop to show the newest latest and greatest version of my GUI. Somehow I remembered the meeting time wrong, and thought I had an extra hour. Also, I had left my phone at the office.
Luckily some intuition caused me to cut lunch short, and I came into the demo 10 minutes late, and it did go quite smoothly and well, although very stressful for my boss (and me!).
Yep, it's painful to write this. I did learn from it though -- Be calm, and be more organised.
MySQL master servers need the option enabled to do Binary Logging, as that is what the replication slave reads. They also have options to log changes on some tables. Well, if you want to remove all those filters, you actually need to remove all the replicate-do-db entries. Leaving one blank will result in nothing being logged for replication. :-/
This was a database that could not be taken down long enough to take a binary copy of the data for re-creating the slaves. It took the company 3 weeks to get everything back up to date again and yes, it helped lose me my job some months later.
wasn't me (this time) - Google Maps reckons you can't walk across the Sydney Harbour Bridge. You can of course. It is fine with driving directions via the bridge but not for walkers.
See this map to see what I mean. The funniest part - Google Maps was started in Sydney.
while(0) instead of while(1) :s
it took me something like 30 minutes to notice the mistake.
An old Linux system once greeted me with the following error after login: "You don't exist. Go away!"
Head-scratching. Virus? April 1st? Keyboard or screen hijacking?
A grep -r /usr/src/linux returned a line in login.c which triggered a thought. A check. Ah, yes, I had just deleted /etc/passwd.
Managed to destroy the IIS setup on a production Windows server by re-installing PHP5 over a previous version.
Sequence of events was basically uninstall PHP4, re-install PHP5 (so far so good) and then find MySQL doesn't work, so rerun installer checking every box that looked relevant.
It ripped apart the IIS metabase.xml file, and it took me about a day to find and replace it.
Ironically I just needed to uncomment 2 lines in the PHP config file, but I found that out the next day. I'd been upgrading it so I could run some of Larry Ullman's scripts from PHP5 and MySQL4.
Rather embarrassing conversation with my boss followed(!)
A previous boss of mine once set up a monitoring system based on mail. If something went wrong, it would send a mail.
Well, something went wrong and roughly 30'000 mails were sent.
That wasn't the problem. The first problem was that the Exchange admins came complaining that he was "abusing" their server. After they were gone, he tried to delete the mails and found out that Outlook couldn't. Apparently, no one had ever tested Outlook with a big inbox.
All he could do was select the mails 100 at a time and then delete them. After that Outlook would allow him to select another set of 100 mails.
In a former life, when I still had to do tech support (and given that I still don't understand digital phones), I had two very angry customers with different problems about different bits of software, neither of whom spoke English as a first language, and I managed to connect them on a party line by themselves. I can only imagine how that played out.
In college, came in from the Pub one night and started writing code. Wrote FANTASTIC code for an hour, got tired and went to bed.
I woke, remembering that I'd done some seriously cool stuff the night before, had insights I'd never dreamed of, and while slightly hungover opened up the code.
To my horror I discovered a mishmash of absolute junk. I had ruined about a week's work on a project, and had to throw it out and start again (no backups, in the days before source control).
Lesson learned? Don't drink and code.
I have more than one "worst WTF moment".
- I was going to send a newsletter to all of our members (10.000+) announcing that we were about to release a new version of our application. I quickly wrote a small bit of code to send the newsletter and ran it. But unfortunately I forgot to loop over the names, so everyone got the newsletter addressed to the first record in the database, which was also me :(
- I was coding an ecommerce site and was testing the credit card charging system. I was very bored of typing the same card number over and over again, so I decided to hard-code my own card, and pass the credit card screens. Well, at the end of the month, I saw that I'd spent over $1.000
- 1 or 2 times data loss (learned the "backup lesson" very well)
- I decided to use a remote control service to manage one of our servers. After installing the application, the first thing I did was to restrict all IP addresses, but forgot to enable my own. I went home to continue to write my code. At home, I realized that I couldn't connect to server, so at around 1am I went to the office and enabled my home IP.
I had a lot of data on an external drive which needed backing up. So I got another external drive, and plugged them in ready to copy from one to the other. There was a funny smell, followed by a few whiffs of smoke, as I watched four years of photos burn up, not to mention several hundred dollars of equipment.
The two drives, both made by the same manufacturer, had identical power supplies. Or should I say, apparently identical power supplies, but with the 12V and 5V lines swapped. There was NO way to tell which power supply belonged to which drive!
I did manage to recover the data though - I ordered an identical hard drive and swapped the controller boards. It worked. Score: Seagate 1, Akasa 0.
Well, there was the time where I was SSH'ed into a remote server (3 hour drive, one way) changing some configuration files for my customer. Once done, I used 'init 1' to stop the services so I could reload them...
Luckily this was a backup server; we had a service call to that location scheduled the next day.
Another (earlier) time in my IT career, I was in the Army. Another tech was helping me run some Cat-5 cable to some new workstations we were installing in the Headquarters tent. Once done, I went over to the switch and plugged in all the loose cable ends.
What I didn't know at the time was one of those loose cable ends was actually the workstation end of a cable already plugged into the switch. The auto-sensing switch. And no, we didn't have Spanning Tree enabled at the time.
The Commanding General was very curious why his network was down during his daily briefing...
In the old desktop database Paradox you could link to tables hosted on a Database Server pretty much as you can from Access to any other db using ODBC ... the subtle difference was if you deleted the link to the table then the table got dropped as well ...
The feeling I got after realising that I'd dropped some enormous hospital patient data tables was one I'll never forget ...
My first job after college I worked for a company that would send alert messages to pagers (remember those?) for different events that happened in a hospital. One of my first tasks was to write a new alert format for the pagers when a patient changed beds in a hospital. I was given the test pager and went on my merry way.
To fully test out the new messages I set it up to fire anytime anything changed on a patient. This gave me plenty of sample data to play with and I soon had the messages working up to par.
You can see where this is going right? I didn't ever turn off the test alert and it was sent to a large hospital with around 400 beds, which all had patients, who were all having their data changed often.
I didn't find out about it until my manager walked over to my desk with a stack of papers that came with the pager bill that month. $600 later and a lot wiser, I now triple check where my code is going to end up.
A company I was working for once used a bug in IE to access the clients' machines from within a website to "increase usability" and make a seamless web/desktop integration. A few months after deployment the bugfix was rolled out with Windows Update and the customer wasn't able to use the features anymore. They were not able to work around this. Needless to say, it was the last thing this customer paid the company for.
Exploiting bugs to implement features is probably not the best idea.
This application allows a bank to track charges that customers pay for their bills. At the end of the day, a bank worker "ends the day" and the charges are transferred to, say, electric company. For ending the day one must:
- Click "day endings" from menu
- Choose "end day".
- Click "Continue" on "This will end the day and blah blah" dialog.
- Click "Yes" on "Are you sure?" dialog.
The customer requested a less error-prone interface for this process. We could not understand how they could end the day accidentally, but they told us they could. So, we added these extra steps at the end of the process:
- Type your password again on "password required to end the day" dialog.
- Click "Yes" on an additional "Are you sure?" dialog.
- (OK, this step was included to express how irritated we were) Show a full screen dialog with a red background and with the message "Are you sure you want to end the day? This is an irreversible action. Are you really, really sure you want to continue?".
After that last step, the customer could end the day. After the code went into production, we called the bank to check if everything was fine.
-- Did you see the updates?
-- Yes, we were looking at that.
While talking on the phone we heard a panicking sound at the background:
-- Mike! What the hell did you just do?
I am a youngin' I guess. The worst point was helping a friend with an old computer upgrade his RAM. By buying ONE large, expensive SIMM.
Oh, the pleasure of DIMMS! I think it was unsupported to get a total memory size = 2*SIMM we bought, so the purchase was useless.
I worked for a startup doing some Sega Dreamcast development. The devkits were pretty expensive and we only had 3 or 4 that we shared around. They had a nasty habit of frying if you plugged a cable into the RCA video-out jack while the system was powered on. I found this out because a senior dev on the team did this one day. BZZZT! A few weeks later we got our replacement dev kit. Same senior dev fired it up... plugged in the video cable... BZZZT.
This isn't my WTF. It was told to me when I started my job.
The support techs have a special "back door" for when users forget their administrative password. One of the company's oldest customers called and said that they couldn't login as an admin. For some reason, the remote assistance service wasn't working, so the support tech gave them the "back door" to try for themselves. Luckily, the customer was able to login and reset their password and all was well.
Then, about two years later, that same customer calls back. This time, however, it's a different person. They say that a suspended employee account is still showing up in the access logs. Guess who that suspended employee was... that's right, the person who got the administrative back door.
So, basically, they were let go a few months before that and got a new job at a competing company. Using their administrative superpowers, the employee was able to log into his old account and steal customer information even though all outward appearances indicated that his account was suspended.
There wasn't anything they could do except delete the account completely (which they should have done 2 years ago). That employee is still out there, though... lurking.
Anyway, the moral is: if your application has a secret back door to a fully-privileged administrative account, don't give it to a customer just because your remote assistance service is offline.
Actually, a better moral would probably be: don't include a secret back door to a fully-privileged administrative account.
Was logged on to an Apple server thing (whatever those are called again) through remote desktop. Then something with the network wasn't working correctly, so I thought I would give it a fresh start. So I turned the network interface off, waited a second or two and turned it on again.
Well, I was going to turn it on again, but of course... that didn't work so well... :p
Of course, this would have been less of a problem if the computer I was connected to had a screen and a keyboard and, you know... anything... but it didn't have anything, so I had to fancy boot it through FireWire on another computer and stuff... not what I wanted to do... cause I was already tired...
In other words: Go to bed when tired. Don't experiment with crucial things :p
The company I work at recently deployed an internal deployment tool whose app.config contained encrypted SQL Server credentials.
The distro came with a copy of decrypter.exe. Guess what it does?
While on work experience, another student and I would regularly send (entirely inappropriate) messages to one another using "NET SEND".
One day I forgot the argument that specifies the target user, and sent an (entirely inappropriate) message to everyone on the domain (around 5000 people) :(
NET SEND does not ask for confirmation before sending messages - be very careful when using it!
Keeping it on the police theme.
I was demoing some software our team developed, for a major credit card company, to the management. It was my first project and first demo, so I was a little bit nervous.
The software would automate incoming calls, from customers, after they had keyed in their credit card details. Once the call reached the correct hunt group (correct team/department), the software would retrieve all their details, including account history, faxes and scanned letters and images. This was quite leading edge at the time (early nineties).
Okay, I was sitting at the computer, with all the executives, managers and consultants, explaining how you just simply dial a number and the software just kicks in. I made a fictitious call typing 9999 on the office phone connected to the system, which invoked a dummy customer. All was well, I demonstrated the software and its functions. Everyone seemed happy.
Towards the end of the demo a call came from ground floor reception, saying the police were downstairs, having rushed in to answer a 999 call from me. WTF! I forgot that dialing the first nine would get an outside line, and then the next three nines called emergency services.
The demo came to a premature halt and I had to go and convince the police it was all my fault!
The credit card company took our software.
Not mine, but I was mentoring a (Windows centric) guy on an IBM DB2 database on AS400, and I told him to clear up some space on the box as we were running out.
(My WTF was that) I should've kept an eye on what he was doing. He stumbled around, looking for something like the Recycle Bin.
Inevitably, he deleted the bin directory.
I still have a visual snapshot of the server room burnt into my memory of the very second I realised what had happened.
Since I haven't been programming long I don't have any real WTFs to my name, but I did discover busy loops right back at the beginning of my university course when I was practicing on the side; it didn't take me long to realise that was a bad idea.
Went to the customer's site to upgrade the production version of our software, which was running on Unix.
Logged in as root & just ran the upgrade script without looking through it first.
The script assumed it was logged in under the application account and the first thing it did was rm -F $approot/bin/*
Of course $approot was not defined since I was logged on as root and not as the application, so the path collapsed to /bin/*. And root can delete anything!
It took the customer's IT guys much of the rest of the day to figure out how to get the system back.
I now always look through install scripts before I run them and I never run them as root unless there is no other way.
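For what it's worth, the kind of guard that upgrade script was missing can be sketched like this. This is Python rather than the original shell script, and the helper name `resolve_app_bin` and the list of protected directories are my own assumptions, not anything from the real installer. The idea is simply to refuse to build a delete path when the variable is empty.

```python
import posixpath

# Hypothetical list of directories the script should never touch.
PROTECTED = {"/bin", "/sbin", "/usr/bin", "/usr/sbin"}

def resolve_app_bin(env):
    """Return <approot>/bin, or raise if approot is unset or points somewhere dangerous."""
    approot = env.get("approot", "").strip()
    if not approot:
        # An empty variable is exactly what turned "$approot/bin/*" into "/bin/*".
        raise RuntimeError("approot is not set; refusing to build a delete path")
    target = posixpath.normpath(posixpath.join(approot, "bin"))
    if target in PROTECTED:
        raise RuntimeError("refusing to touch system directory " + target)
    return target
```

In a real script you would pass `os.environ` as `env` and only run the delete if this returns without raising.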
I once noticed a bunch of files uploaded to the production server by a colleague had file permissions that wouldn't let others modify them.
I did a chmod -R 774 * (or something similar to that) within the main web site directory. Unfortunately, losing that 001 (world execute) bit was really bad for directories, because it meant they could no longer be traversed by the Apache process.
It sent me into panic mode for a little while. Luckily it was an easy fix. In hindsight I should have used something like g+w to only update the group bits. I also find the practice of having Apache run as nobody a bit dubious, as it requires all the files in your site to have "world" permissions. I realise it is common practice though.
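A small sketch of why the blanket 774 hurt while g+w would not have. It only does bit arithmetic with Python's `stat` constants and touches no files; the 755 starting mode is an assumed typical directory mode, not something from the story.

```python
import stat

def can_others_traverse(mode):
    """True if the world-execute bit (001) is set, which a directory needs to be traversed by 'other'."""
    return bool(mode & stat.S_IXOTH)

def add_group_write(mode):
    """Roughly what 'chmod g+w' does: set one bit and leave every other bit alone."""
    return mode | stat.S_IWGRP

original_dir_mode = 0o755                               # assumed: rwxr-xr-x
after_blanket_774 = 0o774                               # what 'chmod -R 774' assigns to everything
after_g_plus_w = add_group_write(original_dir_mode)     # rwxrwxr-x

print(can_others_traverse(after_blanket_774))           # world execute is gone
print(can_others_traverse(after_g_plus_w))              # world execute survives
```

The numeric form replaces all nine permission bits at once, while the symbolic form only changes the bit you name.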
- Me: [send SQL script to DBA to run in production trading database, whilst the markets were open]
- Dba: Here are your results - 21 rows affected
- Me: errr, 21 rows? Not 1 row? Any chance of a rollback?
- Dba: Nope
- Me: shiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiitttttttttttttttttttt..................
I managed to repair the data before anyone noticed, but I was crapping myself.
Long ago, I was testing some phone number formatting code in a very basic search screen. Instead of using something like 123-456-7890, I used my own home phone number, and forgot to take it out of there as the default text for the label caption. If an account did not have a phone number when displayed on the search screen it would display mine instead. Oh the phone calls and messages I would get about delinquent accounts for utility bills in other states....
I was working as an intern in my college's IT department and was asked to vacuum off the dust from the intake of the servers. The server room was on a platform and all of the power supplies were under the floor. I removed a panel to find a plug and when I went to unplug the vacuum I instead unplugged an entire rack of servers. All of those little green lights weren't green anymore and I almost singlehandedly brought down the entire campus computer network. I went and told my sup and we laughed about it. To this day (I have some friends that still work there), they still call it a CronJob (because of my last name).
One of the things we do at my company is instant win web sites, you know, the kind where you enter your info and you automatically get a winner/loser message? Yeah. Well our instant win system was designed to be completely random, but allowed for the ability to manually make the next X entries winners (for test purposes). Unfortunately, due to bad design, this ability was available in production too.
Well, one day, I'm logged into my DEV sql box (I think) and testing something, so I decide to make the next 5 entries winners. 3 minutes later I realize I was logged into production. The prizes were already claimed and the users notified. The prizes were worth approx. $5,000 each.
Similar to Graeme Perrow: my company was working on a system that linked the county court to the local Sheriff's office. I accidentally deleted all of the arrest warrant data in the court system, which then cheerfully started to tell the Sheriff's systems that every warrant issued in the past five years or so was still active.
Fortunately, I was able to take the court side of the communication link off line until we restored the data.
I tried to fix a typo in four or five item names at the production server
UPDATE art set artName=REPLACE("bad string", "good string", ArtName); Affected rows: 16241
WTF!!! Oh S@#$^ I got the parameters backwards. Now all the items are called "bad string"
Time to get a backup
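The argument order is easy to preview with a SELECT before running the UPDATE. Here is a sketch using Python's built-in sqlite3 module, whose REPLACE() takes (subject, search, replacement) in that order; the in-memory table and the two sample item names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE art (artName TEXT)")
conn.executemany("INSERT INTO art VALUES (?)",
                 [("bad string widget",), ("perfectly fine item",)])

# Backwards: the literal 'bad string' is the subject, so every row comes back as 'bad string'.
backwards = [row[0] for row in conn.execute(
    "SELECT REPLACE('bad string', 'good string', artName) FROM art ORDER BY rowid")]

# Intended: column first, then what to look for, then what to put in its place.
intended = [row[0] for row in conn.execute(
    "SELECT REPLACE(artName, 'bad string', 'good string') FROM art ORDER BY rowid")]

print(backwards)
print(intended)
```

Looking at the SELECT output for a handful of rows costs a few seconds and shows the mistake before it is written anywhere.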
Here was my thinking:
"Okay, let's write that query to delete the user with the broken login!"
DELETE FROM users
"And now I'll just write the WHERE clause..."
Query Complete: 7891 rows affected.
I will never forget that number.
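One habit that would have caught this: run the statement inside a transaction and only commit if the row count matches what you expected. A minimal sketch with Python's sqlite3 module follows; the helper `delete_expecting`, the `users` table and its contents are all hypothetical.

```python
import sqlite3

def delete_expecting(conn, sql, params, expected_rows):
    """Run a DELETE or UPDATE and commit only if it touched the expected number of rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()   # undo the statement before anyone sees it
        raise RuntimeError(
            "expected %d rows, statement touched %d; rolled back"
            % (expected_rows, cur.rowcount))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
conn.executemany("INSERT INTO users (login) VALUES (?)",
                 [("alice",), ("bob",), ("broken",)])
conn.commit()

try:
    # The query from the story: the WHERE clause never got written.
    delete_expecting(conn, "DELETE FROM users", (), expected_rows=1)
except RuntimeError as exc:
    print("blocked:", exc)
```

The same check works for the UPDATE-without-WHERE variants elsewhere in this thread, as long as the database supports rollback.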
My boss called me panicked one day and told me to get into the office now. When I got there, I was told all of the customers data was deleted. After research, we found that a tech support person deleting one account did the following:
rm -rf /home /username/
Beware of where you put your spaces, kids. The only backup was a hard drive from months ago since the tape backup was being worked on by my boss.
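To see what that one space does, you can split the command line the way a POSIX shell would. This sketch only uses Python's `shlex` to tokenize the strings and runs nothing.

```python
import shlex

typo = "rm -rf /home /username/"
intended = "rm -rf /home/username/"

# Everything after "rm -rf" is a separate path argument.
typo_targets = shlex.split(typo)[2:]
intended_targets = shlex.split(intended)[2:]

print(typo_targets)       # two targets, the first being all of /home
print(intended_targets)   # one target
```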
A friend of mine was pulling out disk #1 from a 3 disk RAID 5 set. The Compaq Insight Manager software notified him to change this failed disk.
The problem was, disk #1 is the 2nd, not the first one, since numbering begins with 0...
I had an automatic update utility that was working fine. But sometimes a download would fail, and the update would fail. So I added a CRC check. If the download failed, it would fail the CRC check and download it again. Unfortunately there was an overflow issue in the CRC calculation, which meant that after about 3 months of normal operation, a file was uploaded that mis-calculated and the downloads could never work. So it sat there downloading again and again and again.
The $40/month server that hosted the downloads went over its allocation, and the ISP billed for an extra $4,000. They wouldn't budge on reducing it either.
Oops.
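Whatever the exact overflow was, the expensive part was the unbounded retry. A sketch of a capped retry loop, assuming a CRC-32 check via Python's `zlib`; the `fetch` callable and the limit of 3 attempts are placeholders, not details of the original updater.

```python
import zlib

def crc32_of(data):
    # The mask keeps the value in the unsigned 32-bit range; it is a no-op on
    # Python 3 but documents the range that both sides must agree on.
    return zlib.crc32(data) & 0xFFFFFFFF

def download_with_retries(fetch, expected_crc, max_attempts=3):
    """Call fetch() until the CRC matches, but give up after max_attempts."""
    for _attempt in range(max_attempts):
        data = fetch()
        if crc32_of(data) == expected_crc:
            return data
    raise RuntimeError("CRC mismatch after %d attempts; giving up" % max_attempts)
```

With a cap like this, a bad checksum turns into one error report instead of months of repeated downloads.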
I don't remember who on our team was responsible for this--it might or might not have been me, but I am the one who found the cause of the problem after about four months in production.
First build a Stored Procedure (MSSQL 7.0):
create proc RemoveItem
(@cnty_id int,
--and so on
Now call that proc from VB6:
exec RemoveItem('15'...
This works fine, despite the type mismatch. HOWEVER:
exec RemoveItem('09',...
Doesn't work. Doesn't error, either....
I was testing a new Linux server. I had a problem with permissions, and after some struggling I found the right directory. It had a bunch of subdirectories, and just for testing I decided to set everything to 777. I don't know why (maybe I was tired), but I decided that adding a "/" was the best way to select the current directory. I issued this:
chmod 777 -R /
I immediately realized what I had just done, but that damn command is fast! It had chmodded half of the system while I was finding Ctrl+C. Needless to say I had to start everything from scratch.
That day I learned a valuable lesson: "You can't fuck up with linux"
My work mate was trying to fix an issue on the production database.
UPDATE Campaigns SET Status = 'Error'
F5
%#@$&^!!!!
Needless to say he wore the dunce hat for the next month :-)
In an attempt to avoid screwing up a delete statement I did a select to verify my where clause, and then wrote the delete. Unfortunately I didn't comment out my select, deleting the whole table (even though there is a where clause)
Delete theTable
select * from theTable
Where foo = 7
Thus deleting all records in the table.
We had a guy that wrote an error reporting component in .NET, and inside it he hard coded his email address as a CC.
Well, he left, and they disabled his email account. Every time his component would process an error it would throw an exception because the email address was no longer valid.
This component was used in a lot of different places. Oops!
While I was working on my co-op (paid internship in college) I was developing a small testing framework which queried a database and generated an HTML page with test results. One day my manager had planned to show some of our work to others at the company. I forget exactly what it was, but I apparently did something to screw up the current tests immediately before the meeting.
So, during the meeting when the time came to show everyone the test results page, instead of showing a bunch of 'green' test results (results were color coded) everything was red which meant that all the tests were failing, and very quickly (the test programs were failing to execute).
Needless to say I learned a hard lesson that you should never make any changes to a system before a demonstration without verifying that everything is still working correctly. I was quite embarrassed.
update user set userpic='' where email='[email protected]'
update user set userpic='' where email='[email protected]'
update user set userpic='' where email='[email protected]'
update user set userpic=''
me: oops
thankfully there was an easily parsed registration log file that allowed me to restore user avatars.
Once worked on a web application with err... 'interesting' self-made error handling logic. From the outside it was nothing fancy: it just logged errors to a file. It was tied together with logging.
I was assigned to fix some bugs. To start doing anything, I first had to deploy our app (it had no automatic build, just a manual one with a lot of configuration tasks).
Everything went smoothly, when suddenly I saw something new. Actually, it wasn't anything new; it looked like the standard "Server application unavailable" thrown by IIS, but nothing helped to get rid of it.
Got completely desperate and tried to reinstall .NET (you know, you can't just uninstall MS products). After that failed, I tried to reinstall Visual Studio. After the next failure, I had to reinstall my whole workstation, because .NET was somehow corrupted. All of this just because I was completely sure that the application logic must be fine and that something was wrong with the infrastructure.
Next day we did some pair programming (because my PC still wasn't ready). After some debugging we finally found the reason: a co-worker had changed the error handling. Before, it was one big try{}catch{} where the catch didn't throw anything further. Everyone knows that empty catches are bad (more like stupid), so now, every time the application wanted to log something and couldn't create the log file, it threw an error, caught it and tried to log the error, couldn't create the log file, threw an error, caught it and tried to log the error...
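The way out of that loop is for the logger's own failure path to never call the logger again. A rough Python sketch of the idea (the original was .NET; the function names and the fallback-to-stderr choice here are my assumptions):

```python
import io
import os
import sys
import tempfile

def write_log(message, path):
    """Append one line to the log file; raises OSError if the file cannot be opened."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(message + "\n")

def safe_log(message, path, fallback=sys.stderr):
    """Try the log file; on failure report once to the fallback stream and stop there."""
    try:
        write_log(message, path)
        return True
    except OSError as exc:
        # Deliberately NOT calling safe_log() here: that is the loop from the story.
        print("logging failed (%s); original message: %s" % (exc, message), file=fallback)
        return False

# Demo: a log path inside a directory that does not exist.
missing_path = os.path.join(tempfile.mkdtemp(), "missing-subdir", "app.log")
sink = io.StringIO()
logged = safe_log("disk check failed", missing_path, fallback=sink)
```

The call returns False and the message ends up in `sink` instead of triggering another logging attempt.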
A colleague at my current company used to write unit tests (NUnit) that did their testing under "c:\". Although he nicely introduced a TestDirectory property which got initialized by the test suite's SetUp method, he never thought of using a dedicated temporary test directory.
Guess what happened after another colleague went through all test suites and added the TearDown method that recursively deleted the path pointed to by the TestDirectory property...
This killed one developer's machine as well as our build server (TeamCity). It took me a few hours to figure out what happened and a day to set up a new build server.
Some time ago I was working on refactoring some project. And in one module I have found the following method (keep in mind that this method was called only once):
procedure ReadIniFile(var InstanceIdent:string; var Desktop_Icon,SystemTray_Icon,Mute,MidiMute,MidiLoop,MouseInteraction,DisplayRight, KeyboardInteraction, IncludeDLL,
W95Key, Foreground, ScreenSaver,FullScreen, CustomSize, MaintainRatio,
AllowResize, ExitOnEscape, AlwaysOnTop,WindowsStartup,NoTitle,NoBorder,Draggable,
SetupMenu, CloseMenu, MinimizeMenu,MaximizeMenu,AlwaysOnTopMenu,SoundMenu,MIDIMEnu,WindowsStartupMenu,StartInFullscreenMode :RCheckBox;
var Left,Top,Width,Height,AlignmentV,AlignmentH:integer;
var ScreenName:string;
var ExpirationDate:TDateTime; var ExpRuns,ExpDays:integer; var ExpMessage,ExpWarning,ExpMessage_mailto,ExpMessage_url:string;
var Password:boolean; var PasswordCode:integer;
var pwEditLabel,pwWrong:string;
var HideCursor,DontShowInTaskbar:boolean;
var MidiLoopTimes:integer;
var SplashScreenCommand:string;
var SplashScreenDelay:string;
var CreditsScreenCommand:string;
var CreditsScreenDelay:string;
var BeforeShowCommand:string;
var CurDirSetting:integer;
var RightMenuDeleteItems:TRightMenuDeleteItems;
var EnabledShortcuts:TEnabledShortcuts;
var PPVer:integer;
var StartMinimised:boolean;
var DisableAltF4,PassEscape:boolean;
var WaitMessageEnabled:integer;
var WaitMessageText:string;
var SwitchDisplayWhenFullScreen:integer;
var Force16BitsWhenFullScreen:boolean;
var RightClickMenuCloseText,
RightClickMenuSetupText,
RightClickMenuMinimizeText,
RightClickMenuRestoreText,
RightClickMenuMaximizeText,
RightClickMenuAlwaysOnTopText,
RightClickMenuSoundText,
RightClickMenuMIDIText,
RightClickMenuStartUpWindowsText:string;
var RememberPosition:boolean;
var JoystickMode:integer;
var AllowTickerTop:boolean;
var AllowTickerLeft:boolean;
var AllowTickerBottom:boolean;
var AllowTickerRight:boolean;
var TickerDockSize:integer;
var DragTextAreas:boolean;
var SpecialAlwaysOnTop:boolean;
var EnableDraggingWhenFullscreen:boolean;
var DisableAltTab,DisableTab:boolean;
var DisablePrintScreen:boolean;
var SetupBoxMouseInteractionFont:Tfont;
var SetupBoxDisplayRightFont:Tfont;
var SetupBoxMidiMuteFont:Tfont;
var SetupBoxSystemTray_IconFont:Tfont;
var SetupBoxWindowsStartupFont:Tfont;
var SetupBoxKeyboardInteractionFont:Tfont;
var SetupBoxMuteFont:Tfont;
var SetupBoxDesktop_IconFont:Tfont;
var SetupBoxAlwaysOnTopFont:Tfont;
var SetupBoxMainTextFont:Tfont;
var SetupBoxLineColor:Integer;
var SetupBoxBackgroundColor:Integer;
var Joy1MapUp, Joy2MapUp, Joy1MapDown, Joy2MapDown, Joy1MapLeft, Joy2MapLeft, Joy1MapRight, Joy2MapRight:char;
var Joy1MapUp2, Joy2MapUp2, Joy1MapDown2, Joy2MapDown2, Joy1MapLeft2, Joy2MapLeft2, Joy1MapRight2, Joy2MapRight2:char;
var Joy1MapButton1,Joy1MapButton2,Joy1MapButton3,Joy1MapButton4,Joy1MapButton5,Joy1MapButton6,Joy1MapButton7,Joy1MapButton8,Joy1MapButton9,Joy1MapButton10,
Joy2MapButton1,Joy2MapButton2,Joy2MapButton3,Joy2MapButton4,Joy2MapButton5,Joy2MapButton6,Joy2MapButton7,Joy2MapButton8,Joy2MapButton9,Joy2MapButton10:char;
var Joy1POV0,Joy1POV90,Joy1POV180,Joy1POV270,Joy2POV0,Joy2POV90,Joy2POV180,Joy2POV270:char;
var JoyDisabled0,JoyDisabled1,ExitOnFocusSwitch:boolean;
var Label2caption:string;
PlugIns:TStringList;
var CustomBorder,MinimizeButton,MaximizeButton,CloseButton,WCaption:boolean;
var btnMinAnchors,btnMaxAnchors,btnCloseAnchors,WCaptionAnchors:integer;
var btnMinX,btnMinY,btnMaxX,btnMaxY,btnCloseX,btnCloseY,WCaptionX,WCaptionY:integer;
var WCaptionFont:TFont;
var BasicWidth,BasicHeight,SizingWidth:integer;
var WIcon,WIconX,WIconY,btnCloseTransparentColor,btnMaxTransparentColor,btnMinTransparentColor:integer;
var btnCloseTransparent,btnMaxTransparent,btnMinTransparent:boolean;
var MonNo,AlignToPrimaryMon,GotoSlideN:integer;
var DisableSpacebar:boolean
);
Can anybody correctly count the number of parameters? I have tried twice; there are about 180. :)
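For comparison, the usual refactoring is to return one settings object instead of filling in ~180 var parameters. A tiny Python sketch of the idea (the original is Delphi, where a record or class would play the same role); the `[window]` section name and the defaults are made up, and only a handful of the real settings are shown.

```python
import configparser
from dataclasses import dataclass

@dataclass
class WindowSettings:
    left: int = 0
    top: int = 0
    width: int = 640
    height: int = 480
    hide_cursor: bool = False
    start_minimised: bool = False

def read_ini_text(text):
    """Parse INI text and return all settings as a single object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("window"):
        return WindowSettings()
    section = parser["window"]
    return WindowSettings(
        left=section.getint("Left", fallback=0),
        top=section.getint("Top", fallback=0),
        width=section.getint("Width", fallback=640),
        height=section.getint("Height", fallback=480),
        hide_cursor=section.getboolean("HideCursor", fallback=False),
        start_minimised=section.getboolean("StartMinimised", fallback=False),
    )
```

The caller then gets one value back and picks out the fields it needs, so adding a setting no longer changes the function signature.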
At a previous company we managed an application for a local university managing the dining halls. Part of the application allowed users to send in comments/suggestions. Anyway, our company's generic support email was on the list of recipients. Long story short, a girl had written in complaining about how the dining hall doesn't cater to her during her "time of the month" and the food would aid in giving her bad cramps, mood swings, etc. Also she had some kind of STD and couldn't do something within the dining hall. She came off as a real crazy #####. A guy I worked with had recently got divorced and I "forwarded" him this email telling him "Hey man! this girl is a real catch!" Little did I know I had hit reply-all, and the rest of the support team and the girl got my response. I kid you not, she called the office looking for me (glad my signature had my phone # in it!) and yelled at me for a little bit. At the end she asked to be forwarded to the guy I had sent the email to. Five minutes later he came to my office and said "dude..she just asked me if i was up for a challenge."
I think he actually went out on a date with her considering he was in his 30s and she was a college student. Guess it all worked out for the best.
I've got a few of these moments, but this one was one of the first. I was getting increasingly annoyed at having to connect to the database server to run Perl's $dbh->quote() function to properly sanitize the records about to be inserted. Additionally, I thought it was too slow.
So, I decided to write my own quote() function and roll it into production, after some testing. Well, there was a little tiny corner case (ok, it was huge): I forgot to escape the escape symbol, \. In an unfortunate series of events, one of my UPDATE statements was ending in a \ and this escape symbol ended up escaping the closing single quote and, perhaps with some other random event + 3 Saturn moons aligning, my WHERE clause ended up being completely ignored.
All 50 million rows in the table were overwritten with one completely insane looking value (insane enough to end in \ in the first place) and provided me with hours of research and blaming all kinds of MySQL voodoo for it.
To add insult to injury, the backups were failing for about a month, and there wasn't a single full copy of the database anywhere.
Additionally, my own quote() function was actually slower, even though it didn't connect to a remote server. Perl is slower than CPP, after all.
Lessons learned: test more, think more, and make sure your backups work. We now have a slave running that is purposely delayed by 12 hours, which makes it a very effective rolling backup.
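The corner case is easy to reproduce without a database. Below is a sketch in Python (not the original Perl) of a quote function that forgets the backslash, a corrected one, and a deliberately simplified checker for MySQL-style literals. It assumes backslash escaping is enabled and ignores the doubled-quote form, and all three function names are made up.

```python
def naive_quote(value):
    """Escapes single quotes only: the bug from the story."""
    return "'" + value.replace("'", "\\'") + "'"

def quote_with_backslash(value):
    """Escape the escape character first, then the quote."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def closes_string(literal):
    """Report whether the final quote of a backslash-escaped literal really terminates it."""
    if not literal.startswith("'"):
        raise ValueError("literal must start with a single quote")
    i = 1
    while i < len(literal):
        ch = literal[i]
        if ch == "\\":
            i += 2            # a backslash swallows the character after it
            continue
        if ch == "'":
            return i == len(literal) - 1
        i += 1
    return False              # ran off the end: the string never closed

trailing = "C:\\temp\\"       # a value that ends in a backslash
print(closes_string(naive_quote(trailing)))           # the closing quote got escaped
print(closes_string(quote_with_backslash(trailing)))  # properly terminated
```

In practice the driver's own quoting or bound placeholders are the safer route; this only illustrates why the hand-rolled version swallowed the rest of the statement.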
In the early days of the commercial internet (around 1995), I spent about 4 hours doing tech support for an ISP and ran away screaming.
A couple of months later, I get a job as an IT intern at an advertising agency on the floor above the ISP. Said ad agency has started an ISP of its own, jumping on the booming Internet bandwagon. All is cool until I get assigned to go unplug our company's T1 line in the phone closet in the basement.
I go down and find two lines coming through the wall. I unplug the one I'm sure was to our office (of course, neither was marked).
I find out later that it was the ISP's line and they are now suing the ad agency, claiming it was done maliciously (the one and only time I met one of the partners at the agency :/). Amazingly, the dumb intern was not fired, but I did quit shortly thereafter to take up independent consulting.
My girlfriend was suffering from an embarrassing women's problem. I Googled it at work, and found an informative page which I thought she would like to read. I used Remote Desktop to get into my home PC, fired up a browser, put in the URL, and sent the web page to my home printer.
What I didn't know, was Remote Desktop attaches the local printer as the default device, and sent it to the office printer in the next room where the young ladies work!
I was running Windows and Linux with dual-boot. Windows because I actually needed it for work, Linux just for testing it. Under Linux, I had my Windows drive mounted under /mnt/C.
At some point, I got bored running Linux and thought that it would be exciting to see what happens if I delete all files from the Linux system. How long will the operating system keep running when all the files are gone? So I did a cd /, rm * -rfv (or something similar to it). After a few seconds, I saw C:/Windows flashing by.
In Windows ME, I somehow turned off the "Don't show hidden or system files" option (on by default), among others, in Windows Explorer. Later I browsed through C:\, noticing some new files I had never seen before.
I deleted those in C:\ and later rebooted; at the next boot I figured out I should probably have left those semi-transparent icons alone.
I had just started working for a large Golf Magazine that had a department that ran an electronic tee sheet/POS system for regional golf destinations. One of our RS6000 servers in the Myrtle Beach area was not letting a customer dial in via modem so, following the troubleshooting procedures, I logged in...grep'ed for the user's processes...su to root...and then kill -9 all of them...including process 1.
At the time, mid-July, there were 200+ courses in the area using our system that no longer had tee sheet access. There was confusion as to why a fairly new server crashed in the middle of the day for no reason...
My sysadmin at the University made himself famous the day I went to his office to request an owner change. I don't remember why, but some files in my /home/users/mylogin directory appeared to be owned by root, so I came to his desk to ask him to run a "chown" on those files.
Oh, wait, I forgot to mention, those files were "hidden files" (their filenames start with a .). Oh, and one of those files was a folder... So the sysadmin, after a few moments of thinking, said to me: "Ok, this can be done very fast with just one command", and, thinking "look at me, I'm such a clever sysadmin", he quickly typed
chown -R mylogin .*
My eyes caught fire as soon as I saw the "clever command line" this guy had just typed, and even more fire as I watched the sysadmin's face turn red, then fade to green, blue, and finally white, as he started to think: "Hey, why has the whole filesystem started to scroll up the screen? Hey, is that the /etc folder I just saw on the screen? Did I make some kind of mistake in the command, or is it just my imagination? What could happen if I hit Ctrl-C now? HOLY F*** I'VE JUST MADE THE WHOLE SERVER OWNED BY THIS USER!!!!!!"
Ok, I'm guilty, because I was looking over the sysadmin's shoulder while he was typing that command and instantly realized what would happen (.* also matches .., so the recursive chown climbs into the parent directories), but I just kept quiet to watch the reaction of that poor guy as soon as he realized what a mess he had just made.
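If you want to see why that pattern is dangerous without risking a server, plain glob matching shows it. Python's `fnmatch` applies shell-style patterns to a list of names and, like many shells historically did, lets `.*` match `.` and `..` too. The directory listing below is invented, and whether your shell actually expands `.*` to include `..` depends on the shell and its version.

```python
import fnmatch

# What a home directory listing might contain, including the two special entries.
entries = [".", "..", ".bashrc", ".ssh", "notes.txt"]

matched = fnmatch.filter(entries, ".*")
safer = [name for name in matched if name not in (".", "..")]

print(matched)  # includes "..", i.e. the parent directory
print(safer)    # only the real dotfiles
```

With -R, a match on `..` means the command recurses from the parent directory, and from each parent's parent in turn.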
I used to do some work in Emacs under Unix, and some in Visual Studio under Windows.
For some reason, I had over the years picked up the meaningless habit of always going to the beginning of the line before I saved the file in Emacs.
One day, I was sitting there writing code in Visual Studio, and suddenly when I looked up, the screen was blank and the file had been written to disk.
Puzzled, I retraced my steps. I had just tried to save the file when this happened.
So, I had hit
Ctrl-A, which is "go to the beginning of the line" in Emacs, but "Select All" in VS.
Ctrl-X, which is the beginning of the save command in Emacs, but "Cut" in VS.
Ctrl-S, which ends the save command in Emacs, but "Save" in VS.
To add insult to injury, "Save" in Visual Studio used to flush the undo buffer, so hitting Ctrl-Z to undo didn't help.
Thankfully, once I realized the problem, I realized I could just do Ctrl-V to paste back the now missing code.
Reminds me about the time when one of the developers I worked with dropped the live database that had no backup in place!
Actually, I did something really stupid once: I was logged in on a live UNIX-based system as root and executed rm * -f (or whatever the syntax is) forgetting that I had just changed to / moments earlier! Luckily the system had been backed up.
Some years ago, a partner and I set out to write a program to do high speed formatting of floppy disks. After spending several days writing this on our PC Clone, we had it working perfectly. Spent a couple more days reviewing the code to make sure that it was perfect. Ok, time to beta test. Gave the program to a customer that had an IBM XT. The customer ran the program and it seemed to work just fine. Until he closed the program and got the famous "Abort, Retry, Ignore" message. It seems that we had failed to save some values before calling the BIOS to format a track on the disk. The net result was that on XTs, when we wrote the new directory structure, instead of going to the floppy, it went to the HD. Of course, a blank root directory was just as good as formatting the HD.
I wound up spending the entire night rebuilding his computer.
Lesson learned: Always assume that anything that isn't documented (like registers being saved) will function in the manner least likely to allow you to sleep that night.
Years and years ago, I had written a database migration tool that was going to be used once and then get discarded. The tool was tested on the staging server and proved to be doing its job. It was ready to be used in production.
I sent the binary to the server admins and then realized that I had accidentally wiped the source code, without committing the latest version to source control. I was just left with the binary.
Then, I was told that I needed to change a hard-coded numeric parameter that the tool was going to use (it could have been a port number, or some threshold value -- I don't remember.) I fired up the hex editor, guessed which occurrence would be that parameter and changed it and sent the updated binary to the server guys. The tool did its job and nobody learned about the source code screw up.
Oh, and the WTF is that we were using SourceSafe.
This one belongs to my boss, but I don't think he's on here and it's a good story so I'll share it. He used to have a habit of putting nasty words in as debug output. He thought he was pretty good about cleaning them up until one day one of our clients sent an email with a screen shot of a pop-up window saying, "F*(*& You!" after one of the values they entered didn't validate. Needless to say, he quit that habit very quickly.
I was working on an online movie ticket sales system. We had kiosks with credit card readers in the pilot movie theater. To unlock the kiosk to perform administrative operations, one would have to swipe a special card registered in the system.
I had registered an old debit card that I wasn't using anymore and left it with the people at the ticket booth so that anyone from the development team could come into the theater to unlock the kiosk during emergencies. I advised the booth denizens to stash the card somewhere easy to locate.
One night, in another emergency where the kiosk bugged out and hijacked a majority of the theater's seats, I rushed to the theater and retrieved the card from the ticket booth. Swipe, swipe, swipe, rub, rub, swipe, swipe... Nothing. The card didn't register. It was dead. I had to call a teammate to register my credit card in the system and ended up unlocking the kiosk after a lot of delay.
The location that they had chosen for the unlock card was on top of a CRT monitor.
dvips huge_and_very_important_report.dvi -o huge_and_very_important_report.tex
And yes, it had happened just before I was supposed to send it...
Many years ago I was swapping my 286 with my aunt's 386. Armed with a copy of pkzip I backed up everything I needed to keep onto a large stack of floppies and we switched machines.
After copying the contents of all the floppies onto the new machine it was time to unzip them. Where did I put pkunzip again? Oh, there it is, pkunzip.zip. CRAP!
With the old 286 reformatted and not being able to find a friend that had pkunzip.exe on a floppy I was SOL. Being only 12 years old at the time and well before Al Gore invented the internet in our home, I had no means to replace it.
Not me, thank (Deity). I used to work for a television company that covers a world-wide motor racing season (yeah, that one). The entire TV complex was powered by 4 generators.
One day someone walked past a generator and accidentally bumped the emergency stop button. I was in the media centre and saw all the screens go blank. I looked outside at the generators, all with their exhaust raincaps resting in the down position.
The ensuing debate was whether or not to disable the emergency stop button stopping all the generators or just the local one. Cover plate anyone?
I once had to write a piece of software that gathered data from a local accessdb and load it up to a central point once a week.
It had to run at several unattended locations where there was no technical support. Accordingly this was written as a very resilient application (full Windows NT service, implemented according to Microsoft's best practice, three network connections to three central sites defined, etc. etc. etc.).
The program would try all three network connections, wait an hour and try again for two days before it would finally give up. The whole thing worked really well in test and it was actually very hard to get it to fail, so I left the error message "Bugger Me I just give up" in place, thinking no one would ever see it.
Everything worked just fine (I got over five years uptime on one of the Windows NT processes) except for one site. The network administrator for this region decided the naming standards were not good enough and implemented an alternative scheme, then, after head office found out, was forced to revert to the original naming scheme. This meant that for a period of several months the network was reconfigured nearly every week, causing half the support team to receive SMSes, emails and various alerts with the "B***r me ..." message in the text every Sunday morning.
To make matters worse, these were not native English speakers, and this is not the sort of thing that gets covered in a respectable English language course. I was asked "What does 'B****r' mean?" by earnest coworkers hoping to improve their English on many occasions, once in the middle of a management presentation with about 50 people attending.
You can probably guess the moral of this story.
The time I was demoing some software and did a software-initiated wipe of a production CD/R jukebox used for providing online access to archived scanned deal documents for an Investment Bank where I was working.
Everything was in backup, but I had to give up a weekend to build, initialise and re-populate the thing again:
80 CDs loaded via a cartridge/case & post-slot servo-mechanism (so no hopper to feed them), manually using the machine's console to identify/fill a slot with each disk, index writing (again manually from the console), then software initialisation and data-write. I was sick of the bank's server rooms by the time I'd finished :-(
Late one evening, I logged into a Linux box and noticed "You have new mail." Checked it, and the account had 60,000+ messages, all STDERR from a cron job. Well, I thought it would be funny to forward all of those messages to the inbox of the guy who wrote the cron job and was supposed to be monitoring it.
So, one little procmail recipe later, I decided to call it a day and go home.
When I came to work late the next morning, the mail admin guys were running around with their hair on fire.
What I failed to consider is that the company used Lotus Notes for email. And Lotus doesn't like a flood of email. My little stunt brought down all of the Lotus servers, which, besides email, were trying to replicate data for other important systems, some in Korea, some in Germany.
The system would crash every few thousand emails. The admins would clear the box, reset the server, and then three thousand or so messages later it would crash again. And the Linux box doing the mailing was on the same LAN as the server, and the admins couldn't figure out where the mail was coming from, etc, etc.
Lessons Learned:
- Lotus sucks more than I already thought.
- Don't send messages en masse.
- Don't try to do something funny when you are getting ready to go home.
Some of my earliest functional programs, written in FutureBASIC on my old Performa, were meant to annoy the user. One was pretty simple, and ran in the background detecting any disk insert events, promptly spitting the floppy right back out whenever it got a hit. The one that's most relevant to this thread, however, was a simple, one-line program:
SHUTDOWN
Yes, that's a valid FBASIC command. Now, plant that in the Startup Items folder. Restart the computer, wait two minutes for all of your extensions to load and the system to boot up, then your desktop appears and...BOOM! Computer goes bye-bye.
Suffice to say I quickly learned that one major weakness to the program was that it (thankfully) didn't override the ability to disable startup items.
This happened when I was giving a live demo of an application to business users. Because test data and the client environment weren't available on the demo machine, I was connected to the application hosting server over a remote desktop connection. The server was located at a remote site far away, which I only found out later :(...
To demonstrate the application's fail-safe mechanism for when the link between servers goes down, there was a use case where I had to disable the network and then show that the application didn't crash... It was only when I pressed "disable" that I realized I was connected to a remote server, and that disabling the network on a remote server would not just disconnect everyone, including me, but also stop anyone from reconnecting...
Later on I found out that it was located at a remote site, when someone had to go there and enable the network. The worst part is that it was running on a virtual machine, so it took a while to identify which one it was...
Lesson learnt: Find out where the machines physically are, check their availability, etc., and always prepare a demo environment on a machine you can physically get to.
Some years back (last century!) I was working for a railway company that was busy putting together a bid for a multi-million pound contract. All the bid data was on one PC. Although I was a developer, the boss asked me to take a backup of the data.
This bid was worth a lot of money, had taken the bid team months to put together, and was considered very important.
I rang the bid team, asked them to log out of the system on this single workstation, mapped a drive to the workstation in one explorer window, mapped a drive to the backup location in another explorer window and started the copy.
After about 60s, the copy failed with a "file in use" error - they hadn't logged out.
I phoned them up and asked them, somewhat testily, to log out; they said sorry, and said they were logging out now.
The partial backup already taken was incomplete, so I highlighted the files and hit delete. And as I watched the files disappear I realised, in horror, that I was deleting the wrong set of files - I was trashing the bid team's workstation.
I froze, while an evolved "fight or flight" response pumped my body full of adrenalin. The files were disappearing before my eyes. This was a remote network share, so no recycle bin.
And then, as I reached forward to click cancel, and see what utter devastation I'd caused, the delete juddered to a halt with a "file in use" error. The bid team still hadn't logged out.
And through the fog of panic it started occurring to me that the files I had just deleted, those 100s of files, were exactly the same set of files that I had a backup of, from my first attempt. So silently, without letting on to anyone (apart from the visible signs of sweat and panic), I copied the previously backed up files to the bid workstation, and then backed the whole lot up.
No-one ever complained that there were any problems. I got away with it.
The moral of the story:
1) Always plan for your own stupidity - in this case, don't create a read/write share if you only want to read.
2) PAY ATTENTION TO WHAT YOU ARE DOING!!!
Even typing out this story has raised my heart rate in memory of that day!
It happened a few years back when I joined my first company. I had never used version control before. Anyway, I was working on a feature and spent 10 days developing it. I created a separate branch to check in my code during the development. I didn't check in for the last 5-6 days.
When the development was completed I was supposed to merge my changes with the Trunk. I thought it was going to be easy and I would simply SWITCH the current working branch to the Trunk and then check in the code.
You must have guessed by now what happened :)
Lessons learnt :
Check in the code daily (multiple times a day if required). That's what version control is for.
Learn what a command does before using it.
Fortunately as I was new to svn, I was taking local backup of the code and only lost 1-2 days of work.
It didn't happen to me but to a friend and co-worker. He was taking some screenshots of a web application we were developing and he took a break. He had to send the screenshots to our client (a government organization) for approval. He was talking and joking with a couple of fellow workers and then they googled for "vagina en lata" (NSFW - search for it yourself). Of course he forgot to clear the top right search box in Firefox before taking a few more screenshots and sending them to the client.
He was in a panic, but as the days went on, surprisingly, no one appeared to have noticed it.
I remember working on an election project for the Mauritius Government a couple of years ago. There were a number of constituencies where the election was going on and it was all online and real time that day. Our software was almost complete, with a few fixes remaining for a particular constituency where things were a little different from the others.
I was just a junior programmer and was told that the result for the "Rodrigues" constituency (the particular one) would come last (for some reason I was not aware of), so I could continue fixing the software. BUT my luck ran out and the first constituency where the result was declared was "Rodrigues". Phew... I had to quickly create a patch application in 10 mins (for which the people of Mauritius waited)... Sorry Guys !!!
Was working on a Quake mod, back in about 1997.
I wanted to make the Ogres properly set the angle on their grenades so they would land at the target. (In the original code, Ogres always fired at 14.3333 degrees - even if you were below them).
Was sitting with a friend at my PC for a few hours as we worked out how to do it, sorted out the cos/sin lookup tables, etc, etc.
Made the last change, ran it and it worked!
Quit Quake and we're sitting there, big grins all round. He's looking at me, and can see the screen and I'm looking at him, away from the screen. With my fingers on the keyboard, I renamed the file (from the original temporary name).
Only, I actually typed "delete ogre_new.c"
About a second or so later, my fingers informed my brain what they'd actually typed.
He kept looking at me and I kept looking at him and he said, "I don't think you wanted to do that."
On the live, production database, wrote something like...
Delete from user_detail
Where username = 'runrunraygun'
However, I had only the first line selected in the Management Studio query editor, so the Where clause never ran and I deleted all of our client's customers.
This was the day I learnt how to restore and roll forward from a transaction log.
A coworker accidentally mixed up the default gateway IP and his server IP for a subnet of database servers.
I've never seen so many different department managers coming to the IT Operations area at the same time. All asking "Hey, any idea why X is broke?"
I was writing some automated steps for our build server, we'd been having trouble with stray files staying checked out on the machine after previous failed runs so I added a simple "revert all checked out files" step at the beginning of the build process.
Then I tested my build step. And it worked, it reverted all my files.
At the point where I saw "Failed to remove /proc/something..." I noticed that the execution of my script was taking an unusual amount of time. Half a second later, I understood what was going on and manically pressed Ctrl-C until it died.
Apparently, the rm -rf $somecachedir misfired when $somecachedir wasn't set to anything and basically executed rm -rf / instead.
The bad news? This was shortly before going into production.
The good news? The web applications and databases we needed were all under /var and since rm -rf worked alphabetically, it hadn't got there yet. There followed a long night of reinstalling the server, but it was only a single night. ;)
Takeaway? 1.) linux will not stop you from shooting yourself in the foot 2.) always check your variables before a volatile operation is executed :)
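For what it's worth, here is a minimal sketch of the "check your variables" takeaway. The function name clean_cache and the way it is called are my own assumptions for illustration, not the script from the story. It refuses an empty path or "/", and the ${dir:?} expansion is a second line of defence, because the shell aborts that command if the variable is unset or empty.

```shell
#!/bin/sh
# clean_cache is a hypothetical helper name, used here only for illustration.
# It removes a cache directory, but refuses to run on an empty path or on "/".
clean_cache() {
    dir="${1:-}"

    # Guard 1: reject the two values that turned the original command into rm -rf /
    case "$dir" in
        ""|"/")
            echo "refusing to remove '$dir'" >&2
            return 1
            ;;
    esac

    # Guard 2: only operate on something that really is a directory
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi

    # Guard 3: ${dir:?} makes the shell abort this command if dir is unset or empty;
    # "--" stops a path beginning with "-" from being read as an option
    rm -rf -- "${dir:?}"
}
```

Calling clean_cache "$somecachedir" with the variable unset then fails with an error message instead of wandering off into the root filesystem. Putting set -u at the top of the script is another common safety net, since it makes any use of an unset variable an error.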
The company makes a system for monitoring patients in residential care facilities. Patients can push a button to call a nurse for assistance. Any number of automated devices can also send in a call. Calls are displayed on central or area monitors, and the nurse responsible for that area gets a message via a paging system. If there is no response to a call (“call canceled”) within a set period of time, the page is repeated, and is escalated to a supervisor, who is also paged. If there is still no response after a further time period, a higher level manager is paged as well as the first and second level responders. All pages are repeated until there is finally a response. This is a proven system which guarantees that no one will be left without assistance for too long, and has worked well in practice for years.
We enhanced the system to not just send pages, but to be able to use an email address as well, and I tested this sending to my own email address at our company. Worked fine.
Then, to stress test another part of the system, I ran a script which simulated sending calls repeatedly from thousands of patients without also automatically canceling the original calls, so that they continued to escalate over the weekend...
...Without thinking to disable the 'send email' function first.
Over the weekend I sent myself an email from home to work as a reminder of something, and got a “mailbox is full” response, which puzzled me, since I get so little email at work.
When I came in Monday morning no one else's email was working either. Of course our ISP had shut us down as a spammer part way through the weekend when the geometrically increasing number of calls hit some magic number.
->Complete a test and start from a clearly defined state before testing something else....