Often we find changing a few lines of code makes n-fold performance increases in applications.

One such experience for me was fixing an issue in the configuration library. The library was based on XML configuration file and XPath was used to access the configuration. During profiling, I found an enormous number of XPath objects and calls to XPath which came from the configuration library.

The application constantly looked up the configuration to perform tasks including logging - which referenced configuration multiple times even during a single client request.

By introducing a cache of the configuration values, applications using the library saw a 100% performance increase.

What is your best / funniest / annoying performance tuning experience?

(Also, can you please share what tools you use to find such issues?)

+2  A: 

I needed to store an array of integers in the order of millions. I didn't care about duplicates or order, and the majority of operations were contains checks.

One day I came up with the solution of using an array of booleans to store this list, where the index of the array was the value, with true meaning the value was in the list, and false if the value was not in the list. This worked fine for a while until we started having memory issues.

.NET bools are stored as a whole byte so we used an array of integers with bit operations to store the binary flags.

In the end we went from storing a list of numbers in a list of numbers, to a list of bools and then back to a list of numbers. But my contains checks are really fast now.

Cameron MacFarland
sounds like a perfect job for hashing
Consider using the built-in BitArray instead of rolling your own using an array of ints.
Matt Howells
@draemon: The standard .NET hashtables still need to iterate over a loop when they do a contains check. Since array lookups take O(1) time my array solution was faster.@Matt: I pretty much copied the code from BitArray, but couldn't use that class because the array had to be dynamically resizable.
Cameron MacFarland
@draemon: Doh! Nevermind, the hashtables in the .NET framework are O(1) aswell. It's late and I should stop reading SO. I can't remember why I didn't use hashtables...
Cameron MacFarland
Array solution would still be faster - it doesn't need to get hashcode then compare hashcode
Samuel Kim
If you need to store an array of bools, use a BitArray instead:
@Princess: Isn't that what Matt Howells said?
Cameron MacFarland
+1  A: 

We had three machines called inet1, inet2 and inet3 that are used to do some serious logfile processing.

Two more were bought and were eventually named inet5 and inet 7.

All sorts of problems were caused when inet4 was originally pushed out! For obvious reasons when you think about it.

Performance tuning by obvious mistake! Obvious with hindsight of course! (-:

Hint: Enter ifconfig -a!

Rob Wells
+16  A: 

We had a server that was occasionally quick, but really slow the rest of the time.

Eventually I figured out that the screen-saver was chewing the processor. Schoolboy error.

good reason to run headless/no gui servers!
Matthew Watson
The ideal screensaver has enough activity on it to let you know that the machine is still running, but not enough to chew processor time. On Windoes NT, we used to use "starfield" with a low speed and low number of stars.These days they're all headless or VMs anyway.
I always disable the screen saver, or set it to blackout.
Brad Gilbert
hmm surely the screen saver should run at low priority so it shouldn't affect things?
We had this problem once years ago running the pipes screen saver on Windows NT. I was told (won't bother to verify) that the screen saver itself was running at a low priority, but the 3D graphics engine that the screensaver used ran at a high priority.
Jeffrey L Whitledge

When we notices a drastic slowdown in the execution of some MDX-queries we profiled the Java-library using yourKit. We found out that there was an log-statement in one of the compare-routines that would write the result of the comparison of two values to a log-file. Naturally, the result of writing away all the comparison information of sorting 20.000 items results in a (very) large log-file and a drastic performance hit.

Unfortunately, the only solution we found was to turn the logging off the logging completely for this library.

+17  A: 

Back in 1987, I was working on a Geographic Information System. I was told to do what I could to speed up a process called "parcelization" that was taking 2-3 hours to process on a test database. It was comparing every line segment against every other line segment on a given layer to see if they intersected. I changed it so that it first compared the extents of both lines, and skipped if they didn't overlap, and then compared the extents of each segment and skipped if they didn't overlap. The test database processing went from 2-3 hours to about 15 minutes. I was a big hero with my peers and the client. And then a day later my colleague took another look at it and figured out how to cache the segment extents instead of hitting the database each time, and took it down from 15 minutes to about 2 minutes, and suddenly he was the hero.

Paul Tomblin
He couldn't have done that without your improvement.
Brad Gilbert
It is interesting that you both achieved roughly an 8:1 improvement.
Brad Gilbert
+3  A: 

My favorite experience was a tight loop of actionscript code where adding an extra statement (I think it was a meaningless variable assignment) actually made the loop run faster.

Joeri Sebrechts
+3  A: 

A company I used to work for had its own XML parser because of limitations in the 3rd-party parsers available at the time. The whole document would be read from a socket into (virtual) memory as a single string and a tree of elements would reference ranges of offsets within the string. This would have worked reasonably well except that the "get node text" function contained a bounds check:

if (node->text_end > strlen(doc->text))
    throw InvalidStringRange(node->text_start, node->text_end, strlen(doc->text));

As you probably know, strlen() scans the whole string every time it is called, in this case once per text node accessed.

This had no noticeable effect at first because all documents were in the order of a couple of kilobytes.

Then I started feeding it multi-megabyte XML documents. I wondered why it was taking 5 minutes to read each one and started to examine the code.

I'm not sure who is the idiot - the original author for that use of strlen(), or me for creating those huge documents.

+1 for the end ;)
+16  A: 

I once got fooled by the big-O notation. I had a table of values (big numbers, at least 32 Bit, but the code should work even if they get 1024 Bit long or longer). The code needs to perform look ups in the table to find out if an item is in the table and if it is, every item has an object associated with it that needs to be changed on each positive match. I should also mention, that this had to be done in pure C, as it was supposed to run on lowest level possible and with highest speed possible (there are plenty of lookups a second, so the code is very performance critical).

Okay. If you read a task like this, what do you think of? Well, I immediately thought "Hey, that's a task for a hashtable! A hashtable lookup is only O(1), nothing can beat that!". So I implemented it that way. I thought it's the best way possible. The problem is, I cannot use the number itself as index into the hashtable. Instead I had to treat it as a key and somehow hash it. I chose a simple, fast, yet good hash function (after all I need some decent distribution of values; if they all map to the same index, I have plenty of collisions slowing down the whole lookup). It was working quite well and we used this code for about 3 years.

After 3 years I started to doubt if this is really the ideal solution. Just out of fun I replaced the hashtable with a sorted table (where the keys where just sorted from small to big numbers) and performed the lookup using binary search. Result: It was much faster. A binary search has O(log n), but that does not say it's slower than a hashtable. It means if I double the number of entries in the table, the lookup will become slower by the factor log n, while the hashtable lookup will stay equally fast (if I make the hastable big enough to not cause much more collisions of course). And for the small number of items we had, binary search actually outperformed the hashtable, as it usually had the result much faster than the hash function could calculate the hash of the number key.

I benchmarked both again, to get the average lookup times and calculated "How any entries must the table have so that the average hashtable lookup will outperform the binary search? Or IOW, when gets binary search slower on average". I came up with about 730 items. Thus there is a time when hashtable gets much faster than binary search (which is of course expected). However, there is one constrain to consider: Caused by system limitations my program can't influence, the table cannot ever grow bigger than 256 entries. DOH! So hashtable will never outperform binary search in practice. From this day on we use binary search and performance almost doubled.

Hm. Did you try just a linear search too, perhaps with an unsorted table? Due to cache effects, and cache prefetching, it might be even better!
Thomas Padron-McCarthy
I could benchmark linear searching. I tweaked the code like this anyway: Table size 0, no search is performed (result false), size 1, I just need one compare to this one value and if 2, I do in fact use a linear search (50% chance to hit on first try). Only for 3 and more values I start a bin search
The lesson: O(1) is slower than O(log n) for very large numbers of 1. :D
+14  A: 

Back about 15 years ago I joined a large software company and inherited some code from another project. We refactored it a little and added a nice clean API and breathed fresh life into what was looking like a dead code base. Towards the end of our project we decided to look deep in the bowels of the most complex elements of our inherited code to see what we could do about resource utilisation. We found a function called actRealBusy(), which basically slept for half a second each time it was called.

We could only presume that the original developers has some weird timing isue that they never fixed and introduced this function as a way round it because it appear in an event handling loop - and was therefore called a lot.

Of course we did the only thing possible which was to add an argument to it to speed it up actRealBusy(bool notThatBusy).

I've heard tales of people deliberately putting such lines into the source so that when the request to improve performance comes in, they have an easy win...
Bill Michell
+3  A: 

Many years ago, I was doing data entry on a system written in MS Access 2 which had been written by my immediate boss. It was set up with separate "active" and "history" tables and I had to periodically run a scan to check for duplicates before doing a transfer of completed items to "history". For performance reasons, it only checked items which were in the "active" table, but marked complete. Even though this was generally only a dozen or fewer items, it still took 20-30 minutes to run.

Then came the day that my boss decided not to renew his contract. I had been tinkering a bit with Access's VBA in my down time, so ongoing maintenance/development of the application semi-officially fell upon my shoulders and that duplicate check was one of the first things I looked at.

It was, to put it mildly, a mess. The check was done by opening an unsorted recordset of the "active" table, walking through it to find items flagged complete, and then, each time it found one, opening a new (also unsorted) recordset of the "history" table and walking through that in search of matches. I'm sure there are less-optimized ways of doing this, but I think you'd have to be actively trying to find one.

I rewrote it to open a pair of sorted recordsets and compare the first item in each, then move to the next record in whichever had the lower ID and repeat until it hit the end. Since this only made one pass through each table, it sped things up from half an hour to check a dozen "active" records against the "history" table to being able to check all "active" records in under 15 seconds.

(If I were to do it today, I'd use an SQL join and push the time required well down into the single-digit-seconds range, so I don't claim that my solution was the best one, but it sure was a hell of an improvement.)

Dave Sherohman
+22  A: 

A few months ago I was working on a point-of-sale system that runs on small laptop computers. There were a variety of laptops that were used, and they all were running Windows XP, but many of them were very old. Some had Windows-98 stickers on them. For testing my application I had one of each kind of laptop running in my cubicle. The oldest laptop was slow. Very slow. Very, very, slow. So I had to optimize the heck out of our program to get it to run in an acceptable manner on that computer.

  • I got rid of the database engine, and replaced it with text file storage.
  • In one place we were getting the top 10 items from a list. The list was sorted before getting the top items. I got rid of the standard sort and created a custom heap sort that only worked with N items.
  • I put code into multiple modules that could be dynamically loaded if/when needed, instead of loading them on startup.

Eventually, I was able to get the program working acceptably even on the very old, very slow computer.

Then came the day for user acceptance testing. We grabbed a bunch of laptops out of the fleet and ran the program on them. It turned out that the laptop I had been using for testing was defective in some way and the program screamed like the Concord on the non-defective computers that were actually going to be used.

Oh, well.

Jeffrey L Whitledge
Now that is funny.
Brad Gilbert
That is pretty funny and cool at the same time. Knowing how to optimize for slow machines is always a good thing!
The Wicked Flea
+9  A: 

Things that work fine in the simple or average case can really kill performance in a large case. Optimise for large inputs since small inputs will most likely run quickly anyway, just because they're small.

I saw this in a Delphi program for mine that parsed its input and turned it into a list of tokens, then processed it one token at a time, first token to last token. When the token was processed, it was removed from the front of the list and discarded.

The time to remove items from the list came to take up 90% of the run time for very large inputs, since removing the first item in the list resulted in all items after it being moved up by one (i.e. a large memcpy). This is just the way that Delphi's TList class works - internally, it keeps and manages a block of memory. The overhead of removing the first item and moving the rest up by one wasn't noticeable on small inputs, but it dominated runtime for very large ones.

I replaced this remove-first-item-from-list logic with a "current item index" that was incremented when the token was processed; and after processing was complete I freed the list all in one go.

The result was as predicted - the program ran ten times as fast for large input. Run time for small and average inputs was unchanged.

+1  A: 

The company i was working at that time was using an unique mechanism to execute SQL 'in' clauses - we had a temporary (oracle) table (IN_VALUES) dedicated for this purpose. Though this was supposed to be a stop-gap arrangement, it worked so well that it soon became the standard (ignoring some DBAs). The idea was very good - a temporary table with five columns, you insert into this table and do an IN on a select from this table. And some real intelligent ones amongst us did a JOIN rather than an IN with it.

One day we had to develop a website that involved filtering of articles - we already had a list of articles and we had to filter that list using article characteristics. Time for IN_VALUES I thought. I always thought that inserts took negligible amount of time. The site behaved very well until system and user acceptance testing, but the first day on pre-production, the server crashed.

Reason? The real users (as against we programmers and the "Accepters") who used it were doing some exotic filtering - on a list of 10000 or so articles. And the article ids were passing through multiple in clauses that 1. Really slowed down the search and 2. Increased the memory usage.

I replaced the filtering with a complete search on the characteristic very time and then used a Set (java.util.HashSet) to do the filtering. In one of my colleague's words, we did the intersection in java.

How did we find the memory usage in my local machine (WinXP)? Task Manager!

(and if some f you have better ideas of passing multiple values(List) from a java code to a oracle stored procedure, show me some light!)

You can use table variables for this, which, depending on the driver (I don't know Java), is easy or hard but always possible.
+2  A: 

Maintaining an ageing in-house source code control system written in Perl, we started to have performance issues as more developers used the system and the code base got larger. Whilst we investigated new hardware we had a look at the low-level RCS library that versioned individual files. This turned out to consist of a load of Perl functions along the lines of

sub branch { system("rcs branch...") }

The best part was explaining this to my boss: "We've found the performance issue - the RCS library actually does a fork to the command-line RCS utility!", "What, every user session?", "No - every second...".

David Hicks
+18  A: 

I was working at Enron UK on a power trading application that had a 2-minute start-up time. This was really annoying the traders using the application, to the point where they were threatening dire retribution if the problem wasn’t fixed. So I decided to explore by using a third-party profiler to look in detail at the problem.

After constructing call graphs and mapping the most expensive procedures,I found a single statement that was occupying no less than 50% of the start-up time! The two grid controls that formed the core of the application’s GUI were referenced by code that marked every other grid column in bold. There was one statement inside a loop that changed the font to bold, and this statement was the culprit. Although the line of code only took milliseconds to run, it was executed over 50,000 times. The original developer had used small volumes of data and hadn’t bothered to check whether the routine was being called redundantly. Over time, as the volume of data grew, the start-up time became slower and slower.

After changing the code so that the grid columns were set to bold only once, the application’s start-up time dropped by nearly a minute and the day was saved. The moral here is that it’s very easy to spend a lot of time tuning the wrong part of your program. It’s better to get significant portions of your application to work correctly and then use a good profiler to look at where the real speed bumps are hiding. Finally, when your whole application is up and running correctly, use the profiler again to discover any remaining performance issues caused by your system integration.

+2  A: 

Rookie SQL Server 2005 / SSIS mistake: Needed to load an archive file of 645,000+ patient records into a staging table, then compare them against the active patient file - looking for visits not in the active file. Very simple... LEFT OUTER JOIN from staging to active, INSERT where ACTIVE.PATIENT_ID IS NULL. Except... the comparison was taking days! After optimizing everything else, adding indexes, tearing out remaining hair - noticed that the staging table (created by the SSIS Import Data wizard) had made all of the fields nvarchar(50). Which I was then JOIN'ing to a series of varchar() and char() fields. Which was forcing SQL Server 2K5 to load a translation table for each field, strip down the nvarchar representation of each field to ASCII (essentially) and then do the join. Fixing the staging table to match the data-types of the active table resolved the problem... six seconds to do the base compare, down from 20-30 hours.

+16  A: 

I was sorting a vector< vector< int> > in g++, using std::sort, and getting surprisingly slow speeds. After a bit of a dig I discovered that while the C++ standard defines an optimised swap function for many types, including vector, g++ wasn't using them in sort and most of the other standard library functions.

A quick 15 line addition to g++'s standard library implementation and suddenly my code was going about 20 times faster, and hopefully lots of other people's code got a speed boost when the next version of g++ came out.

Chris Jefferson
+1 to help others out also.
Ólafur Waage
+1  A: 

I ran into a doozie. Our client wanted us to add a mapping solution to their web application. After they selected a vendor for us (!) we went ahead and implemented it, and after eating some of the costs, and some team overtime to make the thing go, we finally pushed the thing live.


Once it went live, everything looked fine. But there were reports from the call center that the application was slow, and the hosting provider had said that there was a lot of time spent on garbage collection. It was deccided to rip out the mapping portion out of the application, as when the servers are under serious load in a couple of days, they won't survive.

So, we had to go back, and tear it all out again.

Only to find out that, all in all, it really didn't help out with the garbage collection. Then we find out that the reports from the call center that the application was slow, was just from someone AT the call center, not that people are calling in to report it is slow.

Later, we are tasked to find out what is causing our GC issues. We finally find out that the way the application was tuned--by the vendor of the server software no less--was completely wrong for the kind of work it does. We changed the GC parameters of the JVM and lo and behold, our GC problems became much better.

Jonathan Arkell
+5  A: 

My favourite was a list view that was slow to update when a huge number of items were added.

When I was profiling it I noticed that it was so much quicker when the view was hidden. This made me realise that disabling the window, running the update and enabling it again led to a massive speedup!

Thomas Bratt
I haven't profiled it, but at least in Java Swing I often do the massive update on the TableModel in a subthread, and then fire the table updated event in the EventQueue.
Paul Tomblin
I've had similar, but disabling the window sounds like a hack.Most list view classes will have some kind of BeginUpdate(), EndUpdate() methods to suspend visual refreshes after each item is added, for just this scenario.
+34  A: 

A long time ago, in a land far far away, I got a contract to maintain the accounting system for a medium sized courier company. Their immediate concern was that the overnight batch processing was running into the next day, when the (only) accounting computer needed to be used for real-time access.

I took a cursory look at the code, and I couldn't help from laughing .. it was machine-generated basic that someone had hacked over. U G L Y.

I run the overnight batch and found what was taking over 8 hours - the transaction sort. What was amazing was that it was only about 500-750 records.

Ok, open up the sort program, break it up into separate lines, and try to decipher it. Guess what? BUBBLE SORT IN PLACE ON DISK.

Fortunately, each part of the batch was a separate program, so all I had to do was rewrite the sort in C. I knew I could sort the whole thing in ram, so, quicksort was invoked. That 8 hour sort was reduced to about 45 seconds.

Late that night, I got a phone call at home from the night manager:

nm: "The batch is broken"
me: "What do you mean?"
nm: "It stopped"
me: "What's the last message?"
nm: "Batch Processing Completed, total is $1234.56". 
me: "Is that total correct?"
nm: "yes"
me: "So, what's the problem?"

Needless to say, it took some persuasion to convince them that it was correct. Later, as their business increased, I changed the sort to a sort/merge on demand when the size of the batch exceeded memory. But the operators were so spoiled that they couldn't take the extra few minutes of processing, so they ran the batches more often with fewer transactions. go figure.

+1. love it! Use of/Teaching of Bubble Sort is one of my pet peeves.
Mitch Wheat
+1  A: 

We have a binary object database (not SQL, rather a CAD/EDA application). Each object consists of two or more files (dont' ask why, it has to do with object typing, inter-object references and localized editing of the objects). Deep in the application framework is a utility function which can be given a list of objects (described by name and type) that will scan a destination directory to detect possible conflicts. Object constructors will use this routine to avoid creation in a directory where an object of the same name and type already resides.

We also have a translator from a flat binary archival format (a single file containing an arbitrary number of objects in an industry-standard representation) to our internal application representation.

One of our customers was complaining of the glacial translation performance they were experiencing with large designs. You can probably see where this is heading...

The translator would be invoked with the directive to translate a flat file containing, say, 10,000 objects into an empty directory. Each object would get constructed, and the constructor would call the foundation library's conflict checker. This would opendir(), then readdir() and perform semantic analysis on all files found (zero the first time) to recognize objects, and compare them with the objects in the input conflict list (in this case, the single new object being constructed). No conflict found, and the object was created, making two new files.

Time passes, and there are 100 objects on disk (200 files). Now a conflict check for each new object performs an opendir() and 200 readdir() calls... The nicely generalized conflict checking code blew up as more and more objects were created.

In this instance, the solution was to replace the hard-coded call to the conflict checking in the object constructor with a strategy class allowing the client to specialize the meaning of 'conflict'. Since the translation application had a much simpler test for conflict available to it, it didn't need to opendir()/readdir() and could do a single stat per object. Once this was implemented, performance was linear with the size of the dataset (sorry, no simplistic "it was one million times faster!").

Looking back over my notes for this particular problem, I realize that the eventual summary only captures the endgame. Several 'problems' were fixed before this root cause was realized. And months later, another problem was uncovered which had been aggravating this particular performance bug by generating unnecessary object filesets! So even a single performance problem is part of a tapestry when your codebase is large enough.

Don Wakefield
+3  A: 

Simple mistake; easy to overlook:

We run our application's database side-by-side with our ERP system DB. Because of the requirements (read: shoddy design), we often have to perform cross-database queries to join one or two tables together. No problem, right?

After an update, we started getting complaints about the performance of a new screen - it was taking upwards of three minutes to load a single record, in a list of about 15,000! We broke out the profiling tools and discovered that the problem did indeed lie in the database query. So, off we go to look at the procedure, which did work, albeit very, very slowly.

As it turns out, the procedure was held in the ERP database, and with the exception of one table, every other table (something like 15+) was coming from our app's database. Because of the massive amounts of data being joined together, the cross-db references were eating up gobs and gobs of execution time. Simply changing the database the procedure was located in shrank the time needed to about four or five seconds, much better given what we had.

Lieutenant Frost
+1  A: 

OK, I'm dating myself, but I once had a program on a 68000 spending all its time in this loop:

int i;
struct { ... } a[...];

for (i = 0; i < n; i++){
    .. something involving a[i] ...

I wanted to speed it up, so I ran it under an in-circuit-emulator (remember those?), and halted it manually to take samples of the call stack see link. It was spending all of its time in the 32-bit multiply routine. Why? There was no multiply in that loop. Looking up the call stack explained it. It was computing the address of a[i], and since i was declared int, which was 32-bits, it was doing a 32-bit multiply. Changing i to short caused it to use a multiply instruction, and the loop tripled in speed.

Mike Dunlavey
+2  A: 

I was hired as contractor to fix a poorly written application by some fresh from graduation developers. There are a slew of poor choices, from SQL that executes inside a code loop (from the results of another SQL result). But the two worst slowdowns were:

First Obvious Mistake

A report needed to grab all the transactions for the current day, so the SQL was written like so:

SELECT * FROM table WHERE date LIKE '2008-12-20%'

And it was dog slow. If you didn't see the mistake here, well what is happening is that the DB server must convert the internal date field into a human date so it can be casted as a string so that the SQL could then perform a like. A simple rewrite of the SQL like so turned a 60sec query into 1 second

SELECT * FROM table WHERE date >= '2008-12-20 00:00:00' AND date <= '2008-12-20 23:59:59'

Second Obvious Mistake

A database viewer written as a webapp would perform progressively slower the more rows there were, even though there were only 10 items in a page. There was a dropdown that lets the user jump to any page, but some tables were massive and had thousands of pages. Upon viewing the HTML source, I saw the SELECT drop down had a simple ONCLICK event that would load a url from the value of the select. The problem was, the value was the complete url, and while that may not sound like a problem, when the url is massive, as it was, it was a huge slowdown. About 90% of the HTML content was this repeating URL and the only difference was the &PAGE= value at the end. A simply rewrite of the SELECT to call a function, sending the page number, fixed it and the page rendered within 3 secs into of 10-15mins, you'll be amazed how slowly IE or FF performs on 1.5MB of HTML.

you beat me to it - i had to look at a report which was taking 10 min to run. the cause - the original developer didn't know about joins, so would loop thru a SQL result, and call another SQL statement for every row. i replaced the two queries with 1 join, went from 10 min to 4 secs.
+17  A: 

My company inherited an application from India contractors. More or less, the application was 130,000 lines of spaghetti code: 100% of the application logic embedded in the forms, apart from forms the app contained 5 classes which held the entire state of the app in public static variables, and a grand total of zero unit tests. You know, standard quality code you get from the lowest offshore bidder.

The application was notoriously slow. After investigating the code, I encountered large regions of code that appeared to do precisely nothing. One method I found manipulated a string in the following way:

  • Read a string from a textbox.
  • Replace all of the newlines with pipes instead. This was implemented by iterating through each char, adding each char to stringbuilder one by one.
  • Split the pipe delimited string into an array.
  • Iterate through each element in the array, appending each line to a stringbuilder.
  • Assigning the stringbuilder's text back to the textbox.

The string that went into the method was exactly the same string that came out. No transformation of the string had taken place at all. The transformation was spread out across 2 or 3 methods for a total of 30 lines of code.

Additionally, a search for an element in a collection required a linear search, regardless of more efficient methods. If you needed to find an item with two properties, you had one for loop nested in another; for three properties, you had three nested for loops. Each search was O(n^p), where n is the number of items in a list, and p is the number of properties to compare.

If I remember correctly, there were 3 programmers working unsupervised on the project for about a year, and together they pushed out 130,000 lines of buggy code. I never confirmed it, but I believe those programmers were required to meet some unreasonably high quota, such as 200 lines of code per day.

To make a long story short, my company took over the project and continued development on the application in house. The application was so bad that it wasn't even refactorable, it just had to be rewritten. Every checkin saw the sourcecode drop by 5, 10, or occcasionally 20 kb at a time. Simply be removing repetitive, redudant, and useless loops, we increased the applications speed 300%.

After 12 months of re-writing the application, the linecount dropped to 36,000 or so lines of code, and the app runs at about 5x the speed of the original.

I'm almost surprised, assuming the 200 line assumption was correct, that the vendor didn't throw a fit because the new app used less lines of code. I can see it now... "We paid you to remove stuff? We want our money back!"
Just the idea of giving programmers a quota for a number of lines per day is insane and indicates a total lack of understanding by their management what software development is about.
Love that "money back" comment.

Long time ago I was looking at a batch process that was taking over 24 hours to run. I tracked it down to an SQL query that was causing thousands of full table scans over very large tables. I rejigged the query so it used indexes instead and the program ran in 20 minutes flat.

The users thought I was a God - for all of 30 minutes until the next problem came up. It was a nice feeling while it lasted.

+2  A: 

Over a decade ago, I rewrote some code that was responsible for generating dns config files from an /etc/hosts file (evil, don't ask). The original code took about 2 minutes to run which was insane.

It turned out that for every line (total about 8k lines) in /etc/hosts, it was looping through looking for the right file to write.

Once I rewrote the whole thing in perl (original was in incomprehensible shell script), it took about 4 secs to rebuild everything.

The DNS admins could not believe it when I showed the rewritten code to them. They were resigned to waiting every time there was a hostname update which happened multiple times a day. Needless to say, they were big fans afterwards.

Jauder Ho
+2  A: 

When my program hits a certain algorithm, it slows down. I kept changing the variables and its still slow. I know its not supposed to be a bottleneck because I wrote it down and analyzed it on paper a few days after I started the performance tuning.

Out of frustration, I recompiled the whole program. It worked.

....I don't know if its funny or not.

+2  A: 

Around 1996 the company I worked for was trying to build VB GUI for it's legacy app. When the first customer went live on the new UI they found that it took over 2 minutes for the "launch pad" to open. This launch pad had no real functionality. It just showed a user-specific set of buttons that would invoke the screens that did the real work.

Since this was not detected until after deployment to the customer, there was a lot of pressure to get it fixed right away. I got drafted onto a "performance strike team" that was supposed to figure out what was wrong and get it fixed.

When we traced the network traffic we found that the main problem was that the launch pad was querying the server 21 times just to get the date/time. Other data such as the user's privs was also fetched repeatedly.

This happened mainly because the development team was so compartmentalized. There was no "architecture" to speak of, just a big pile of code. There were several functional-area groups which all contributed code to that piece, but they weren't talking to each other. Apparently no one was doing much in the way of testing or profiling either. The launch pad pretty much had to be rewritten from scratch.

Getting this fixed was a nightmare, esp. because the customer was screaming about wanting their money back ($100K+) and top-level execs were getting involved.

There were loads of other problems with that app, pretty much all tracing back to poor planning and worse communication. That app was a miserable failure. They sold a few licenses, but it was intended to be the future direction for the company. When I left 5 years later, most of the customer base was still running on dumb terminals or in terminal emulators.


Back in my DBA days for a smaller shop I helped the developers out too.

One guys loved the new SQL 2000 udfs concept to encapsulate and otherwise SQL into the modern OO era.

He gave me a query (we're talking 20k rows over 3 narrow tables) to do some aggregations using scalar udfs, each with table access. Per key, per sub-key, then add them etc etc.

He reckoned we needed to buy new server to get it running under a day.

I removed all UDFs and it ran in one second...