ansaurus

Question

Biggest performance improvement you've had with the smallest change?

Answer 1

+12 A:

Replacing a "MUL" with a "SHL"/"ADD" series in some x86 graphical code also resulted in about an order of magnitude improvement.

Brian Knoblauch 2009-02-13 13:09:41

That must have been a long time ago…

Bombe 2009-02-13 13:21:26

8088 10mhz clone with an ATI EGAWonder800+ video card! :-) A *very* long time ago...

Brian Knoblauch 2009-02-13 14:37:24

*sigh* them were the days.

Justicle 2009-09-04 06:05:23

Answer 2

+69 A:

Enabling gzip compression for a dynamic web page. Uncompressed page had more than 100k ... compressed only about 15k. It felt so fast afterwards :-)

Peter Štibraný 2009-02-13 13:10:14

Just out of curiosity, what's the cpu impact of enabling gzip? I really have no idea.

TM 2009-02-15 00:00:43

Peter Štibraný 2009-02-15 06:41:57

I had something like this. We used to render a huge table (15x10x40 cells, or something like that) using Javascript. Back in the day, JS used to be sloooow.I suggested rendering the table server-side as HTML, use mod_gzip. It was much, much faster...

alex 2009-02-15 12:02:10

@alex, you rendered cubic table in HTML? Wow... :)

Constantin 2009-02-18 14:26:56

@Constantin: the table showed 15 days, with 10 or so columns for each day and 40 rows... :-p

alex 2009-02-28 10:38:10

And you used HTML?

Sneakyness 2009-07-27 15:44:42

I've seen this take a page from 6.5 megs down to 500kb. Holy $#%@.

rooskie 2009-07-27 16:21:19

I created an HTML Whitespace removing filter to use in addition to a GZIP filter - reduces so much in the generated HTML's file size!

MetroidFan2002 2009-07-27 17:13:17

use DEFLATE instead. http://stackoverflow.com/questions/1574168/deflate-compression-browser-compatibility-and-advantages-over-gzip

David Murdoch 2010-02-16 13:29:46

Answer 3

+34 A:

Turned off ODBC logging on a production database (someone had turned it on and forgotten it) - got about a 1000x performance improvement!

anon 2009-02-13 13:11:28

ODBC logging is the worst idea ever. If someone needs their server logs in a DB they can use a parser later. :(

Jon Tackabury 2009-02-13 14:28:17

I agree, it should be set to run after the fact or on downtime in the middle of the night.

Sneakyness 2009-07-27 15:45:32

Answer 4

+32 A:

Truncate table BigTable.

Queries returned no records but it was faaaaaast!

Goran 2009-02-13 13:11:33

+1 LOL-Point for funny answer

Zaagmans 2009-02-13 13:24:19

Wasn't funny at the time :)

Goran 2009-02-13 13:25:10

Had a similar experience with an intern who used LIMIT 1 to achieve that result.

unbeknown 2009-02-13 14:04:54

Way faster than doing a delete.

Kibbee 2009-02-13 14:52:00

God, I thought you truncated Google's BigTable :P

Ionuț G. Stan 2009-02-13 16:03:20

i tried tis on my servr and my apps stopped wroking. Plzhelpkthx!!!

Joey Adams 2010-08-09 04:45:46

enter "rollback" and pray!

Goran 2010-08-09 14:56:09

Answer 5

+122 A:

In some old code I inherited from a coworker, I replaced string concatenations (+ operator) with StringBuilder (.NET). Execution time went from 10 minutes to 10 seconds.

Gerrie Schenck 2009-02-13 13:13:12

I seriously doubt that.

Bombe 2009-02-13 13:20:33

No really. It was a huge method with triple-nested for loops wich all appended strings to the main string. In the specific scenario there were thousands and thousands of strings to be concatenated.

Gerrie Schenck 2009-02-13 13:26:18

I had the same thing with a Base64 encoding/decoding class I did a *long* time ago in Java 1.0.2. Changed from String to StringBuilder was huge.

Moose 2009-02-13 14:24:52

This is real. With a lot of loops that use the + operator on strings, this will happen very, very easily. I had a program that looped a few thousand times fall from a couple of hours to 15 minutes once by doing this.

Chris 2009-02-13 15:11:37

I had the same experience. We had a process drop from ~30min to 5min after making a change like this.

Kevin Tighe 2009-02-13 15:37:58

Yep. It adds up.

Adam Jaskiewicz 2009-02-13 16:45:27

And I thought that "StringBuilder" was the very first performance thing ever, and everybody knew about it.

Anthony 2009-02-13 21:05:36

I'll take note of this...

Aaron 2009-02-14 12:20:24

I did that too, maybe 8 years before, in a software that manipulated a lots of XML. All XML -> String methods of this program used the basic concatenation (ie +). Moving to StringBuffer was a huge performance improvement...

romaintaz 2009-03-03 14:18:57

I've had a similar experience. With enough data, it's very possible to go from minutes to seconds.

Richard Hein 2009-06-02 22:11:31

I had a similar experience in Classic ASP, but that was changing string concatenation to Response.Write.It makes a huge difference when the size of the string gets to be over a megabyte. 15 minutes -> 30 seconds.

aehiilrs 2009-07-27 14:55:58

I did this, also. Due to the ridiculousness of the original implementation, went from "Did Not Finish" (OOM exception) to about 1 minute. My second pass involved turning the StringBuilder into a filestream (it was ultimately being written to a file) that brought the time required down to a couple seconds.

Greg D 2009-07-27 16:22:43

If I'm not wrong: Bless javac that replaces looped string concatenations with StringBuilder at compile time. I think Java developers that don't know about the StringBuilder can remain oblivious to it.

Cecil Has a Name 2009-07-27 16:36:33

This is the entire reason StringBuilder exists - the performance issues that can arise from liberally using string concatenation in a tight loop where known way back in the early 32bit Delphi days. Almost any old school Delphi developer has implimented one of these at some point. .NET then came along and provided one right out of the box.

David 2009-07-27 17:18:42

It is a very common problem. Joel Spolsky gave it a name: Shlemiel the painter's algorithm. See http://www.joelonsoftware.com/articles/fog0000000319.html

Peter Mortensen 2009-09-02 21:59:19

@Cecil: you are both right and wrong. javac replaces concatenation with a StringBuilder, but only within the same expression, so it does not eliminate this kind of problem.

Michael Borgwardt 2009-09-02 23:00:13

Happens in PHP, too. I re-wrote an email attachment parsing routine to throw around indices instead of string fragments and the speedup was about two orders of magnitude.

staticsan 2009-09-02 23:07:21

@Cecil: Nice application of Curry's Paradox!

nikie 2009-09-07 20:58:35

here's a little benchmark i did for stringbuilder vs concatenation http://www.antiyes.com/stringbuilder-vs-concatenation

John Boker 2009-09-18 12:46:28

The "break even" point for switching to StringBuilder as opposed to a simple concatenation is somewhere around 8 to 12 concatenations, I believe. With more, you're better off with StringBuilder; with fewer you should use concatenate.

tylerl 2010-03-19 04:06:54

Remember that going from 10 minutes to 10 seconds sounds like a phenomenally exaggerated improvement, but really it's only a 60x speedup (less than two orders of magnitude even!) I've managed similar levels of improvement when changing a particularly dire naiive implementation to a much well thought-out one.

Coxy 2010-05-18 08:47:31

Answer 6

+7 A:

My biggest performance improvement was gzipping a 700 Kb XML file downloaded by thousands of clients a day and then caching the gzipped output in memory, dropped bandwidth usage somewhat but more importantly dropped server load from about 0.7 to 0.00.

weazl 2009-02-13 13:13:51

Answer 7

+57 A:

Turning off disk compression on a database server. Even accounting for the time taken to slap the sysadm, this was a huge net benefit :-)

Bob Moore 2009-02-13 13:13:54

+1 for slapping the admin

Matthew Whited 2009-05-26 19:19:06

Thankfully Microsoft SQL Management Studio 2005 will actually refuse to let you mount a database on a compressed NTFS volume/folder last time I checked.

David 2009-07-27 17:23:19

@David: They may have compressed it later... Plus people tend to dismiss warnings without reading them.

tsilb 2009-09-02 23:20:58

Answer 8

+1 A:

Cache locality

EDIT:

Harsh guys...

I switched out an object graph for a linear memory representation where cache misses basically went away. With perfetching and some C++ template tricks I could define a nicely laid out memory representation which the CPU would crunch in no time at all.

This optimization wasn't really that much work but it signifies how horrible poor memory access patterns can be and God forbid, reference types...

John Leidegren 2009-02-13 13:14:00

Would you mind elaborating?

Alex Angas 2009-02-13 14:08:24

I think it's pretty obvious what John meant.

kubi 2009-02-13 22:00:42

Obvious what he meant, but the request was for examples of dramatic effects, not things to think about when optimizing. If John would like to post about the time he rolled up a loop and got a big performance boost, that would belong here.

David Thornley 2009-02-13 22:42:31

Upvoted to counter Harsh Guys, who surely won't return to take back their petty downvotes.

Constantin 2009-02-18 13:38:42

Answer 9

+3 A:

Set NOCOUNT on a complex cursor based stored procedure.

It was returning a row count of 1 a few million times even though the application had no need to know it.

The gain was purely in network I/O.

Robin Day 2009-02-13 13:14:14

Answer 10

+24 A:

When maintaining someone else's code, I encountered a stored procedure that was taking approximately 4-5 seconds to run and producing a result with only a few rows. After examining the query in the stored procedure and the table that the query was running against, there was a distinct lack of indexes on the table. Adding just a single index improved that stored procedure from 4-5 seconds to about 0.2 seconds! Since this query was being run many times, it was a big improvement overall!

CraigTP 2009-02-13 13:15:04

If an index is added, then the write performance might suffer. Did you measure the write performance?

portoalet 2010-03-15 02:18:59

In typical usage, this application had about 1 write for every 1000 reads, so the overall performance of the application was drastically increased.

CraigTP 2010-03-19 08:58:08

Answer 11

+1 A:

Updating the database statistics on Oracle 9 using DBMS_STATS.GATHER_DATABASE_STATS reduced the runtime of a (rather simple) query from around 12 minutes to 200 ms. Oracle 9 decided that multiple full table scans were a better approach than using the index because the statistics were broken.

Joachim Sauer 2009-02-13 13:25:32

Answer 12

+14 A:

Go from single-core to quad-core.

(Hey, you didn't strictly say programming related!)

strager 2009-02-13 13:28:22

That's a small change? How long did it take?

Michael Myers 2009-02-13 19:27:00

@mmyers, It took a simple swap of parts. Maybe an hour at most.

strager 2009-02-13 20:44:54

+1 funny. $250 and half an hour of your time is worth it.

tsilb 2009-09-02 23:24:25

that's like if you're adding some ram :P

Atmocreations 2009-09-07 21:05:57

even 2 cores on windows can help ALOT since some programs take 100% of available CPU and you can only restart computer. If you have two cores you can actually press CTRL-ALT-DEL and kill problematic process, since other core is available for windows...

ante.sabo 2009-09-18 12:44:30

@as That is the whole reason I got my first dual core when they just came out

PeteT 2009-12-18 11:03:03

Answer 13

+84 A:

This is the same answer as I gave here:

I was working at Enron UK on a power trading application that had a 2-minute start-up time. This slowness was really annoying the traders using the application, to the point where they were threatening dire retribution if the problem wasn’t fixed. So I decided to explore the issue by using a third-party profiler to look in detail at the start-up performance.

After constructing call graphs and mapping the most expensive procedures, I found a single statement that was occupying no less than 50% of the start-up time! The two grid controls that formed the core of the application’s GUI were referenced by code that marked every other grid column in bold. There was one statement inside a loop that changed the font to bold, and this statement was the culprit. Although the line of code only took milliseconds to run, it was executed over 50,000 times. The original developer had used small volumes of data and hadn’t bothered to check whether the routine was being called redundantly. Over time, as the volume of data grew, the start-up times became slower and slower.

After changing the code so that the grid columns were set to bold only once, the application’s start-up time dropped by nearly a minute and the day was saved. The moral here is that it’s very easy to spend a lot of time tuning the wrong part of your program. It’s better to get significant portions of your application to work correctly and then use a good profiler to look at where the real speed bumps are hiding. Finally, when your whole application is up and running correctly, use the profiler again to discover any remaining performance issues caused by your system integration.

RoadWarrior 2009-02-13 13:28:38

+1 for the profiler plug... Until you use such tools, you are only guessing at the potential bottlenecks.

DGM 2009-04-14 15:40:38

It must be comforting to know how much Enron benefited from your code improvement. ;)

Chris Lutz 2009-05-26 19:22:02

Out of curiosity, what profiler did you use?

Kyralessa 2009-09-02 22:59:14

@Kyralessa: It was the Compuware one - this was a while ago. Nowadays I use the Ants profiler.

RoadWarrior 2009-09-03 08:53:13

@Chris: Enron Europe was quite profitable, especially after electricity de-regulation in the UK. But then the US cut all funding...

RoadWarrior 2009-09-03 08:55:28

Had ones of those - worse it was n^2 with the number of rows. Somehow the toolkit rescanned the table from the top for each new formatting element added.

Martin Beckett 2009-12-12 03:21:32

Answer 14

+176 A:

The most chat-addicted guy in the room took a day off.

User 2009-02-13 13:30:49

I find it objectionable that in many cases people upvote an answer just because it's funny - In this case, "funny but true". Therefore the more technical and insightful answers get buried. Having said that, +1 for "funny but true".

tsilb 2009-09-02 23:13:57

I've already tried a uservoice issue with "techincal rep" and general rep where people could distinguish what they are giving, thus allowing for funny answers as well as truly difficult questions.

Spence 2009-09-02 23:35:31

@Spence: Hi there, Slashdot 2.0. :-) Let's not make the system more complicated. Over time, the more technically astute answers will get up-voted. I'm not sure I like how questions can't be upvoted after a period of time...

Chris Kaminski 2009-09-03 17:41:14

Hang a sign outside your cube/office that says "NO LOITERING". It actually worked for me =)

StingyJack 2009-09-18 12:10:22

Isn't this a **productivity** improvement rather than a **performance** improvement, though? Or did the chat-addicted guy somehow slow down the application by his very presence?

Dan Tao 2009-11-23 15:50:54

+ for funny but true lol!

LnDCobra 2010-03-01 14:21:22

I won't work at such a place.

Amit Kumar 2010-03-08 17:18:31

if I had 100 reputation i'd vote this down as "funny but not really an answer". guess i better get busy here and make me some points huh?

msulis 2010-05-16 19:47:20

Answer 15

+79 A:

Add an index on a field of a table used for a complex SQL query. You can sometimes easily improve the performance by 90% or so.

Johannes Weiß 2009-02-13 13:36:15

This article: http://msdn.microsoft.com/en-us/magazine/cc135978.aspx is the useful when identifying indexes that need to be created - get Sql Server to tell you what indexes it needed, but couldn't use!

Paul Suart 2009-03-20 15:17:47

I can't count the number of times I've used indexes to strip massive amounts of time off report queries... SQL going from tens of minutes to a few seconds...

Damovisa 2009-09-02 23:33:11

Or, **not** adding an index to a table that is constantly being written to.

Evan Plaice 2010-06-24 03:07:50

Answer 16

A:

I refactored a SQL query that was running as a batch job. It had several functions in it that were horribly inefficient, and the query itself was poorly written.

After spending a few days rewriting it, the run time went from 13.5 hours to 1.5 hours. I have still not been able to beat that efficiency increase to this day.

Jay S 2009-02-13 13:37:41

Answer 17

+18 A:

Removing some rogue sleep()'s in some Java code.

Robert Gould 2009-02-13 13:38:23

WTF?? Why were there sleeps there?!

JoelFan 2009-02-13 17:59:18

Probably to "fix" race conditions. I enjoyed cleaning up after another developer after they stacked sync locks. They used upwards of 5 locks nested in the same method instead of fixing the real problem. In trying to create a singleton (which really should of been an instance but I won't even get into that) he didn't declare the lock object as static. This caused every call to get it's own lock and the nested locked just slowed the code down enough to stop the 3 two 5 requests from having problems.

Matthew Whited 2009-05-26 19:18:27

@Matt: +1 for misuse of Singletons.

tsilb 2009-09-02 23:23:04

Answer 18

+52 A:

A one-character change yielded an infinite speedup:

int done = 0;
while(!done);
{
    doSomething();
    done = areWeDoneYet();
}

Guess what the change was...

Remy Blank 2009-02-13 13:42:51

int done = 1; cute.

rtperson 2009-02-13 13:48:12

@rtperson: wrong. Look at the end of the while() line.

Graeme Perrow 2009-02-13 13:50:20

semicolon is misplaced.

ryeguy 2009-02-13 14:22:33

did you use a for loop? .. just kidding

2009-02-13 14:50:24

good argument for why you should do while(x) { instead of newline before {

Joe Philllips 2009-02-13 15:33:04

@d03boy: I fail to see how having the open brace on the same line would avoid the bug. If anything, it would obscure it further by hiding it amongst more symbols.

rmeador 2009-02-13 15:34:01

I find with the "{" on the same line that I'm a LOT less likely to accidently type a ";". So, that advice really does work in practice.

Brian Knoblauch 2009-02-13 15:58:12

I have to admit that my current coding style is while(x) { (it wasn't at the time) and I deliberately used the other convention here.

Remy Blank 2009-02-13 16:33:46

Too bad the compiler couldn't optimize away that empty but infinite loop. :)

Eddie 2009-02-13 20:32:32

@rmeador: True, it's no easier to fix when reading the code, but it's definitely easier to avoid in the first place. Your muscle memory is so used to typing semicolon-enter all the time. Typing semicolon-openbrace-enter would feel extremely awkward, you would notice right away.

Adam Bellaire 2009-02-13 21:44:02

@Eddie: he he :-)

Al pacino 2009-02-15 06:29:51

It is trivial for the compiler to detect an empty statement attached to a loop/'if' followed by a block statement that is not attached to a loop/'if'. The compiler should have warned you.

Tim Matthews 2009-08-17 14:05:45

I'm always extra suspicious of While loops... Maybe it's just my nature, but I tend to use For loops or replace them with "while (!(doSomething() ". Both return bool; doSomething can just return true for no reason or true meaning success... I'm paranoid that way.

tsilb 2009-09-02 23:19:57

You did remove the ! ?

Stephan Eggermont 2009-09-18 12:11:00

just TODAY that happened to us! F*king semicolon!

ante.sabo 2009-09-18 12:36:36

That is not a performance problem, that's a bug.

Tim Büthe 2009-11-26 08:43:53

Haha, I didn't see that one right away!

Jake Petroules 2010-06-09 21:39:52

One more reason why I like Python

Evan Plaice 2010-06-24 03:10:10

Answer 19

+10 A:

One project I worked on had a very long build time - over half an hour for a full rebuild. After a bit of investigation I traced it down to the precompiled header settings. I then wrote a small app to scan all the source files and reduce the header file dependencies and correctly set up the precompiled headers. Afterwards, full rebuild time was less than a minute.

Skizz

Skizz 2009-02-13 13:43:54

Impressive :) How many files? How many projections/solutions and which platform?

Ketan 2009-02-13 22:28:15

A hundred or so files, Windows. The problem was, the compiler was rebuilding the PCH file for every source file.

Skizz 2009-02-15 10:18:01

Answer 20

+9 A:

Changed a SQL query from a cursor to a set based solution.

ScottStonehouse 2009-02-13 13:45:24

Treating databases like arrays is one of the biggest mistakes I see. It's also one of my biggest pet peeves. While cursors aren't always bad, they should only be used when absolutely necessary, which is almost never.

Eric 2009-05-26 19:31:18

You can always tell the people who came from a DB2 or Oracle background because they go for a cursor-based solution prior to considering standard CRUD operations.

tsilb 2009-09-02 23:25:27

@tsilb - I see it mostly in application developers who work with SQL after only working in their app code. "Use what you know"

StingyJack 2009-09-18 12:16:55

Answer 21

+20 A:

I was writing a Java MergeSort, just to experiment and see how much of my old Data Structures course I could still put into practice. My first time around I implemented my merge routine with ArrayLists, and set it to sort all the words in War and Peace. It took five minutes.

The second time I changed from using the Collection classes to simple arrays. Suddenly the time to sort over 500K words dropped to less than two seconds.

This hammered home to me just how expensive object instantiation can be, especially when you're creating a lot of objects. Now when I'm troubleshooting for performance, one of the first things I check for is whether objects are being instantiated within a loop. It's much cheaper to reinitialize an existing object than it is to create a new one.

rtperson 2009-02-13 13:47:03

Current Java implementations are *much* better at object instantiation.

Eddie 2009-02-13 20:33:42

With 1.5+, is it still better to re-initialize?

cdmckay 2009-02-14 00:07:33

Peter Norvig elaborated on this for Java collections once. See http://www.norvig.com/java-iaq.html#slow

quark 2009-02-14 10:19:01

@Eddie, I was using Java 1.6. I'm sure instantiation has gotten faster, but there is still a significant amount of overhead involved in object creation.

rtperson 2009-02-16 13:47:50

There were two larger lessons to be learned there too: you're unlikely to beat the Java standard library implement on your pass at solving any problem it already covers, and you should never presume you can improve how something performs without profiling it first.

Greg Smith 2009-06-23 05:05:52

Answer 22

+4 A:

I inherited a time tracking application that was written in VB 3 and used an Access database. It was the first VB application written by a very experienced COBOL programmer. Rather than using SQL and letting the database engine get the data he wanted efficiently he opened the table and went from record to record testing each one to find the one he wanted. This worked okay for a while, but when the table grew to 300,000 records it got a "little slow". Looking for a single programmers time entries would take about 5 minutes. I replaced his code with a really simple SQL statement and the same search went down to about 10 seconds. The original programmer thought I was a god.

Jim Blizard 2009-02-13 13:49:19

But still, 10 seconds to query a 300K table? Even Access should be able to handle that in milliseconds.

Juliet 2009-02-14 20:58:12

Not too surprising if it was on a network share

Matthew Whited 2009-05-26 19:22:16

Yes, it was on a network share on a very busy token ring network.

Jim Blizard 2009-06-12 16:32:21

Answer 23

+41 A:

Just recently I did a Project Euler problem. I used a Python list to look up already computed values. The program took maybe 25 to 30 minutes to run (I didn't measure it). The lookup has to iterate through all values until it finds a matching one in the list. Then I changed the list to a set which basically does a hash lookup. Now the program runs in 15 seconds. The change was simply to put set() around the list.

Moral: choose the right data structure!

unbeknown 2009-02-13 13:51:22

This is basically the moral of every Project Euler problem.

Eric 2009-07-27 14:57:02

Answer 24

+3 A:

Installed profiler on the application server. It makes the plumbing work much more fun.

LiorH 2009-02-13 13:52:49

Answer 25

A:

The best thing I ever did was learn NHibernate and incorporate it into all my projects. My SQL is now always properly formed, and I don't have bottlenecks from that end of the project.

--And properly indexed tables that perform a lot of lookups!

Mark Struzinski 2009-02-13 13:56:25

Answer 26

+2 A:

Until recently we had an intern who had a special method of optimization. He put together a sql statement that took over 20 minutes to run and had to be called quite often. He became aware that the sql statement would finish real fast when he put a LIMIT 1 at the end. I think I destroyed his faith in humanity when I told him that this will not return the results he needs.

unbeknown 2009-02-13 14:00:44

Answer 27

+4 A:

On a 68000, some years ago, in this C code:

 struct {
 ...
} A[1000];

...

int i;
for (i = 0; i < 1000; i++){
   ... A[i] ...
}

One very small change caused a 3-times speedup. What was it?

Hint: sampling the call stack a few times showed the program counter in the integer-multiply-subroutine being called in the code from A[i].

Mike Dunlavey 2009-02-13 14:01:39

wow... my guess is that the 68000 didn't have a register capable of holding a value of 1000 (or perhaps just the memory address pointed to by it?) and thus you switched to using pointer arithmetic to iterate over the array to avoid the multiply... how far off am I? :D

rmeador 2009-02-13 15:37:03

@meador: I should have said the code actually allows i to skip around. Pointer indexing would actually be even faster for this code.You're in the ballpark. Let's see if there are any more guesses for the small change.

Mike Dunlavey 2009-02-13 18:05:12

Oh, all right. Just declare i as short. That allows it to use the 16-bit multiply instruction, rather than the subroutine.

Mike Dunlavey 2009-02-13 18:32:07

I've got an agenda: to show how useful stack-sampling is. If you didn't know that the multiply routine was being called from A[i], you'd be left guessing "why are we multiplying so much?" And it wouldn't help to know the whole loop was taking a lot of time.

Mike Dunlavey 2009-02-13 18:39:29

Eddie 2009-02-13 20:35:22

Oh, really, changing i from int to short made that much of a difference? I would never have guessed, but that makes sense. Especially on the 68000 (and other architectures from that era) where an integer multiply took a long time.

Eddie 2009-02-13 20:37:45

Right. Even if a 16-bit multiply instruction is slow, it's a lot faster than a subroutine to do 32-bit multiply. If you like that, they used to do floating-point with libraries also. 300 instructions to do an Add.

Mike Dunlavey 2009-02-13 21:02:17

Using a pointer would speed that up, but failing that, counting back to 0 (comparing to 0 is very cheap) would be the best optimization.

Hooked 2009-07-27 16:32:43

@Hooked: You're right. Unrolling would make it go even faster. The question was what's the biggest improvement for smallest change. Also, I think the best lesson to draw from this is not how the problem was fixed, but how it was found.

Mike Dunlavey 2009-07-29 13:28:11

Answer 28

A:

In a game application, I had an immutable class representing a cell in the game area's grid. It had getter methods which calculated the corners of the cell lazily, which included allocating new objects to represent the coordinates. The profiler showed those getters to be the bottleneck in the AI algorithms. Calculating them eagerly in the class's constructor improved the performance very much (I don't remember the exact numbers, maybe more than doubled the speed).

Before the code was like this:

public Point[] allPoints() {
    return new Point[]{center(), topRight(), topLeft(), bottomLeft(), bottomRight()};
}

public Point center() {
    return new Point(x + inner(width) / 2, y + inner(height) / 2);
}

public Point topLeft() {
    return new Point(x, y);
}

public Point topRight() {
    return new Point(x + inner(width), y);
}
...

The allPoints() method was the bottleneck. And after optimizing, the creation of all those values was moved to the constructor and stored as instance variables, after which all the getters were trivial.

It's always best to first do the simplest thing that could possibly work, and change it to something more complex only when there is evidence that the simplest thing is not good enough.

Esko Luontola 2009-02-13 14:16:34

Were you not persisting the results after lazily loading them the first time?

Simucal 2009-02-13 19:34:29

No. I created them with 'new' always when the method was called. It was the simplest thing that could possibly work, and only after the profiler showed that to be the bottleneck did I optimize it.

Esko Luontola 2009-02-13 20:07:41

I edited the post to show how it was.

Esko Luontola 2009-02-13 20:17:58

Answer 29

+7 A:

Changing log4net logger level from "DEBUG" to "INFO".

Si 2009-02-13 14:21:25

Interesting SO issue, adding a full stop equates to 60% contribution.

Si 2009-12-14 13:17:38

Answer 30

+3 A:

Switched from PHP to Python for pet projects.

2009-02-13 14:50:42

-1, quite subjective, not really valuable, right?

Carl Hörberg 2009-07-27 19:10:31

Answer 31

+1 A:

Updating the stats on a MS SQL Server database gave a 90x performance increase on certain queries, i.e. 90 minutes to 1 minute.

ck 2009-02-13 14:52:10

Woah, which version of MSSQL - that's just pathetic and awesome at the same time!

EnocNRoll 2009-02-13 16:19:38

2005 - the DB had been messed up a bit with running out of space etc.

ck 2009-02-13 16:27:31

My wife had a case where destroying stats helped - the table got real big for a couple of hours, then shrank down, and they were running the stats at night when it was tiny.

David Thornley 2009-02-13 22:44:12

Answer 32

+2 A:

A long time ago, I removed an index, and sped up my query by a factor of at least 300. I never did figure out why Oracle 7 figured it needed to do a full Cartesian join if it had the index, and not if it didn't.

David Thornley 2009-02-13 14:57:17

Answer 33

+4 A:

I swapped around the order of a selection criteria for a database query once and the runtime went from a 6 or so hours to a few seconds! The customer was pretty happy!!

revs 2009-02-13 15:01:59

Answer 34

A:

Learn to cache bitmap objects in .NET. The bitmaps were generated on the fly but many could be reused instead of regenerated. From an unusable app went to a pretty performant one.

Alejandro Mezcua 2009-02-13 15:05:25

Did you cache them in memory or on disk? I added support to my photogallery to use isolated storage for resized images and havn't looked back (15K+ x 4MB jpegs take a while resize for thumbnails, down scales, and even recompressed images) Also changing from an HTTPHandler webservice to a WCF restful service allowed for better cacheing of images on the client

Matthew Whited 2009-05-26 19:35:43

For that particular case bitmaps were cached on memory, for a custom .NET WinForms control...

Alejandro Mezcua 2009-05-28 08:41:26

Answer 35

+11 A:

After profiling showed that a large amount of time as being spent in std::map<>::find(), I looked at the key space and found that it was pretty much contiguous and uniform. I replaced the map with a simple array, which reduced the time required by about 80%.

Choosing appropriate data structures and algorithms is the best first step to improving performance.

mch 2009-02-13 15:11:39

I used to think hashing would have good performance when the key space is contiguous and uniform, probably near to that of an array (with a good optimizing compiler). What makes the map so worse than an array? Is it that the hash was being computed every time the key was used?

Hosam Aly 2009-02-13 22:08:57

`std::map` isn't a hash, it's a binary tree. so there was a binary search every time he called `find`. Replacing that with a straight array lookup would be miles better.

Sol 2009-02-14 17:34:54

Answer 36

A:

Turned off automatic row/column resizing on a DataGridView. Due to the way our app was written by another developer, the cell formatting would cause some checkbox column's value to be repopulated, causing the entire grid to recalculate it's size every time that column was painted. Clicking a button to add a row to the table took an exponential amount of time. Around 12 seconds to add a row by the time it got to the fourth row.

I turned the AutoRowSize off for the grid, and everything was almost instantaneous, as it should be.

Chris Doggett 2009-02-13 15:12:58

Answer 37

+5 A:

I recently rewrote a SQL query (for removing duplicates from a table), bringing the runtime down from still-not-finished after 47 hours to 30 seconds.

The trick: realising that it was an upgrade script, and I didn't need to worry about concurrency, since the database was in single-user mode. Thus, instead of removing duplicates from the table, I could just SELECT DISTINCT into a temporary table, TRUNCATE the first one and then move the rows back.

Roger Lipscombe 2009-02-13 15:30:15

was there a specific, important change that did the trick?

pc1oad1etter 2009-02-14 16:30:12

why were there duplicates in the first place?

renegadeMind 2009-02-17 14:28:16

I had something fairly similar but in a reporting database. I added a column to check off processed rows versus the original way of inserting a row into a second table. Instead of my WHERE IN growing for each row the list to check grew shorter.

Matthew Whited 2009-05-26 19:27:38

Answer 38

+20 A:

When updating WinForms controls realtime, simply doing something like

if (newValue != txtValue.Text)
   txtValue.Text = newValue;

instead of always doing

txtValue.Text = newValue;

took the CPU utilization from 40% down to almost nothing.

Jacob Adams 2009-02-13 15:37:03

+1 I had this problem with a TreeView updating routine which took an age to run!

mdresser 2009-04-14 15:53:01

I have tested this myself and found that doing the initial check is just as expensive in time as setting the property

benPearce 2009-06-23 04:58:11

@benPearce Is it an ordinary property or or something that causes the UI to repaint itself? The expensive part of this is that the UI keeps repainting itself even though it's that same data.

Jacob Adams 2009-06-24 15:49:07

@Jacbon: SuspendLayout() will stop it from painting. ResumeLayout() will cause it to start again. On some controls, however, this doesn't help much. Telerik controls were *horrid* in certain WinForm circumstances like this. 45+ seconds to load a form when with normal .NET controls it loaded in 5 seconds.

Nazadus 2009-07-27 15:43:11

TextBox already does it: if (value != base.Text) { base.Text = value; ... }

Ian Boyd 2009-07-27 17:20:54

@Nazadus, SuspendLayout and ResumeLayout only help if you are batching multiple control updates. If each control is being updated real-time, these calls would just add uneeded overhead.

Jacob Adams 2009-07-28 17:18:59

@Ianboyd, interesting find. It looks like Control also has this.

Jacob Adams 2009-07-28 17:30:06

Answer 39

+10 A:

Switch from the VS compiler to the Intel Compiler for some numeric routines. We saw a 60% speedup just by recompiling and adding a few flags. Utilizing OpenMP on the routine's for loops yielded a similarly large speedup.

Steve 2009-02-13 15:43:11

We did the same for some image analysis algorithms and saw a nice improvement as well.

Ed Swangren 2009-02-19 08:25:10

Answer 40

+97 A:

Changing a lot of logging to check log levels first.

From this:

log.debug("some" + big + "string of" + stuff.toString());

To this:

if (log.isDebugEnabled()) {
    log.debug("some" + big + "string of" + stuff.toString());
}

Made a HUGE impact on production performance. Even though log.debug() only logs when debug logging is enabled anyway, the string is built BEFORE it is passed to log.debug() as a parameter, so there was loads and loads of string building that got completely eliminated in production.

Especially considering that some of our toString() methods produced about 10 lines worth of info, by calling toString() on fields, which call toString() on their fields... and so on.

TM 2009-02-13 15:43:28

thats a good tip, thank you.

Orentet 2009-02-13 15:48:05

Yeah, string-building and formatting looks like just a 1-liner, so how bad could it be? It's easy to overlook that it exercises a major chunk of the run time library.

Mike Dunlavey 2009-02-13 18:59:55

For cleanness wouldn't it be better to add the log.isDebugEnabled() to the log.debug method?

Dscoduc 2009-02-13 20:52:59

The point is that the argument still has to be created before the method can be called. Deciding inside the method not to do anything is too late as the parameter work has already been done.

jackrabbit 2009-02-13 20:57:51

I usually create a utility wrapper method that takes a format string and builds the output using `String.Format()`, similar to `Console.WriteLine()` in .NET or `System.out.printf()` in Java. This would avoid most of the cost (except when `toString()` and similar methods are used explicitly).

Hosam Aly 2009-02-13 21:28:33

@Dscoduc log.debug already does that check on whether debug is enabled. So yes, we are checking a boolean twice (unless it's false). As @jackrabbit said, the issue is that the parameters are evaluated BEFORE they are sent into the function, so the real "work" isn't avoided.

TM 2009-02-13 21:36:29

same applies to printf and (I suppose) every other language. In C++ I tend to replace "log.debug" with "#define logit if (logging) log.debug"

gbjbaanb 2009-02-13 23:43:34

We use log4net and it specifically recommends this approach in the documentation

Richard Ev 2009-02-14 00:35:28

... better still, use slf4j's technique...

alex 2009-02-14 22:59:24

Do we really need to still give this tip ? Come on is everyone an amateur.

mP 2009-02-15 01:16:48

Everyone has to learn somewhere.

GMan 2009-06-23 03:56:22

In Dotnet you have the Conditional Attribute which the compiler will use to remove the method completely at compile time - see http://msdn.microsoft.com/en-us/library/aa664622%28VS.71%29.aspx

benPearce 2009-06-23 04:51:35

@mP - not everyone knows everything

benPearce 2009-06-23 04:54:01

This is a great reason to avoid littering code with debug statements. Since debugging is a cross-cutting concern, manage it from a single location, using aspects. This allows performant debugging by adding just one line of code to check the debugging state, as shown in the answer, which then applies to debug statements throughout the application.

Dave Jarvis 2009-07-27 19:54:52

I deal with this case by making the log function take a Func<String>. Delayed execution allows the debug check to be done in only one place.

Strilanc 2009-09-02 23:16:35

@benPearce that is a good idea, but does that allow you to alter the logging level at runtime? Changing log levels to figure out some weird issue can be very helpful in some cases.

TM 2009-09-02 23:35:54

Answer 41

A:

1) switching from in-house application with Expat parser, to XSLT and generic Sablotron, 100-fold improvement in speed and memory consumption

2) hacking Python code by calling directly objects properties, rather than setters/getters. 10-fold improvement in speed (although decreased code readability)

vartec 2009-02-13 15:44:50

Answer 42

+15 A:

Adding two indexes to a table speeded up a stored procedure from 12.5 hours to 5 minutes.
Moving a straight data copy operation from SQL's DTS to just a "insert into ... select from" statement reduced copy time from an hour to 4 minutes.

A more common example, however, was when a colleague had used sub-selects on SQL to get certain values from a child table. Worked fine on small datasets, but when the main table grew, the query would take minutes. Replacing the sub-selects with a join on a derived table made the whole thing much, much faster.

Essentially;

SELECT Name, 
       (count(*) from absences a where a.perid = person.perid) as Absencecount
FROM Person

is very bad, as SQL will have to do a new select statement for each row in Person. There are different ways of making the above more efficient but using a derived table can be a very efficient way.

SELECT Name, Absensecount  
FROM Person left join  
   (select perid, count(*) as Absencecount from absences group by perid) as a  
ON a.perid = person.perid

The problem with SQL is that it is very easy to write very bad SQL. SQL Server is so good at optimising stuff that most of the time you don't even realise you are writing bad code until it doesn't scale well. One of the golden rules that I always look for is; "Is my inner query referencing anything in the outer query"? If the answer is yes then you have a non-scaling query.

Frans 2009-02-13 15:51:13

SELECT p.Name, count(*)FROM Person pINNER JOIN absences a on a.perid = p.peridGROUP BY p.perid, p.Name

Carl Manaster 2009-05-26 19:33:02

@Carl, that's not equivalent. Consider a person with no absences.

David B 2009-11-19 13:40:57

Answer 43

+2 A:

Used some internal caching for a heavily used http module and the performance improved by a big factor.

Gulzar 2009-02-13 15:53:28

Answer 44

+1 A:

I was asked to troubleshoot an application which in production was completely pegging the CPU on the database server (SQL Server). After running a trace, it was evident that the table designer wasn't aware of something called a primary key (or any other indexes for that matter). I added the key live. All of the sudden, the clouds parted and the CPU % went down to reasonable levels for the amount of traffic.

jalbert 2009-02-13 15:58:00

Answer 45

A:

These tips can each make a huge difference:

Added the NOLOCK SQL hint to massively complex SQL.
Removed Order By's from nested subqueries within SQL.
Refactored SQL to avoid the need for the DISTINCT hint.

EnocNRoll 2009-02-13 16:17:33

Answer 46

A:

Improved the time it took to run Spring JUnit tests under Maven 1.1 by adding the following to the project.properties:

maven.junit.forkmode=once

This was a huge improvement because most of the tests were leveraging the SpringJUnit4ClassRunner, and by setting forkmode to once, the Spring context was only loaded once per Maven invocation instead of once per unit test invocation.

Steve Levine 2009-02-13 16:18:01

Answer 47

+1 A:

Removed the ORDER BY clauses from our SQL statements and moved sorting code to the Objects. This gives you a clean consistent query plan and moves the sorting work from the database to the clients (or web servers) where it's distributed.

Dan Howard 2009-02-13 16:21:29

Answer 48

+6 A:

In log4j on a server-side app, changing something like this:

log.debug("Stuff" + variable1 + " more stuff " + variable2);

to this:

if(log.isDebugEnabled())
    log.debug("Stuff" + variable1 + " more stuff " + variable2);

Gave us a 30% boost.

Don Branson 2009-02-13 16:21:30

This is already mentioned in another answer, with some interesting comments.

Hosam Aly 2009-02-13 22:12:19

ah, thanks. i didn't look hard enough.

Don Branson 2009-02-13 23:22:43

okay, i looked, and only saw the log4net comment. it's a little different - they got a boost by logging less, we got a boost by checking the debug level before building the parm list for the debug call. related, but not the same.

Don Branson 2009-02-13 23:24:43

Actually I meant TM's answer: http://stackoverflow.com/questions/545844/biggest-performance-improvement-youve-had-with-the-smallest-change/546409#546409

Hosam Aly 2009-02-14 19:10:47

okay, mine's *completely* different, because, uh, well, uh, his example uses curly braces. yeah, that's it. ;) Seriously, though, thanks for bringing it to my attention. It's the same.

Don Branson 2009-02-14 19:58:05

@Don Branson: I approve of this post :) +1

TM 2009-02-15 00:03:10

Thanks, TM. I plussed you, too.

Don Branson 2009-02-15 00:24:04

I cant believe people are so dumb they find this useful...

mP 2009-02-15 01:18:35

Because you always knew everything, right?

GMan 2009-06-23 03:58:41

SLF4J: log.debug("Stuff {} and more stuff {}", var1, var2); Both cleaner and faster.

Tim 2009-09-07 21:18:02

Answer 49

+2 A:

used

if( string.Compare( prevValue, nextValue, StringComparison.Ordinal ) != 0 )

Instead of

 if( prevValue == nextValue )

Binoj Antony 2009-02-13 17:01:03

I doubt this, at least in .NET 3.5. Some browsing in Reflector shows that `operator==` calls `Equals(a,b)`, which in turn calls `EqualsHelper`, and the latter does an ordinal comparison. The method you're using (equivalent to calling `string.CompareOrdinal`) does almost the same logic, ...

Hosam Aly 2009-02-13 22:21:04

... except that the compare method has to perform additional logic to return a negative or positive value (or 0).

Hosam Aly 2009-02-13 22:22:37

Answer 50

+4 A:

<%@ OutputCache Duration="3600" VaryByParam="none" %>

Binoj Antony 2009-02-13 17:04:49

Answer 51

+12 A:

Using a connection pool. Who would have guessed that something that is known to makes things faster actually does make things faster?

bogertron 2009-02-13 17:06:05

Answer 52

+3 A:

Took a web page load from 3 minutes to 3 seconds by indexing the primary search term. Problem was the table had 1,000,000+ rows. Their "developer" just couldn't make it go any faster and had them purchase a new Quad Server 8G RAM machine.

Harold 2009-02-13 18:38:55

Answer 53

A:

I added an index to a column in MySQL. This should have been there from the beginning, but everyone else overlooked it. I did some simple query explains, and found it. The main page of the site started loading 33% faster. Pretty nice for a quick index.

2009-02-13 19:07:29

Answer 54

+3 A:

While programming in CUDA for GPUS you must provide the correct number of threads to be launched. The program was launching with the incorrect number of threads, so it was running in serial. While chaning the line:

kernel <<< numberOfThreads >>> ()

to

kernel<<< numberOfThreads, numberOfThreads>>>()

the program ran ~ 500 times faster

Edison Gustavo Muenz 2009-02-13 19:24:31

Answer 55

+4 A:

A few projects back we were just short of reaching performance targets. I ran the profiler and found that sqrt() was occupying 42% of our frame time! I'm not sure why it was so slow on this hardware (Nintendo Wii), and it was only called a few hundred times per frame but wow.

I replaced it with a 3-iteration sqrt estimator and got almost all of that 42% back! (The estimation was "guess at a reasonable value of the sqrt, then refine by choosing the midpoint between that estimate and the result of dividing the estimate into the initial value." Picking a good initial guess was important, too.)

dash-tom-bang 2009-02-13 19:51:01

I think that's called the newton-raphson method.

DavidN 2009-02-18 21:45:13

It's also called the quake3-sqrt because it was found in the open-sourced quake3 code, except that the magician who did it there used only one iteration. :)

erjiang 2010-05-07 21:38:36

Answer 56

+2 A:

Call .Dispose for objects implementing IDisposable. There's a reason why those objects are implementing IDisposable, ya know!

The application (inherited from a former employee) went from needing a restart every day to running like a champ nonstop for the next 2 years.

Jeremy Frey 2009-02-13 20:48:39

Answer 57

+15 A:

Removed a html tag from a web application, gained 100% performance increase.

At some point I noticed that requests were duplicated. It took me some time to figure out it was caused by an empty image tag lost in sh*tload of HTML;

<img src="" />

For obvious reasons, Django's template system don't throw errors when a variable does not exists, so we didn't notice anything unusual when we inadvertently removed a template variable, which happened to contain an image src (for a small icon).

Removed the tag, the application loaded twice as fast.

2009-02-13 21:03:22

+1 for figuring that one out. -1 for using Django. You came out even :)

tsilb 2009-09-02 23:29:23

Stepping through a page with `<img src="" />` in the debugger can be quite confusing as well.

Matti Virkkunen 2010-05-18 10:03:14

Answer 58

A:

I wrote some code in work which was used to process large log files. It had to read each entry and match certain parts of it to previous entries. As you can imagine, the more entries were read, the more had to be searched to perform these matches. After quite a while of pulling my hair out, I realized I was able to make some assumptions on the entries which allowed me to store them in a hash table instead of a list. Now instead of needing to search each previous entry every time a new entry was read, it could simply do a hash table lookup.

Performance obviously jumped quite a bit. I believe for a particular log file, the list approach took about an hour and a half to process, while the hash table version took about 30 seconds.

Dan 2009-02-13 21:07:05

Answer 59

A:

Turning off Compiled flag for RegexOptions on Vista 64-bit.

Due to some strange bug with the .NET 2.0 Framework, Regex parsing is two orders of magnitude slower if the flag is turned on!

muerte 2009-02-13 21:11:33

Was it a bug, or where you creating new instances of the same regex repeatedly in your application? A buddy of mine experienced the same issue, until someone pointed out that he should create the regex *outside* of a loop, or else disaster will strike.

Juliet 2009-02-14 21:18:14

The regex was called just few hundred times. The problem is in Compiled flag. If we simply remove that flag, everything is drastically faster.

muerte 2009-02-18 21:26:27

I have a hard time buying this.

Ed Swangren 2009-02-19 08:01:39

Why don't you then look it up? http://blogs.msdn.com/bclteam/archive/2007/05/21/the-regexoptions-compiled-flag-and-slow-performance-on-64-bit-net-framework-2-0-josh-free.aspx

muerte 2009-02-19 09:28:12

Answer 60

A:

In a C# asp.net app, I moved some code that instantiated some xmlserializers to the global.asax application_start method.

There were 10 of these, and it dropped page load times by over 15 seconds each.

Chris Lively 2009-02-13 21:26:35

Answer 61

A:

The biggest gains will come from using the most appropriate datastructure. For example, I've seen huge improvement gains in C++ when switching an improper use of map<> to hash_map<>. The code was doing random lookups, and map<> is O(N), where hash_map<> lookups are O(1). The speedup was immediate and made the code many many times faster.

slacy 2009-02-13 22:11:30

map<> should be O(log n), not O(N). Still might be a significant improvement depending on the problem.

Mark Ransom 2009-09-17 03:57:34

Answer 62

+2 A:

Reduce remote calls such as database or web service calls. In most applications this is what produces most if not all of the latency, because it usually involves trips over the network.

IanL 2009-02-13 22:22:56

Answer 63

+1 A:

I took an old process that built a bunch of static HTML pages serially and mult-threaded it. Went from about 4 hours for 10,000-ish pages to about 30 minutes. Saved us from buying another server too. The change was basically to call the same getPage() function the same amount of times, but called it as a ThreadStart delegate.

I also had an instance where someone typo'd the mysql InnoDB memory setting to 01GB intstead of 10GB. Fixing that made a large difference (though admittedly it wasn't code).

WildJoe 2009-02-13 22:37:35

Answer 64

+1 A:

I reduced processing time on a CDR pre-processing filter from 30 minutes down to 4 seconds by replacing a split in perl with a regex on the fields I wanted -- and excluding all the trailing fields, which represented about 75% of each line.

So, instead of:

@array = split $line, ",";

I had:

($field1, $field2, ... $field8) = $line =~ /^(?:[^,]*,){5}([^,]*),(?: etc)/;

Daniel 2009-02-13 22:39:46

Answer 65

+1 A:

Changed a TSQL cursor to a set based query. Same result in seconds not minutes. Bonus from the boss that week :).

Spence 2009-02-13 22:44:24

Answer 66

+4 A:

I discovered code that build a string with an IN statement that was inserted in a WHERE clause of another SQL statement. Creating the string with the IN statement took about 15-20 seconds. The IN statement consisted of thousands of ids: the IN statement was splitted into several IN statements because Firebird can only take 1500 elements in one IN statement.

I removed the code and moved the SQL to get the ids to build the IN statement directly into the WHERE clause of the other statement. The size of that statement went down from more than 70.000 characters to only 1500 or so.

My main query was faster and I lost the time to build that IN statement.

Before:

SELECT id FROM TABLE_A A 
join TABLE_B B on B.A_ID = A.ID 
where B.ID IN (1, 2, 4, 5, ...1496 more) AND 
B.ID IN (2012, 2121, 2122, 2124,  ...1496 more) AND so on...

After:

SELECT id FROM TABLE_A A 
join TABLE_B B on B.A_ID = A.ID 
where B.FOO = 2

The_Fox 2009-02-13 22:51:48

*facepalm* .

Greg 2009-07-27 17:33:20

Answer 67

+4 A:

I knew a guy who was running some electron accelerator-related simulations that he wrote in C under Linux that took about a hour to complete on a Pentium-120 (it was a long time ago), during which he took lunch. I (mis)advised him to put gcc -O2 option in his Makefile, after which the program started taking several seconds and his nice excuse for lunch break was gone :) The secret was that the program had lots of nested loops in it, and most calculations were done in the innermost loop while for most of them it wasn't really necessary. gcc -O2 turned out to be smart enough to move these calculations outside of the loops, causing the unbelievable performance boost.

fionbio 2009-02-13 23:46:45

Answer 68

+4 A:

debug="true" to debug="false" in an ASP.NET web.config file

Kevin Pang 2009-02-13 23:55:04

Don't you mean the other way?

Ian Boyd 2009-07-27 17:23:38

Whoops. Yes, you are totally right. Fixed now.

Kevin Pang 2009-07-27 17:46:34

Answer 69

A:

Changing a bit of VB.NET code from looping and running the same SQL statement roughly 50 times and inserting a record each time to one SQL insert statement. 30 secs to 2 seconds.

The previous developer didn't seem to understand SQL (or much of anything), still it got me to a good start on the job.

PeteT 2009-02-14 00:02:44

Answer 70

A:

Putting the OutputCache attribute on a WebMethod

The WebMethod was loading Xml files and de-serializing the data to an object graph.

2009-02-14 00:19:21

Answer 71

A:

Doing some one-time processing of an XML file in Perl was taking minutes. Rewrote the routine in C# and it completed in seconds.

Richard Ev 2009-02-14 00:39:04

Rewriting an app is the *smallest* change you did to increase performance?

Constantin 2009-02-19 08:25:26

It was a tiny app, and a huge performance change.So the code change : performance change ratio was very high. :-)

Richard Ev 2009-02-19 09:20:37

Answer 72

A:

Usually when I fine tune an app, I find using stringbuilder for any heavy string work gives a huge performance boost.

TonyNeallon 2009-02-14 00:40:51

Answer 73

+3 A:

In C, I was writing a subroutine to slurp an entire file into one variable (bad practice, wastes a lot of memory, but it's the best solution and I only do it to one file). It used malloc() to create a 100 char array and realloc() to resize the array dynamically whenever it got full. I tested it on a 118448-byte file, and it took ten seconds to read it. I tried making it a 200 char array and increasing the size by 200 bytes, and it still took 10 seconds. Then I smacked myself and changed this:

if(size == strlen(string)) {

to this:

if(size == counter) { // counter is the index of the last char in the string

It now reads and processes the same file almost instantaneously.

EDIT: Fixed typo.

Chris Lutz 2009-02-14 00:45:55

was going to post something very similar, was trying to figure out why a co-workers code was so slow and I saw this:for (int ixChar=0; ixChar<strlen(reallyfreakinglongstrong); ixChar++)The *really* annoying thing is that it had already been spotted in a code review.

Andrew Barrett 2009-03-05 08:19:15

Increase your memory by amount proportional to the current size (e.g., `1.3*size`) instead of fixed amount (200 bytes in your case). It greatly diminishes number of costly `realloc()` calls.

J.F. Sebastian 2009-07-27 18:21:44

It would be faster, but it's not a performance issue right now. I may tweak it a little, but It Works On My Machine™. I also hear that using powers of 2 makes it faster.

Chris Lutz 2009-07-27 19:10:52

Answer 74

A:

The other day I found out how bad Postgres 8.1 is at optimising prepared statements.

I changed the code from SQL ?s to sprintf %s-es, and the query went from taking over 15 minutes to under 7 seconds.

(Then I installed 8.2 on a test box and found out they'd fixed that problem...)

Ant P. 2009-02-14 00:52:08

Answer 75

+10 A:

Indexed a database. Imagine driving a Daewoo Matiz that suddenly morphs into a Lamborghini.

Ryan Bigg 2009-02-14 00:52:17

Answer 76

+3 A:

Moved a function call outside of nested loops. I think that was about 10x improvement. Changed the switch from application server to database from 100Mbps to 1Gbps, this improved performance during high traffic.

2009-02-14 00:56:16

Answer 77

+2 A:

In an ASP.NET application there was a page which displayed a lot of records (order of 1000s) from a SQL database query.

Originally the app was storing results in a DataSet before sending the results to client. This was causing users to have to wait a long time to get the results, as well as causing scalability problems because the server was storing the entire result set in memory (DataSet) before returning it to the client. A long wait would also cause users to constantly hit refresh, worsening the problem.

I removed the DataSet and had the code stream out the query results using Response.Write, and this greatly improved the scalability of the server and the perceived performance from the user's perspective (since they were getting results streamed to them immediately).

DSO 2009-02-14 01:07:55

Code please ?

Ian Boyd 2009-07-27 17:28:37

Answer 78

A:

Changed from using a TreeSet to a HashSet. Was performing lots of set unions. ~40 seconds to ~200 ms.

Steve Brewer 2009-02-14 05:00:48

Answer 79

+2 A:

i had a program that was address checking tens of millions of addresses. it could do a few hundred per second but it still took the program about 4 days to finish each run. the problem was that it was doing one address at a time.

we made the program multi-threaded (didn't take much work at all) and had it use 5 threads.

the program went from taking a few days to complete to a few hours.

note: we were making calls to another program that would do the address check

2009-02-14 09:19:17

Answer 80

A:

Adding a compound index on a table. It reduce the time for a select query from 83 second to 2 seconds. Note that the SQL Wizard's hints weren't appropriate, I spent one day to think about the columns to be added/their order inside the index.

lmsasu 2009-02-14 10:04:10

Answer 81

A:

Quite similar to what you described in your question, I didn't trust SQL Server's optimizer, and added "OPTION(HASH JOIN)" to a query - over 3 orders of magnitude faster.

Moshe 2009-02-14 10:19:05

Answer 82

+2 A:

In a tight loop, i replaced the return value of a function from IEnumerable<int> to int[] and worked with for instead of foreach. This reduced garbage collection to a minimum and increased performance by factor 10.

Rauhotz 2009-02-14 11:28:22

Answer 83

+4 A:

I removed a Cartesian join from a query and the nightly job went from hours to seconds. Was tested in QA, but no one ever questioned that the job wasn't suppose to take hours to complete so they passed it.

Same company, with some simple re-indexing I took time sensitive nightly batch processing that was taking 6-8 hours to complete and got it completed in 1-2 hours.

Just yesterday I had a client add a few indexes to a few tables and reduced the run time of a procedure from 8 minutes to 6 minute. Not great mind you, but the tables are very large, and the procedure runs every 10 minutes. So over the corse of a day I saved the SQL Server 2 hours of processing.

mrdenny 2009-02-14 11:35:20

Answer 84

+1 A:

We had a huge multi-project Maven1 build structure that was just insane, over 200 project modules. Due to inter-dependencies, it was not even possible to do a full automated build- modules had to be "released" to the CM group manually, a process which sometimes took 2 days.

The first optimization was to convert from Maven1 to Ant+Ivy. This allowed automated builds, taking about 90 minutes for a full release.

The second optimization was to stop doing "scp artifact.jar remote-server:repository" manually for each artifact. I replaced that with a single call to rsync the whole structure up to the repository, which brought the whole build down to 5 minutes. And a totally automated 5 minutes at that. :-)

EDIT: After re-reading the question, I guess this doesn't really count as a "smallest change", but I'll leave it here and risk the down-voting.

Limbic System 2009-02-14 13:33:39

Answer 85

A:

I once tweaked a small tool for exporting master/detail customer data written in VB6/ADO on MS Access. Got a 60x performance improve (from 10 minutes to 10 seconds). It was working like this:

openConnection1
masterRS = getCustomers()

while not master.EOF
    openConnection2()
    openDetailRS
    ...
    closeDetailRS 
    closeConnection2

    master.MoveNext
wend

closeMasterRS
closeConnection1

Guess what the problem was... :-)

MicSim 2009-02-14 14:02:59

moved the opening/closing of connection2 outside the loop?

2009-02-14 17:32:00

100 points! For thousands of records that was a major performance hit.

MicSim 2009-02-14 17:55:54

Answer 86

A:

In terms of web app pages loading faster, we used a filter to strip out all excess white space from the html. This decreased the actual page size 25% which speeds things up quite a bit.

Reason we had so much white space was that there was a big JSP file involved that had lots of pretty printing. Pretty printing is a good thing but can increase your page size/load time in this scenario.

Marcus 2009-02-14 16:44:46

Answer 87

+4 A:

I strongly urge everybody else to do as I do: Do the improvement and forget about it the second you did it. Otherwise you will do premature optimizations in a subsequent project. ;) Always consult a profile before doing anything (e.g. the "always use stingbuilder" notion is usually not necessary - if not hurtful). Us the best readable thing. and worry about performance within one tier later on. Make it readable and correct (in that order) and then, maybe, make it faster.

Tobias Hertkorn 2009-02-14 17:26:24

Unfortunately some coders have no idea how to write efficient code. Saying to them that it's ok not to worry about it is worse than premature optimization.

kosoant 2010-01-07 15:04:34

Hmm, IMHO the coder you speak of will do even more damage when coding with performance in mind. Because he will do all his programming with the one "proven" performance improvement he "learnt" about while googling. The horror. I'd rather have slow, readable code which a good programmer can easily improve on.

Tobias Hertkorn 2010-02-08 15:47:26

Answer 88

A:

Removing

<%= javascript_include_tag :defaults %>

from a Rails app that didn't need it. Even if I needed the scripts, that line was a huge bottleneck. The app, by default, included the javascript files with a random number parameter attached to the end of the filename to prevent caching.

Fixing this dropped the page load time from 7.5 seconds to 1.5 seconds.

2009-02-14 19:24:23

Answer 89

A:

One time I had a JavaScript function run for about 45 seconds in IE. Chrome crunched it between 1-2 seconds.

Oh, that, and going from a Debug Build to Release Build... That was an eye opener.

James Jones 2009-02-14 19:35:52

Answer 90

+2 A:

Switched from using Linq to some older style array looping. :) cut the processing time on a particularly lengthy method nearly in half. (from 940ms to 501ms).

2009-02-14 20:08:22

Answer 91

A:

I had a state machine transition function which relied on a local std::stack for temporary values. The stack always emptied before the function returned, and the function didn't need to be re-entrant or thread-safe, so I could make it a static local variable.

This avoided re-allocating/growing the stack each time, resulting in something like a 10x performance improvement.

Henk 2009-02-15 01:02:14

Answer 92

A:

I originally used an ASP.NET DataGridView to display a large and richly formatted dataset which pushed the page beyond the 580k mark.

I later replaced the DataGridView (which is made up of tables by default), with a repeater control and a carefully 'cascading' arrangement of CSS styles. The change brought the size down to the 120k region.

Patrick Manderson 2009-02-16 11:26:08

Answer 93

+3 A:

When writing a solver for a game, adding very simple and limited dead-end recognition to prune the search tree brought down solving time for a big level from 15 minutes to near instantaneous.

Michael Borgwardt 2009-02-19 08:06:44

Answer 94

+1 A:

Rewriting a join:

First:

Select from a
  left join b on a.idb = b.id
  left join c on a.idc = c.id
  left join d on d.id = a.idb or d.id = a.idc

After:

Select from a
  left join b on a.idb = b.id
  left join c on a.idc = c.id
  left join d on 
    (case 
       when a.idb is not null then b.id
       when a.idc is not null then c.id
       else null
    end) = d.id

The query went from 3+ minutes to 8s after some more tweaking it eventually came down to about a second which was acceptable for this one.

Coentje 2009-02-19 08:16:34

Answer 95

A:

In the first week of work on my first job I was asked to make some fixes (mainly UI) for application that was used internally to monitor usage of our ATMs (the number of them was below one hundred). The initial load time was very annoying, it was about ten minutes. And I was needed to make many restarts of that application to test my fixes, so I decided to find the reason of such slow start up. Without usage of any profiler I found code that was very suspicious to me. There was some method which was used to build human-readable information based on the states of ATMs stored in local database.
Structure of queried table was something like:
atmId(actually many columns here), operationId, moneyInAtm(actually many columns here), time
The (operationId, time) pairs was unique in that table.
The idea of code was following:

Retrieve last (maximum value of time) rows for all operations in all ATMs for last two days.
For each of that rows do another query to find row with equal values of atmId and operationId and with minimal value of time.
Calculate the difference of money and add (atmId, operationId, diffMoney, time) row to some table stored in memory (It later will be shown to user).

I replaced first two steps in this procedure with something like:

Retrieve all data for last two days sorted by operationId and then by time.
Iterate over that result set and find first and last rows for the same operation. (That was pretty easy since the rows sorted)

After that change the application initialization time was reduced to some seconds and the users which used to always go for some tea or coffee during start-up just denied to believe that the program worked correctly with such fast start-up. There are, however, was found one regression after few weeks of usage. The data about operations which was started before range of my query was lost or in some cases corrupted, because I was missing first row of that operations, but that bug was fixed and the users was happy.

Dmitriy Matveev 2009-02-26 05:13:08

Answer 96

A:

Removed a delay() instruction from an old DOS game made by a friend, to make it work on a 286 system.

Quamis 2009-03-23 08:40:03

Answer 97

+1 A:

Have apache and not tomcat serve static resources
Use gzip compression
Minify, compress and stick together multiple .js files
Minify, compress and stick together multiple .css files
Add caching to resources

Loading of a web page went from 30s to 4s (first time) and to 0.5s (cached).

flybywire 2009-03-31 13:37:26

Answer 98

A:

The report-builder component of a piece of financial software my team built had a nasty little glitch: it read the field delimiter character out of the DB every time it was inserted. (DB was Oracle 8 -- nice and heavyweight.) Several months later, my boss asked me to take a look at the code and see if I could optimize it so that reports would finish faster than "overnight". I spotted this little oopsie and stored the delimiter in a local variable after the first read. Performance increased literally 100-fold.

The original coder was ordinarily very competent. Dude just had a brain cramp the day he coded that.

BlairHippo 2009-04-14 15:28:07

Answer 99

+1 A:

Rewriting a job with cursors on TSql to work with helper tables, a long ago, so, I don't have the code, but it improves from 2 hours to ten seconds

Jhonny D. Cano -Leftware- 2009-04-14 15:30:48

Answer 100

+15 A:

Letting go that developer who fondly and erroneously believed that demonstrating how clever you are is the same thing as getting work done.

Sometimes to improve the code -- improve the team.

Conrad 2009-04-14 15:32:21

+1 : I know how you feel.

Deepak Singh Rawat 2010-01-06 09:00:43

Answer 101

A:

It just happened right here

John Myczek 2009-04-14 15:37:57

Answer 102

+2 A:

I was porting a professor's C code for a Travelling Salesman genetic algorithm to Java. The majority of the work was moving from procedural to OO.

We were carrying about 1000 trial solutions, and killing off 100 each generation.

Each solution was simply an object which contained an array of nodes to visit (in order) and a couple of methods to get costs and manipulate the crossover.

First (successful) run took 8 hours -- I blew the memory a few times first.

Instead of de-referencing the objects, I stuck them in a pool for reuse and performance increased to about 5 minutes. after a few generations the Garbage collector was running constantly and I was spending more cycles cleaning up the mess than I was processing the data.

chris 2009-04-14 15:45:11

Answer 103

A:

I changed an oracle query from this:

SELECT DISTINCT ...
FROM ...
WHERE ...

To this:

SELECT ...
FROM ...
WHERE ...

This literally made a 1000-fold difference.

Jason Baker 2009-05-26 18:53:23

Answer 104

A:

My company hasn't always been so organized about backgrounding. For simple projects, we just run processes with the bash screen command. Typically, logging is set up with file and console appenders. Logging in to the host machine and detaching running screens once cut the time of a long series of calls by about a factor of four.

Of course, removing an errant sleep statement cut out about a factor of 10. I never figured out what it was doing there.

David Berger 2009-05-26 19:13:06

Answer 105

+3 A:

Once an application was having a TERRIBLE performance, it took about 15 secs to display a simple aspx with no complex logic. Three developers were tuning SQL statements, business logic and even the HTML in the page. I checked it out and resolved the issue by changing this attribute in main web.config:

debug="true" to debug="false"

Am I a genius? Hahaha, I'm really not!

victor hugo 2009-05-26 19:17:55

Answer 106

A:

I had a JavaScript based table sorter that would lock up the entire browser from 10 seconds to a minute on each run (pretty large data set). After profiling (and many complaints) I learned that the mootools adopt method was taking up 88% of that time. All it took was the addition of four letters and instantly I got a massive performance improvement (down to about 1.5 seconds per run, much more acceptable).

From:

this.body.adopt(rows);

To:

this.body.adopt.pass(rows);

tj111 2009-06-02 21:21:46

Answer 107

+1 A:

Added buffering to a FileOutputStream that was being written out 1 byte at a time. Took that step of processing down to 4 min from about 1.5 hours. Big difference considering this was for a security-sensitive app where an operator has to be present in the secure room for the duration of the step.

ykaganovich 2009-06-02 21:29:47

Answer 108

+1 A:

There was this stored procedure (Sybase T-SQL) used for reporting purposes, that used temporary table, that had basically this structure:

CREATE TABLE #temptable (
   position int not null
)

It was joined with other tables (integer to integer join) on the position field, but there were couple of tables, that had the position value declared as char field, this caused the index not to be used, so, the solution was to modify the #temptable structure to:

CREATE TABLE #temptable (
   position      int      not null,
   position_char char(12)     null  
)

And just after it was filled in, do the update:

UPDATE #temptable SET position_char = convert(char(12), position)

So, the joins are made without converting the values, plus having extended the index on this table to the additional field made things go much faster.

B0rG 2009-06-12 16:15:50

Answer 109

+3 A:

I changed a VB6 function that was concatenating hundreds of strings together to output a tree control in the early days of ASP. It called a function that looked like:

mystring = mystring + param1 + param2 + param3 + param4

Adding a single set of parentheses to change the order of concatenation:

mystring = mystring + (param1 + param2 + param3 + param4)

optimized the time the page took to load by over 99%. Went from over 2 minutes to under 1 second.

womp 2009-06-17 21:13:16

Answer 110

A:

I replaced a dynamically generated "or x='a' or x='b'..." to a dynamically generated "x in ('a', 'b'...)" and was able to make it run fast, before that the application was dying when executing that query

BlackTigerX 2009-06-17 21:16:42

What database? SQL Server turns "IN" into "OR"

Ian Boyd 2009-07-27 17:43:51

in was MSSQL, like I said, before the change, it would time out, after the change, it ran pretty quick

BlackTigerX 2009-07-30 21:41:45

Answer 111

+39 A:

void slow() {
  if( x % 16 ) {
  }
}

void fast() {
  if( x & 15 ) {
  }
}

Converting modulus of powers of two to an equivalent bitwise and operation moved a real-time MPEG-to-JPEG transcoder from producing B&W images to producing full colour JPEGs of a movie, with CPU cycles to spare.

Response to Optimization

To determine if a compiler performs an optimization, test it. People have said to me, "The compiler should optimize that." In theory, yes, it could. In practice, a compiler will only optimize code for scenarios that optimizing code has been written. Some optimizations are not as important than others.

Try It Yourself

For those who insist that the compiler should optimize this, just try it.

$ gcc --version
gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3

$ cat t.c
#include <stdio.h>

int main( int argc, char **argv ) {
  int i = 0;
  int j = 0;

  for( i = 0; i < 1000; i++ ) {
    for( j = 0; j < 100000; j++ ) {
      int q = j & 15;

      if( q ) {
        printf( "j X 15 = %d\n", q );
      }
    }
  }

  return 0;
}

$ gcc -O3 t.c
$ time ./a.out  > /dev/null

real    0m6.750s
user    0m6.732s
sys     0m0.016s

$ cat t2.c
#include <stdio.h>

int main( int argc, char **argv ) {
  int i = 0;
  int j = 0;

  for( i = 0; i < 1000; i++ ) {
    for( j = 0; j < 100000; j++ ) {
      int q = j % 16;

      if( q ) {
        printf( "j X 16 = %d\n", q );
      }
    }
  }

  return 0;
}

$ gcc -O3 t2.c
$ time ./a.out  > /dev/null

real    0m13.668s
user    0m13.633s
sys     0m0.040s

Using & instead of modulus for powers of two is not optimized by gcc. Feel free to share compilers that optimize this code, in practice.

See Also

Bit Twiddling Hacks

Dave Jarvis 2009-06-23 03:51:29

This sounds like something that the compiler should be handling

Graphics Noob 2009-09-03 17:52:39

nice call - those hardware classes *do* sometimes come in handy :)

warren 2009-09-04 06:24:05

Baffe Boyois 2010-03-09 05:15:48

You are correct, Baffe. The code I was working with originally had `int` variables, not `unsigned int`. The two different sources above result in nearly identical binaries when using `unsigned int`.

Dave Jarvis 2010-03-09 17:02:47

Answer 112

A:

ServicePointManager.DefaultConnectionLimit throttled the connection count between web and app to 4!

Hans Malherbe 2009-06-23 04:38:31

Answer 113

A:

I was writing a script that would write about 20k files to disk. It was taking about an hour, and I couldn't figure out why. Then I remembered that I was working on an NFS mount. Once I changed the output directory to /tmp and off the network, the script ran in about 2 minutes.

humble coffee 2009-06-23 04:50:23

Answer 114

+2 A:

Adding a few SQL Server nolock directives for static tables (prices that were updated once an year).

Marco van de Voort 2009-07-27 14:52:31

Answer 115

A:

I have to finish a VB app with crystal reports that connect to a database. The original programmer stored the data in the BD as: "Field name = X" when a check box in the VB app was checked and "Field name = " when unchecked. The crystal reports showed all those strings from the BD and the fields must had the correct number of spaces and be in the correct places or everything would be messed up. The format of the report can't be changed, but I had to changed... I made the app store "X" when the check box is checked and " " when unchecked. the rest is written in the crystal report. Now I can place the fields in the report anywhere I want and nothing gets messed.

yelinna 2009-07-27 15:24:55

Answer 116

A:

Renaming the daft long contentplaceholder id in an equally daft ASP.NET page that made use of master pages, nested user controls and nested repeaters. Saved about 30 KB from the rendered markup. Funny.

redsquare 2009-07-27 15:28:10

Answer 117

A:

For our thesis, my friend and me had to copy a LARGE series of data to an excel file. This data was generated from a Matlab script. Made by hand, this "copy to excel" task will take an entire day. Then I programmed two loops in that matlab script and made it to write the data in the excel file (programming this took me a couple of hours) and now this task takes half of an hour in my dual core toshiba laptop :D (30 minutes... I repeat: It was a LARGE series of data).

yelinna 2009-07-27 15:33:58

Answer 118

+2 A:

I once made a C program twice as fast by changing the array size to be a power of 2 and thereby avoid integer multiplication. In the center of my simulation code I had a 2d array named stored on the heap, Here are two ways to index into it:

#define worldState(x,y) (*(world + (y) * worldYSize + (x)))    
#define worldState(x, y) (*(world + ((y) << worldYSizeBits) + (x)))

On the 1995-era Sparc I was running this code on integer multiplication took 33 clock cycles; one cycle per bit in the word. The bit shift took 1 cycle. And by far the main thing my code was doing was fetching states out of the world, so I saved 50% of my runtime by constraining my code to only work on world sizes that were powers of 2.

I found it with a profiler; fortunately the multiplication showed up as a call to the function _imul() which the gcc runtime was providing. Compiling with -O would hide that, btw, but at the time the profiler didn't work with optimized code.

Nelson 2009-07-27 15:42:24

Answer 119

+2 A:

Well the server was taking forever to load, so long that it was timing out!

I plugged in the ethernet cable and everything loaded instantly. It was beautiful.

(Happened in a Network Administration class I was taking, teacher moved the server but forgot to plug everything back in.)

Sneakyness 2009-07-27 15:47:27

Answer 120

A:

Remembering to implement/use an active_flag column in your SQL table/queries to return only active rows, it's a big help when you have hundred thousands of rows.

Jreeter 2009-07-27 15:54:11

Answer 121

A:

I changed someone's CF code from this:

<cfloop from="1" to="#a_large_number#">
  <cfoutput><td width="1" bgcolor="#ff000"></td></cfoutput>
</cfloop>

(reading it I had a serious WTF moment)

to this:

<cfoutput>
  <td width="#a_large_number#" bgcolor="#ff000"></td>
</cfoutput>

(This was in 1999, hence the HTML style)

DanSingerman 2009-07-27 16:09:42

+1 for the memories!

UpTheCreek 2009-11-19 13:44:18

Answer 122

A:

I was checking for nodes in a tree that could be identical. I compared every one of the 3000-5000 nodes to every other node, and my full script took around 25 minutes to complete for every tree. I then realized that I only needed to check one category of nodes, which amounted to 300 nodes or so. After pruning the tree, the script took around 1.5 minutes. The power of O(n^2).

Nick Lewis 2009-07-27 16:26:58

Answer 123

A:

I don't remember when I've done my best performance improvement but I know what it was in C:

int a = 0;
while(a = 0)
{
    if(ShouldQuit())
     a++;
}

to

int a = 0;
while(a == 0)
{
    if(ShouldQuit())
     a++;
}

I can't say for sure what was the time improvement between these two versions, because every times I try, I TimeoutException...

Nicolas Dorier 2009-07-27 17:08:45

if you had done 0 = a from the beginning this would never have been an issue.

benPearce 2009-09-02 23:50:08

@benPearce: Which reads unnaturally, is inapplicable when both sides of the comparisons are lvalues, and is easy to forget to do. This is something the compiler should flag.

David Thornley 2010-03-01 14:53:20

Answer 124

+1 A:

It was a simple network simulator done as a homework assignment (in C#) and meant to run only once. However, when it ran it did so so slowly it would have taken over 24 hours to finish.

A rather quick glance at the code discovered that every simulation step recalculated the average of elements of a list. That list also grew at each step, thus landing a nice O(n^2) complexity. I changed the calculation by keeping the last average and using it to calculate the new, resulting in an O(n) complexity.

The total time decreased from an expected over 24H to about 15 minutes, about two orders of magintude.

Asaf R 2009-07-27 17:22:19

Answer 125

+3 A:

In the early days, I had some code that grabbed hundreds of rows out of a SQL database table based on a where clause. The whole purpose of this code was to get the number of rows returned.

After learning that I can get the number of rows from a given query with the COUNT(*) statement in SQL, I drastically improved performance of that page.

Steve Wortham 2009-07-27 17:32:19

Answer 126

+2 A:

Linq to SQL don't cache, so use ToList() when you are enumerating a IQuerable<> multiple times.

var db = MyDataContext();
var query = db.Where(a => a.lot.quering == a);
doThingWithDataManyTimes(query);
doThingWithDataEvenMoreManyTimes(query);

to

var db = MyDataContext();
var query = db.Where(a => a.lot.quering == a).ToList();
doThingWithDataManyTimes(query);
doThingWithDataEvenMoreManyTimes(query);

this reduced the time for a regression calculation and graph generation over the data from ~20sec to <1sek

Carl Hörberg 2009-07-27 17:35:09

Answer 127

A:

Heavily nested Perl text processor where an inner loop had a line of

s/ +/ /g; (replace all spaces with a single space)

Profiled the app and noticed that single line accounted for 95% of CPU time. Removed the line, was very happy with the rather explosive speedup...

Tino Didriksen 2009-07-27 18:25:44

Answer 128

A:

Changing

SELECT SOME_COLUMNS FROM TABLE WHERE ID IN ('A', 'B', 'C', 'D')

to

SELECT SOME_COLUMNS FROM TABLE WHERE (ID = 'A') OR (ID = 'B') OR (ID = 'C') OR (ID = 'D')

Speed up the query executed against SQLMobile in something about 30x (measured)

Rodrigo 2009-09-02 22:20:58

Answer 129

A:

Added nusoap_base::setGlobalDebugLevel(0); in a SOAP Server written in PHP.

It increased the performance of the SOAP server easily by a factor of five.

What is most interesting is this isn't documented anywhere as far as I could tell, and I only came across it after reading an obscure mailing list post where someone suggested this.

Lindsay Blanton 2009-09-02 22:31:24

Answer 130

+2 A:

Spent less time on Stack Overflow for a day.

2009-09-02 22:32:56

Answer 131

+1 A:

Once I didn't change an application, but just "waved the wand" and the speed increased ten times! I ran CPAN update to upgrade to the newest versions of the Perl unofficial modules. This increased my speed due to a bugfix in one of the application-critical modules.

Pavel Shved 2009-09-02 22:34:56

Answer 132

+2 A:

I doubled the performance of an in memory matrix calculation by storing the matrix row-wise instead of column-wise. This improved cache locality.

SebastianK 2009-09-02 22:51:17

Answer 133

+3 A:

Replacing a frequently hit division by 4 to a bit shift operation.

ghills 2009-09-02 22:56:11

compiler should do this ... and it's potentially unsafe

Wahnfrieden 2010-04-28 01:04:33

Why is it potentially unsafe? If it's potentially unsafe, then the compiler wouldn't do it.

erjiang 2010-05-07 21:42:21

I think the rule of thumb is not "the compiler will handle that", it's more like "you'd be surprised". For instance, GCC does *not* optimize division/modulus for signed integers.

Joey Adams 2010-08-09 04:58:12

Answer 134

+2 A:

Changing a SQL query against several million rows so that instead of

WHERE dbo.fn_TrimDate(ActionDate) = @Today

I had

WHERE ActionDate BETWEEN @Today AND (@Today + 1)

fn_TrimDate being an ugly function that ripped off the time part of a datetime field.

The query went from an average of 0.5 secs to being almost instananeous.

CodeByMoonlight 2009-09-02 22:59:46

Answer 135

+1 A:

Recently while writing a Java application which reads REST responses our team were using DOM based XML parsers, mainly because selecting things out by XPath is nice and easy to code. Bad move!

We switched parsing and serialisation over to event-based XML classes (in our case StAX). It vastly improved the memory footprint of the application, which has a massive impact on scalability and sped up the processing by at least an order of magnitude.

UltimatePace 2009-09-02 23:04:44

Just re-read the actual question... It wasn't really a small change, but it did make things better!

UltimatePace 2009-09-02 23:06:54

Answer 136

+1 A:

The line

new Regex(pattern, RegexOptions.IgnoreCase);

was changed to:

new Regex(pattern);

It improved performance by about 1400% as case sensitivity wasn't required.

dejanb 2009-09-02 23:06:15

Answer 137

A:

I was working with a very long DB2 query. It ran in the Test environment in 30 seconds, but in Production we had to cut it off after running all weekend due to the massive amount of data.

The query was optimized to death and could not be made faster by structure alone.

So we added this to one of its subjugate WHERE clauses:

and (1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1[...])

Doing so caused the DB2 parser to add a couple additional SORTs to the execution path and ended up making it run in two hours.

tsilb 2009-09-02 23:12:37

Doesn't this change the output?

recursive 2009-09-03 00:25:23

bah, fixed

tsilb 2009-09-03 17:29:24

Answer 138

A:

Heeding the top-level question, probably the biggest improvement I've had for the smallest change would be to correctly size the settings of a MySQL server for the hardware it was on. The defaults for MySQL - even the 'huge' ones - are extremely conservative. In particular, several of the memory parameters (e.g. sort_buffer) can be increased a thousand times and this will give a significant boost of performance. And table_cache is often way too low. I've had it up at 1500 on some servers.

staticsan 2009-09-02 23:30:27

Answer 139

A:

Converting some Oracle Pro*C code to built-in PLSQL - yes, you read right convert a C function to PLSQL.

The issue wasn't so much to do with the C code itself, I'm sure that runs fast. The problem is that the abstraction between Oracle and Pro*C is super slow. So, converting the one function sped the rest of it up by about 100 times.

I should add that some Oracle SQL code was calling this external Pro*C code repeatedly. So bringing it into PLSQL meant less call overhead and faster execution.

Matt H 2009-09-02 23:47:33

Answer 140

A:

C# - I used Generic.List instead of ArrayList while migrating from database to database. It saved 6 minutes and a lot of unnecessary reboots.

2009-09-03 03:04:43

Answer 141

A:

After wondering why a window took so long to show up, I once removed the following line from a colleague's code

Thread.Sleep( 5000 );

At some point this must have been meant to have the application wait for some other thread to finish, but that was not an issue anymore because the code had been refactored many times since then.

galaktor 2009-09-03 17:37:48

Answer 142

+1 A:

Changing getPixel value from Bitmap object (.NET) to direct unsafe bit manpulation. The performance caused the method go from 4 minutes to 1 second.

monksy 2009-09-04 05:42:49

Answer 143

+1 A:

Some time ago I had a column in an Oracle database, which had a value when the column had been processed, and was null when not. The table had several hundred thousand items.

Of course there was an index on this column. But Oracle does (at least did in version 8) not store null values in to the index.

So a query like this

select * from VeryHugeTable where ProcessingId is null

took hours, although it only returned a few records.

We changed the null value to an arbitrary negative number:

select * from VeryHugeTable where ProcessingId = -9

I can't remember how fast it was, but it was incredible, a few minutes if not even faster.

Stefan Steinegger 2009-09-07 20:50:27

Answer 144

+1 A:

Can't remember the exact code but we changed this:

int readSize = 1024;
result = fread(buffer, readSize, 1, file);

to

int readSize = 1024*1024;
result = fread(buffer, readSize, 1, file);

Never underestimate how slow I/O is.

rein 2009-09-07 20:54:32

Answer 145

A:

Removed VIEW STATE from an ASP.NET page. Page went from 800KB per request to about 10KB per request. That view state can be evil.

rein 2009-09-07 20:57:36

Answer 146

A:

Converting all MySQL subqueries to use combination of joins and temporary tables. The improvement was unbelievable.

seengee 2009-09-07 21:10:24

Answer 147

+3 A:

The best performance improvement I've ever seen is my performance when I turned off twitter :)

seengee 2009-09-07 21:12:29

Answer 148

+1 A:

Dropping Java's array clone method and using other methods instead. It turns out cloning is very resource consuming and it should be used only when definitely necessary.

It dramatically improved my Java code's performance.

Haluk 2009-09-17 03:24:12

What "other method" were you using?

portoalet 2010-03-15 02:57:59

Answer 149

+1 A:

An old application started going haywire on submissions of new data when we moved to a new SQL Server silo. It went from 1-2 seconds to several minutes. Obviously something changed on the SQL/network side but after 3 days we weren't able to identify it.

Upon examining the code we noticed that it had a random identifier based on the time (goofy design - not mine - SQL Identity or GUID work fine for me), only it was seeding to the millisecond. So the code only had 100 different seeds meaning it would likely hit the same pattern of randoms and cycle through until it found the next available one.

We seeded to the current time (instead of millisecond) and boom, 1-second submissions.

On a side note, our development environment had the same SQL/network problem but it went unnoticed because the Web server (a VM) was so slow that the random identifier algorithm (20 random characters based on current millisecond) produced an identifier built from several different random seeds whereas prod built from a single random seed. A glorious bug that was kind of fun to uncover / resolve.

Mayo 2009-09-18 12:22:40

Answer 150

+1 A:

Dynamic Programming, Sometimes It's amazing how the use of a simple look-up table of some values, in a recursive function, can help. for a small example check this Fibonacci in C++.

Liran Orevi 2009-09-18 12:27:40

Answer 151

A:

I broke one complex query into to two separate relatively small queries, and it improved performance by an order of magnitude, I was surprised.

Jay 2009-09-18 12:42:11

Answer 152

A:

Once upon a time I've added /SSE2 option to my Visual C++ project and got +10% performance.

Kirill V. Lyadvinsky 2009-09-18 12:43:55

Answer 153

A:

Instead of doing all of the lookups against the database in our web app, the lookup information is pulled into a HashTable in memory and kept for an hour:

HttpContext.Current.Cache.Insert(Name, htData, Nothing, DateTime.Now.AddHours(1), System.Web.Caching.Cache.NoSlidingExpiration)

We really don't need anything fresh to the minute, and looking the info up from the DB once an hour (instead of 10 times a second) improved performance trememdously.

FlipScript 2009-10-13 02:53:39

Answer 154

+1 A:

Using Sqlceresultset instead of Insert Query.It boosts performance in pocket pc applications especially when u deal with Bulk inserts.

Using datareader instead of datatable as datagrid datasource if u deal with above than 1000 records resultset.

Partition in oracle database.It improves %25.

Using string.empty instead of "" if u want to check a variable's "" value.

Alexander 2009-10-16 21:43:59

Answer 155

+1 A:

Switched from an OR construct to an IN construct in MySQL - over a 10x speed improvement!

warren 2009-10-23 05:19:14

Answer 156

+2 A:

Changed

for( int n = 0; n<things.getSize(); ++n ){ ... }

to

int count = things.getSize();
for( int n = 0; n<count; ++n ){ ... }

Saved about 11% in the rendering loop. (count was around 50000)

Andreas 2009-11-19 13:36:57

for( int n = things.getSize()-1; n>=0; --n ){ ... } works too if order isn't important

Josh 2010-06-11 08:14:10

Answer 157

A:

Didn't did it in practice, but I had to normalize a matrix on a parallele machine : meaning for each column, divide each value by the average of the column.

Well with Direct Maping Cache, if the matrix is stored by continuous values of rows in memory, you can get a 100% miss. It depends the most inner loop's data storage strategy.

An other "error" is to invert the "i" and "j" in the nested loops (where i is for the lines and j for columns) which makes a very easy optimisation on the code withou rewriting anything (just cut and paste)

Aif 2009-11-19 13:47:17

Answer 158

A:

Changing this :

WHERE dbo.fn_TrimDate(DateTimeField) = dbo.fn_TrimDate(GetDate())

into this :

DECLARE @StartToday datetime
SELECT @StartDay = dbo.fn_TrimDate(GetDate())
...
WHERE DateTimeField BETWEEN @StartDay AND @StartDay + 1

CodeByMoonlight 2009-11-26 08:35:56

Answer 159

A:

Some image processing work I did about 8 or 10 years ago on PPC/AltiVec system when they first came out. Converted an nXm convolution originally written for i386 ported to a mercury mcos system (Very fast PPC/Altivec processors linked together by a highspeed backplane and CCNUMA memory arch). Really sped up just after a simple code port, when taking advantage of their parallel processing libraries it boosted it by about 22x over the non-parallel hand-coded version. Moral of the story - vector processors are nice! Although not as radical of decrease in run-time, I saw substantial savings in using their FFT algorithms as well . . . TheEruditeTroglodyte

TheEruditeTroglodyte 2009-12-12 03:14:42

Answer 160

+2 A:

I once wrote a sequence of SQL queries which worked on huge amount of records. The performance was really poor taking 4 minutes to execute. Then i wrapped it with Begin Transaction and End Transaction. It spit the result in 5 secs.

NLV 2010-02-10 10:38:51

Answer 161

A:

I just removed the try catch block and put the if condition check so that it wont throw the exception. That code block was executing more than 10K times to deserialize the data and kind of expected to be throwing exception and my previous developer just left that code unremoved. when i had to look for improving performance of loading the serialized file, i did this small tweak and improved a lot from kind of 36 secs to 3 secs.

Note : This would have been mentioned in either of the provided answer, but as I could not read all the answers and confirm myself whether it is already present, I am typing this answer. Sorry about if this is the repeated answer.

sankar 2010-03-01 14:19:32

Answer 162

+2 A:

Database Indexes. We had an application that was using lookup tables fairly heavily, but there were no indexes on any of the appropriate columns. A coworker and I did two things:

Added indexes to all the id columns on the lookup tables
Switched the ORM for our heavier queries to do find_by_sql

Those two changes netted us a roughly 50% speed increase in database access, and made the application noticeably faster. It just goes to show you that you can't disregard good database design because you've got an ORM handling most of the work for you.

Mike Trpcic 2010-03-01 14:28:46

Answer 163

A:

This is specific for WinForm .NET. Turn off DataGridView.AutoSizeColumnsMode and AutoSizeRowMode.

David 2010-03-01 14:39:17

Answer 164

+1 A:

When I put an SSD in my laptop

keithwarren7 2010-03-08 17:17:22

Answer 165

+1 A:

Application spiked the CPU on the SQL Server to 100% at 8am as each time zone logged in for the first time. The server had 128GB of RAM and maxed out # of CPUs. New DBA, and by "new DBA" I mean the first DBA they ever hired in their 6 years of operation, found a query with an LTRIM() on a numeric column that was the join between two tables.

Removed the LTRIM and the CPU basically flat-lined.

iKnowKungFoo 2010-03-08 17:21:45

Answer 166

+1 A:

In some C# code I replaced some reflection to dynamically get property values with dynamically compiled lambdas and got about 100-1000x speed increase!

Gabe 2010-03-08 17:30:06

Answer 167

+1 A:

Created 3 indices in the database. The net performance went up about 25-fold.

Seva Alekseyev 2010-03-08 17:33:09

Answer 168

A:

There was an application I worked on once that output a large amount of data. Basically, anything it did would be written to various files so it could be analyzed later. Depending on the type of work being done, this application could take days to finish running its calculations. The original developer had used 'endl' to terminate every print statement. I replaced the endl's with \n (a careful find/replace) and saw the performance improve by 15 - 20 percent.

JasCav 2010-03-08 17:36:19

Answer 169

A:

I matched TCP packet size between two application servers (setting MTU value in the Windows registries). For whatever reason, the network between these two servers had a smaller MSS value than the typical default values, causing fragmentation/reassembly at the TCP level. Matching these two servers to the lowest common denominator between them decreased execution time down to 1/3 the original time for our distributed application.

The network tech could give me no answers so I took matters into my own hands.

tyriker 2010-03-08 17:39:40

Answer 170

+1 A:

When i joined a project mid-way where already around 80% of coding had been done, i was given the task of looking into the project for any optimization possible. The first thing that i came across was the habit where the objects were not disposed after their use. So I just introduced the following in the finally block

//Declare some object
MyClass ob1;

try
{
    //instantiate the object 
    ob1 = new MyClass();

    //Perform operations
....
....
}
catch
{
    //perform some operation
....
....    
}
Finally
{
    ob1 = null;
}

And it worked wonders and the application now was working 30% faster.

HotTester 2010-03-09 05:06:38

Answer 171

A:

My brother had a case in an Ada program where the bitfield declarations were inadvertently crossing a word boundary. Fixing that improved the method by a factor of 350,000. Yes, no typo, three hundred and fifty thousand times.

EJP 2010-03-09 05:13:07

Answer 172

A:

Example is from J.

I see this often:

v1 ,. v2 ,. v3 ,. v4 ,. v5 ...

And sometimes that's just to take some rows out of it:

idx { v1 ,. v2 ,. v3 ,. v4 ,. v5 ...

Thing is, ,. "stitches" two same length vectors, or two same length matrixes. But every time you use it, the interpretor has to create a new matrix just one more column wider and stitch. And again, and again.

Prefer the following:

> idx & { each v1 ; v2 ; v3 ; v4 ; v5

At some point using boxes gets old, because they have so much overhead. But remember, boxing every single element is often uncessessary, and very slow.

Also fun is the key conjunction /.. But most beginners will only think of boxing with it and then apply whatever verb they want to each box.

For example, getting grouped sums can be done with this:

; +/ each x </. y

But should be done like this:

x +//. y

MPelletier 2010-03-09 05:26:31

Answer 173

+1 A:

I once came accross this gem:

for (int i = 0; i < count; i++)
{
    var result = dosomething();
    useresult(result,i);
}

Of course, dosomething() would always return the same result at every iteration. Moving it out of the loop helped!

MPelletier 2010-03-11 04:28:04

Answer 174

A:

In J (again).

The 'dll' library creates nice verbs to manipulate memory. mema allocates, memr reads, memw writes and memf frees.

If you have a lot of addresses to read, J will evaluate memr at every read.

So that:

memr each BunchOfAdresses

is much slower than:

15!:1 each BunchOfAdresses

The tricky thing with 15!:1 though is that it can read all adresses passed, but will crash like a bitch if you give it a null (15!:1 (0 0 _1)) (Where the first 0 is the address, the second is a byte offset, and the _1 is -1, for length, in this case "read to first null").

So, if you have a lot of addresses, what do you do? Well, you could wrap your reader like so:

memr2 =: 3 : 0
    if. 0 = {.y do.
        ''
    else.
        memr y
    end.
)

But that's going to be a pain. J will evaluate that every read. Plus another evaluation if you have memr and not 15!:1.

Instead, if reading a lot of addresses, replace the nulls with an address you declare yourself, where you store a default value of your choice.

adrNull =. mema 1  NB. Allocate 1 byte
'' memw adrNull, 0 1 NB. Set byte to null
bunchOfAdresses =. adrNull (bx bunchOfAdresses = 0) } bunchOfAdresses  NB. replace all null addresses with out new address.
result =. 15!:1 bunchOfAdresses,"1 [ 0 _1  NB. append a 0 offset and -1 length to all addresses and read.
memf adrNull NB. always cleanup after

MPelletier 2010-03-19 03:57:16

Answer 175

A:

At my previous company we were using a lot of third-party code to speed development of our main product. We found that one component in particular dramatically increased the startup time of the application. So we spent a few days pouring over their source code to figure out how we could improve performance. At one point, I ran into this little gem:

CObjectManager::CObjectManager() {
    components.init();
    Sleep(10000); //Required for multi-threading 
    components.start();
}

We pressed the company to explain the code, and they insisted that "multi-threaded apps require proper timing," and if you remove the Sleep, it will break.

Apparently the original developer coded himself into a race condition, and to solve it he simply called off the race.

tylerl 2010-03-21 17:15:41

Answer 176

+1 A:

Changing two tokens improved my toy vector performance from O(n^2) to amortized O(n) when inserting n elements.

Slow: new_capacity = old_capacity + 10

Fast: new_capacity = old_capacity * 1.5

FredOverflow 2010-05-07 21:11:10

Good ol' powers of 1.5 for resizing arrays.

Joey Adams 2010-08-09 04:51:35

Answer 177

A:

I recently did such an improvement in a Qt4 project.

Loading 5k lines (or 50k fields, tabular data) from a text file into a QStandardItemModel object took ~5-6 sec. Now it takes ~0,5 sec.

The problem was that the model was attached on a view object.
The solution was to detach the model, load the data and then attach the model again.

I added 2 simple lines of code and I speed it up by 10x.

Perhaps there is a proper Qt way for that (like preparing the view for massive updates) but I didn't have the time to discover it and my quick n dirty hack worked great.

Nick D 2010-05-18 08:30:24

Answer 178

A:

Passed by reference instead of value. A huge structure containing image data.

2010-08-20 17:09:23

Answer 179

+4 A:

Changed a stored proc from this:

@numberParam varchar(16)
...
SELECT ...
FROM ...
WHERE id = CAST(@numberParam as int)

to this:

@numberParam int
...
SELECT ...
FROM ...
WHERE id = @numberParam

Hello indexes!

Abe Miessler 2010-10-14 15:46:44

Ooops, upvotes do not add reputation for community wiki posts

vgv8 2010-10-14 18:05:13

ansaurus

tags:

views:

answers:

Biggest performance improvement you've had with the smallest change?

EDIT:

related questions