views: 419
answers: 9

+1  Q: 

CPU Utilization

Q1. What are best practices for writing code that does not hog the CPU but still achieves great performance? The question is deliberately generic: what I am looking for is a list of the different practices used in different environments, plus debugging tips beyond the process monitor/task manager.

EDIT: I am not speaking of IO-bound processes; I am speaking of CPU-bound ones. Even so, I do not want my process to keep hogging the CPU. If I have a 4-core machine and run four simple loops within a process, CPU consumption shoots up to 400% for as long as the application/process is running.

I am seeking some experience on a topic that everyone will have faced at some time or other. For example, I once debugged an application that was hogging the CPU on Windows because it was looping continuously, searching for a non-existent file.

How can I write my program in such a way that two different CPU-bound applications run smoothly (i.e. stay responsive) side by side?

UPDATE: Suggestions:

  1. Write good clean code, then profile your application, then optimize. (Thanks, T.E.D., for the tip)

  2. It is easier to rewrite/redesign/refactor code than to profile and fix it.

  3. Use a profiler to debug your application

  4. Don't use spinlocks for threads with long waits

  5. Algorithm choice

These suggestions go a long way toward helping a beginner understand the concepts.
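Suggestion 4 can be sketched in code. The thread names no particular language, so the following Python example is an illustrative assumption, not the asker's environment: a spin loop burns a full core while waiting, whereas a blocking wait costs essentially no CPU until it is signalled.

```python
import threading
import time

done = threading.Event()

def worker():
    time.sleep(0.1)   # simulate a long-running task
    done.set()        # signal completion

t = threading.Thread(target=worker)
t.start()

# Bad: a spinlock-style wait burns a whole core for the full 100 ms:
#     while not done.is_set():
#         pass

# Good: a blocking wait sleeps inside the OS until the event is set.
done.wait()
t.join()
print("worker finished")
```

On a process monitor, the commented-out spin version shows up as 100% of one core; the blocking version stays near 0% while waiting.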

+1  A: 
  • Use lazy values and/or other kind of caching
  • Choose your algorithms with care
egaga
+6  A: 
  • Use a profiler religiously. Don't rely on common sense when searching for bottlenecks.
  • Learn big-O notation, remember big-O for common algorithms.
  • Avoid busy wait loops at all costs.
  • For embedded work, learn how to make code fit into the instruction cache; on tight loops that can sometimes yield a tenfold speedup.
  • When doing high-level layered development, learn how to cache data efficiently (for example, to minimize the number of DB statements).
alamar
why the -1? it all seems good advice
Javier
@syed: non-CPU-cache-aware algorithms in fact are a great nuisance for data-heavy CPU-bound programs. case in point: quicksort has a O(n^2) worst case, but it performs much better than heapsort (which has a O(logn) worst case), mostly because it makes better use of the CPU cache
Javier
Heap sort has O(n * log n) worst case, not O(log n).
Matthew Flaschen
alamar: +1, i understand now.
Syed Tayyab Ali
Profiling and algorithm choices are definitely the keys. Figure out where the problems actually are, not just where you think they are. Then pick off the top issue, re-run, re-profile and repeat until you have the performance you need.
GreenKiwi
+1  A: 
  • follow code optimization techniques.

  • calculate the memory use of your operations.

  • calculate the time of each operation
    (big-O notation).
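A quick way to see a big-O difference as wall-clock time is to measure it. This Python `timeit` sketch (the thread names no language, so Python is an assumption here) compares an O(n) list scan with an O(1) set lookup for the same membership test:

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Membership test: O(n) scan of a list vs O(1) hash lookup in a set.
t_list = timeit.timeit(lambda: 99_999 in data_list, number=200)
t_set = timeit.timeit(lambda: 99_999 in data_set, number=200)

print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```

The set lookup wins by orders of magnitude, which is exactly what the big-O analysis predicts; profiling then tells you whether this operation matters at all in your program.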

Syed Tayyab Ali
Do not calculate. Profile!
alamar
@Alamar, choose the right algorithm, design and implement it correctly, and you shouldn't need to profile. Profilers don't fix things; they just let you know they're broken.
Shane MacLaughlin
You're guaranteed to have performance problems if you think you're too smart to profile.
alamar
http://stackoverflow.com/questions/888224/what-is-your-longest-held-programming-assumption-that-turned-out-to-be-incorrect (see answer #3)
alamar
It's stupid to optimize without profiling first. That's the equivalent of trying to fix a logic bug by modifying random code
T.E.D.
+6  A: 

Do as little work as possible.


Since you have edited your original question I'll add some more thoughts here to address the specific situation that you have described.

Assuming that you don't know where your process is blocking (since you were asking for debugging tips), you can start by simply pausing the debugger. This stops the application in whatever it is doing; from there you can inspect the current location of every thread and see whether any of them is stuck in a tight loop.

Secondly, any decent profiler will easily catch situations like this. Attach the profiler, run the application to the blocked point, and look at the calls that are taking a dramatically larger percentage of your total runtime. From there you can work your way back out to find the offending loop.

Once you have located the problem, ideally rethink the algorithm to avoid the situation completely. If that isn't possible, introduce sleep calls on the thread. This allows other threads onto the CPU and increases the responsiveness of the application and the OS as a whole, at the expense of increasing the operation's runtime. The trick to multicore programming is ensuring that all your threads strike a compromise between performance and consideration for the other waiting tasks.
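The sleep-between-chunks idea can be sketched like this (Python assumed; `process_chunks` and the summing work are illustrative stand-ins for real CPU-bound processing):

```python
import time

def process_chunks(chunks, pause=0.001):
    """Process work in slices, yielding the CPU between them.

    The short sleep lets the OS scheduler run other threads and
    processes, trading a little total runtime for responsiveness.
    """
    results = []
    for chunk in chunks:
        results.append(sum(chunk))   # stand-in for real CPU-bound work
        time.sleep(pause)            # give up the rest of the timeslice
    return results

print(process_chunks([[1, 2], [3, 4]]))  # -> [3, 7]
```

Tuning `pause` (or sleeping only every Nth chunk) lets you pick the point on the throughput/responsiveness trade-off that suits your application.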

Without knowing the specific language or operating system you're targeting I can't advise on the best debugger/profiler combination for the problem, but I'd imagine there are good options out there for most mature languages.

Martin Harris
Not only a good answer for the question, but a good philosophy for life in general.
CoverosGene
Mind you, your reporting manager should not profile you :) or else you're optimized!
Devil Jin
I'm not a fan of the "pause the debugger" school. There are tons of reasons why you might get misleading answers this way. If you can't get hold of a true profiler, temporarily bracket all your code with timing code.
T.E.D.
I agree, and I'll admit that I've probably had less than a 10% success rate of actually finding the problem that way. But the odds are high enough, in my opinion, to make it worth the press of a single button: if the code pauses on some block like while(!connected) { connect(); } (which, it shames me to say, is one real-life hit that makes up that 10%), then you've found the problem in a fraction of the time a profiler or timing code would take.
Martin Harris
+1  A: 

If it's purely a CPU usage gain you're after, then it's big-O notation you need: work out how to get your algorithms to do their job in the fewest computations possible.
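As a sketch of what "fewest computations" means in practice (Python assumed; the duplicate-detection task is an illustrative example, not from the thread), compare two ways of answering the same question:

```python
def has_duplicates_quadratic(items):
    # O(n^2): compares every pair of elements.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    # O(n): a single pass, remembering what has been seen.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

print(has_duplicates_linear([3, 1, 4, 1, 5]))  # -> True
```

Both functions give identical answers, but on a million items the quadratic version does roughly half a trillion comparisons while the linear one does a million set lookups: the algorithm choice dwarfs any micro-optimization.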

However, as for general performance, I find CPU usage to be one of the lesser bottlenecks.

Some more important things to look at for performance are:

Data binding: do you fetch all the data up front, or only as required? Choosing between these two approaches can be key to the performance of your app.

Can you reduce the data you're working on? If it all fits in memory easily, you can gain performance here. On a side note, putting too much in memory can have the opposite effect.

I think to sum up, there is no generic solution to performance. Write your code (with some intelligence), then look at where it is struggling.

Robin Day
you make sense !!
Devil Jin
I tend to avoid code-level optimizations; they tend to hurt the readability of the program. A better algorithm will serve you much faster and more smoothly.
Devil Jin
+5  A: 

First, write good clean code. Do things the simplest way possible. After that, do the following repeatedly until you are satisfied with speed of the program:

  1. Profile its execution.
  2. Find the parts where it is spending the most time.
  3. Speed up those parts.

Do not fall into the trap of perverting your code up front in the name of optimization.

Remember Amdahl's Law. You aren't going to get noticeable improvements by speeding up something that is already consuming only 1% of your program's time. You get the best bang for your optimization buck by speeding up the part your program spends most of its time doing.
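Amdahl's Law can be put in one line: if a fraction p of the runtime is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p/s). A quick Python check of the 1% claim above:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Speeding up a 1%-of-runtime hotspot by 1000x barely helps overall:
print(round(amdahl_speedup(0.01, 1000), 3))  # -> 1.01
# Halving the cost of an 80% hot path pays off far more:
print(round(amdahl_speedup(0.80, 2), 3))     # -> 1.667
```

This is why profiling first matters: it tells you which part of the program actually has a large p.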

T.E.D.
I definitely vote this one of the best tips for anyone writing code.
Devil Jin
+2  A: 

I am speaking of CPU-bound processes. But here I do not want my process to keep hogging the CPU. If I have a 4-core machine and run four simple loops within a process, CPU consumption shoots up to 400% for as long as the application/process is running.

You will probably want to look into throttling mechanisms to reduce your CPU utilization when in an idle state:

Under normal circumstances, such code will consume CPU cycles even when it has nothing to do (i.e. it will "busy wait").

For example, an empty infinite loop will simply run as fast as it possibly can, if it doesn't have to do anything else.

However, in some circumstances you don't want to busy wait, and on some platforms you may want to avoid it entirely.

One established way of doing this is to use sleep calls when idling, so that the system scheduler can reschedule all running threads. Similarly, you could use timers to control your function's actual update rate and simply skip calling your code when it doesn't need to run (a mechanism sometimes used by games or simulators).
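The timer-based update-rate idea can be sketched as a fixed-rate loop (Python assumed; `run_loop`, `hz`, and `duration` are illustrative names, not from the thread):

```python
import time

def run_loop(update, hz=60, duration=0.1):
    """Call update() at most `hz` times per second, sleeping the rest."""
    frame = 1.0 / hz
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        start = time.monotonic()
        update()
        # Sleep away whatever is left of this frame instead of spinning.
        remaining = frame - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)

ticks = []
run_loop(lambda: ticks.append(1), hz=60, duration=0.1)
print(len(ticks))  # roughly 6 ticks in 0.1 s at 60 Hz
```

The loop spends most of its time asleep rather than polling, so the process stays near 0% CPU while still updating at a predictable rate.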

In general, you'll want to avoid polling and instead use intelligent data structures (for example, a job queue) that make it possible to adjust your runtime behaviour automatically, without having to check the data structure constantly.
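A minimal job-queue sketch in Python, using the standard library's `queue.Queue` (the `None` sentinel for shutdown is a common convention, assumed here): the worker blocks inside `get()` at no CPU cost until work actually arrives, instead of polling for it.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    while True:
        job = jobs.get()      # blocks without burning CPU until work arrives
        if job is None:       # sentinel value: shut down
            break
        results.append(job * job)

t = threading.Thread(target=worker)
t.start()

for n in range(5):
    jobs.put(n)
jobs.put(None)
t.join()
print(results)  # -> [0, 1, 4, 9, 16]
```

When the queue is empty the worker consumes no CPU at all, which is exactly the behaviour the asker wants from an idle CPU-bound process.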

none
Even just using `yield`, rather than `sleep` can help out with this sort of thing.
Curt Sampson
+1  A: 

It's not quite clear to me whether you're looking for ways to make the most efficient use of CPU, or ways to avoid bogging down a machine when you've got a lot of CPU-intensive stuff to do.

These are incompatible goals.

For the former, you ideally want an OS that will simply let you take over the CPU(s) completely for as long as you like, so you don't have to waste CPU cycles on the OS itself, not to mention any other processes that might be running.

For the latter, well, I've been writing some code lately that uses poorly designed CPU-bound algorithms, and the new Intel i7 processor has been my saviour. Given four cores each able to run two threads, I merely try to limit my OS thread usage to five or six per application, and I've still got CPU left to switch to another window and run the kill command. At least until I drive the system into swap with space leaks.

Curt Sampson
A: 

Good suggestions here. The simpler you write the code, the more time you will save yourself.

I look at performance tuning as an extension of debugging. People say measure, measure, measure, but I don't. I just let the program tell me exactly what the problem is, by dropping in on it unannounced, several times if needed. It's usually a surprise, and it's never wrong.

The usual history of this, depending on how big the program is, is finding and fixing a series of performance problems, each giving anywhere from a 10% to 50% speedup (more if there is a really bad problem). This can give an overall speedup of maybe 10 times.

Then the samples tell me exactly what the program is doing, but I can't think of how to fix it without a basic redesign, and I realize that if I had done the design differently in the first place, it would have been a lot faster to start with.

Supposing I can do the redesign, then I can do a few more rounds of performance find-and-fix. After this, when I hit the point of diminishing returns, I know it is about as fast as physically possible, and I can single-step it at the assembly level and watch every instruction "pulling its weight" in getting to the answer.

There is real satisfaction in getting to that point.

Mike Dunlavey
Another valid point! thanks Mike
Devil Jin