views: 1289
answers: 20

This is undeniable: multicore computers are here to stay.

So is this: efficient multicore programming is pretty difficult. It's not just a case of understanding pthreads.

This is arguable: whether the 'developer on the street' needs to concern him/herself with these developments.

To what extent are you concerned about having to expand your skillset for multicore? Is the software you are writing a candidate for parallelisation, and if so are you doing anything to educate yourself (if you didn't already know the techniques)? Or do you believe that the operating system will take care of most of it, the language runtime will do its bit and your application will happily sit on one core and let the others do their thing?

+2  A: 

I've been programming with threads for over 15 years now. I am not worried in the slightest.

gbrandt
+2  A: 

I'm not worried. The concepts aren't too difficult, and more developers writing multithreaded apps = more material on the subject = easier to figure out what you need to know.

Spencer Ruport
+21  A: 

Are your programs typically CPU bound?

If not, forget it. Multicore doesn't concern you: the extra cores give your users a smoother experience without making any demands on you at all.

Cool, eh?

If you are CPU bound, and your problem is parallelizable, you might be able to leverage the multiple cores. That's the time to start worrying about it.


From the comments:

Suggestion for improving answer: give rough explanation of how to tell if your program is CPU bound. – Earwicker

CPU bound means that the thing preventing the program from running faster is a lack of computational horsepower. Compare to IO bound (or sometimes network bound). A poor choice of motherboard and processor can result in machines being memory bound as well (yes, I'm looking at you, Alpha).

So you'll need to know what your program is doing from moment to moment (and how busy the machine is). To find out on a Unix-like system, run top. On Windows, use the Task Manager (thanks Roboprog).

On a machine with a load of less than 1 per core (i.e. your desktop machine when you're not doing much of anything), a CPU bound process will consistently have more than 50% of a processor (often more than 90%). When the load average is higher than that (i.e. you have three compiles, SETI@home, and two peer-to-peer networks running in the background), a CPU bound process will get a large fraction of (# of cores)/(load average).
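For a rough programmatic version of the same check, you can compare CPU time against wall-clock time for a piece of work (a Python sketch; the two workloads below are made-up stand-ins, and watching top is usually the quicker test):

```python
import time

def cpu_fraction(work):
    """Run `work` and return the fraction of wall-clock time spent on the CPU.
    Near 1.0 suggests CPU bound; near 0.0 suggests IO (or sleep) bound."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    work()
    wall1, cpu1 = time.perf_counter(), time.process_time()
    return (cpu1 - cpu0) / (wall1 - wall0)

if __name__ == "__main__":
    busy = lambda: sum(i * i for i in range(2_000_000))  # compute-heavy stand-in
    waiting = lambda: time.sleep(0.5)                    # stand-in for blocking IO
    print(f"busy:    {cpu_fraction(busy):.2f}")
    print(f"waiting: {cpu_fraction(waiting):.2f}")
```

A ratio near 1.0 means the process is burning CPU the whole time it runs; a ratio near 0.0 means it is mostly waiting on something else.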

dmckee
Might just be me but I'm not sure what you're saying here. :(
Spencer Ruport
He's saying that you won't even have to deal with multicore issues unless your program is CPU-intensive. If it's not, then it's better just to have your program use a single core anyways to free up CPU power for other programs.
Daniel Lew
Exactly right. Another way to say it: let the OS designers worry about that.
Jon Ericson
Suggestion for improving answer: give rough explanation of how to tell if your program is CPU bound.
Daniel Earwicker
"task manager" is windows-ese for "top", FWIW
Roboprog
I do not completely agree. If the program is memory bound, you can speedup the program (if that is needed) a lot by restructuring program to restrict memory usage. Also, exploiting data locality on caches.
Amit Kumar
@Amit Kumar: That's right, but it is independent of how many CPU cores you have (until you do a good enough job that the program gets to be CPU bound). However, a lot of what we do is input bound: wait for slow humans, wait for the network, etc...
dmckee
Side note: I never saw this as *the answer*, but as one input to a complicated question. Lots of important stuff in the other answers here, don't neglect them.
dmckee
@dmckee The question is how much performance is required. This has little to do with whether the program is CPU bound or memory bound. If you need it fast, you need it fast. CPU is overemphasized (that's the old conventional wisdom; the new CW is that memory is far away).
Amit Kumar
+7  A: 

It's a good argument for starting to learn functional languages, which are easier to optimize for parallel execution.
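The underlying reason: pure functions share no state, so they can be fanned out across cores with essentially a one-line change. A sketch in Python (`collatz_len` is just a made-up compute-heavy pure function):

```python
from multiprocessing import Pool

def collatz_len(n):
    """Pure function: length of the Collatz sequence starting at n.
    No shared state, so calls can run in any order on any core."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

if __name__ == "__main__":
    # Serial version would be: lengths = list(map(collatz_len, range(1, 10_001)))
    with Pool() as pool:                 # one worker per core by default
        lengths = pool.map(collatz_len, range(1, 10_001))
    print(max(lengths))
```

The same shape is what functional languages give you by default; in imperative code you have to convince yourself the mapped function really is side-effect free.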

Morendil
I'm going to vote this up, but I stand by the point that there is no gain in writing parallel code to wait for network packets (or keystrokes, disk IO, or...) that come in on a time scale much longer than the OS time slice...
dmckee
+3  A: 

I think this is generally worth taking an interest in, to put it mildly.

It hardly needs saying that the massive increase in speed of CPUs over the past few decades has been extremely valuable, and that further gains will be just as valuable.

But those gains will from now on mostly consist of a regular doubling in the number of cores. So to benefit from these gains, software needs to be parallelizable.

A lot of the computation-intensive parts of many applications are actually written in SQL, so they are already functional and capable of being broken down into parallel tasks by the RDBMS. So those people can relax.

But those of us writing mostly in C#, even if we're writing GUIs, need to pay close attention to this stuff. A GUI frequently has to perform some useful operation on whatever model it presents to the user, and the user gets annoyed when they have to sit and wait for it to finish. They'll get even more annoyed in a few years' time, when they look at Task Manager and see that around 3% of their fancy new 32-core machine is being utilized.

Daniel Earwicker
+15  A: 

Just a side note: If your app has a GUI and does intense computation, ALWAYS do your intense computation on a separate thread. Forgetting to do this is why GUIs freeze up.
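The pattern, sketched with Python's standard library (a real GUI toolkit would replace the polling loop with its own timer or completion event, and the workload here is a stand-in):

```python
import threading
import queue

results = queue.Queue()

def crunch(data):
    """The intense computation, kept off the UI thread."""
    results.put(sum(x * x for x in data))

worker = threading.Thread(target=crunch, args=(range(1_000_000),), daemon=True)
worker.start()

# The "UI loop" stays responsive: poll for the result instead of blocking on it.
while True:
    try:
        total = results.get(timeout=0.05)  # a real toolkit would use a timer/event
        break
    except queue.Empty:
        pass                               # repaint, handle input, etc. here
print(total)
```

The queue matters as much as the thread: the worker hands its result back through a thread-safe channel instead of poking at UI state directly.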

tsilb
Absolutely... Because perceived speed can be more important than actual speed. Your app should not frustrate your users.
geofftnz
Good advice. GOOD advice. But it is good advice on single core systems too.
dmckee
Thus the 'Always'. :)
tsilb
Is this why iTunes is so dog-slow on Windows when connecting devices, etc?
Andrew Keeton
Google's Chromium blog describes the great lengths the Chrome browser goes to move all file and network I/O to background threads: http://blog.chromium.org/2008/10/responsiveness-for-plugins-and-renderer.html
cpeterso
Wish the MS Outlook team had taken note of this :(
RobS
Your advice is very useful, but not specific to multicore. Parallel code speeds up all non-trivial programs you encounter in real life.
DonkeyMaster
@DonkeyMaster: True, but as multicore becomes more prevalent, single-threaded or poorly-threaded apps become an increasingly obvious and problematic issue. For example, most games still peg one CPU and leave the others idle.
tsilb
@tsilb I don't know whether that's true or false, but my gut tells me it doesn't matter. If a game is tuned for performance, this is one of the most effective improvements, so I suppose that if a game can use multiple cores, it will. Old games only use one core, but it doesn't matter, because one modern core is enough to handle any game from 3 years ago.
DonkeyMaster
I refer you, sir, to Supreme Commander. Great and smooth until you get a few hundred units into an epic battle. For example, my standard attack pattern is a group of 300 T3 Airships. Oh it gets laggy.
tsilb
+4  A: 

I think what is likely to happen is that once large numbers of cores (say 8+) become commonplace, we'll see the development of applications that take advantage of parallelism in ways that were not considered viable in a single-threaded world.

I can't think of specific examples, but consider what happened when 3D accelerators became common. Games at the time (think Doom) were bound by the speed of their software rendering code. Highly-detailed 3D models, simulated reflection/refraction and per-pixel lighting were not even considered. Nowadays everyone does it.

So unless your current apps are highly CPU-bound, I would not worry about parallelising them. If you find you have heaps of CPU power via multiple cores, then look at ways to exploit it in new projects.

geofftnz
+1, that's what I was trying to say in my answer - user expectations are going to change, so to write a competitive app tomorrow you will need to take advantage of multi-cores whenever you go into a loop of a length controlled by the user.
Daniel Earwicker
There is a fair chance that we'll see a time when memory and IO bandwidth throttle things, at least on consumer machines. Bigger (and smarter) on-chip caches are a partial solution. Improved buses and motherboard architecture are the rest of the solution.
dmckee
Certainly we'll see more background indexing, pre-computation, and other tricks that are possible, but not worth it with only a few cores and limited memory bandwidth.
dmckee
yeah, memory bandwidth is the killer. If only we could get CPUs with 8 GB of cache...
geofftnz
+1  A: 

Well, since I do web development in ASP.Net, there are a few areas I could see multicore playing a role:

1) Client-side. How can something like JavaScript be optimized for a client with a quad-core CPU, if that is what someone wants to harness for running something like sorting a long list of data? Are fat clients coming back with the new versions of IE, Firefox, Safari and Chrome?

2) Server-side on a web server. Within IIS and the .Net framework it uses, how do things like PLINQ use parallel or concurrent programming to help speed up request handling? What kinds of IIS settings can be tuned to enhance performance on the hardware?

3) Middleware/DB back-end. How does the latest MS-SQL Server or Oracle or MySQL handle the additional resources of both multi-core and multi-socket? E.g. if a quad-socket motherboard has a quad-core CPU in each socket, with something like Hyperthreading on top, there are 32 threads that could run at once, which is really different from the single-core CPUs of the old days.

In addition, there is something to be said for the multicore aspects of GPUs, where Crossfire and SLI were the beginning; now there are more hybrid graphics solutions, and one can wonder how they will be harnessed in the future. E.g. AMD's Fusion is one idea that I'm not sure how well it'll do, but it is coming, last I heard.

On the subject of educating myself, I'm not sure how much optimizing my code would help in some cases. I'm more interested in how IIS will try to harness the new computing realm before it, as that could ultimately limit what can be done, even if I isolate my code to run in its own little world.

These are just my current thoughts and are subject to change at any moment.

JB King
It's probably not worth multi-threading web apps, because they are already multi-threaded for many simultaneous users. Making the code for one user threaded is a waste of time; the CPU and IO will already be used up by all the other user requests in progress. You may end up making things slower overall, and less reliable. Client-side, you have a good point: JS is already mostly a functional language, so it should be able to make use of threads well.
gbjbaanb
+5  A: 

Yeah, I've been programming with threads, too. But I'm not masochistic enough to love them. It's still way too easy to get cross-talk between threads, no matter how much of a super-man you are, plus whatever help you get from coworkers. Threads are easy to do, but very difficult to do correctly, so of course Joe Schmoe gravitates to them. Plus, they're fast! (which is all that matters, of course)

On *nix, good old fork() is still a good way to go for many things. The overhead is not too bad (yes, I'll need to measure that to back up my BS some day), particularly if you are forking an interpreter, then generating a bunch of task specific data in the child process.

That said, child processes are hideously expensive on Windoze, I'm told. So the Erlang approach is looking pretty good: force Joe Schmoe to write pure functions and use message passing instead of his seemingly-infinite-state automata global (instance) variable whack-fest with bonus thread cross-talk extravaganza.
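That message-passing discipline isn't limited to Erlang; a sketch of the same idea using Python's multiprocessing (real OS processes, so it pays the fork cost mentioned above, but the worker shares no mutable state with its parent and squaring is just a stand-in workload):

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    """A message-passing worker: it owns its own state; only messages cross
    the process boundary, so there is no thread cross-talk to get wrong."""
    for item in iter(inbox.get, None):       # None is the shutdown sentinel
        outbox.put(item * item)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    child = Process(target=worker, args=(inbox, outbox))
    child.start()
    for n in range(5):
        inbox.put(n)
    inbox.put(None)
    print([outbox.get() for _ in range(5)])  # [0, 1, 4, 9, 16]
    child.join()
```

The queues are the whole interface: kill the globals and the "seemingly-infinite-state automata" problem shrinks to "what messages can arrive next?".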

But I'm not bitter :-)

Revision / comment:

Excellent comment elsewhere about distance-to-memory. I had been thinking about this quite a bit recently as well. Mark-and-sweep garbage collection really hurts the "locality" aspect of running processes. M/S GC on 0-wait-state RAM on an old 80286 may have seemed harmless, but it really hurts on multi-level caching architectures. Maybe reference counting + fork/exit isn't such a bad idea as a GC implementation in some cases?


edit: I put some effort into backing up my talk here (results vary): http://roboprogs.com/devel/2009.04.html

Roboprog
use message passing... see, Win32 PostMessage will be back in fashion before you know it :)
gbjbaanb
+2  A: 

I would argue that for most programmers and applications, significant-multicore does not present a significant advantage or potential over standard multithreaded development. Most people use threads to accomplish sequential jobs, and there isn't much potential for splitting those threads into much smaller units.

IMHO, most benefits of significant-multicore would come from improvements to underlying frameworks (e.g., database access, IO, GUI and 3D toolkits, etc.), and the vast majority of developers would benefit transparently.

In addition, future static analysis tools may be able to recommend pieces that could be split further into threads.

Uri
whatever happened to the idea of having an otherwise procedural / imperative language with a parallel-begin/end construct? I remember this idea from my BS 20 odd years ago, but nothing seems to have implemented it. Hide all the thread/process split and rejoin stuff, with data-pass conventions...
Roboprog
Fortran 90 has parallel processing (at least at the level of matrix arithmetic) built into the language these days.
James Anderson
You want to look at some of the MPI stuff; check out OpenMP for an example of exactly what you're thinking of.
gbjbaanb
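The parallel-begin/end construct Roboprog asks about has since appeared in mainstream libraries; a fork/join sketch using Python's concurrent.futures (the chunked sum is a made-up workload, and the split/rejoin plumbing is entirely hidden):

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(lo, hi):
    """Work on one chunk; chunks share nothing, so they can run anywhere."""
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    bounds = [(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    # "parallel begin": chunks fan out to worker processes;
    # "parallel end": leaving the with-block rejoins all of them.
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(chunk_sum, *zip(*bounds)))
    print(total == sum(i * i for i in range(1_000_000)))  # True
```

The data-pass convention is pickling of arguments and results, which is exactly the hide-the-details contract the parallel-begin/end idea called for.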
A: 

No, I'm not worried.

My work is a little unusual and possibly parallelises more easily than average, but regardless I see it as more of an opportunity than a problem.

Partly I'm impatient for things to get to the point where it's really worth optimising for multicore. I don't know the exact numbers at the moment, but it seems like half our clients have a single-core machine, 49% have dual core and maybe 1% have quad. That means multithreading doesn't really give a huge performance gain in most cases, and hence isn't really worth spending much time on.

In a few years' time, when the average might be quad-core, there's going to be a much stronger case for spending a bit of time on clever multithreading code, which I think is going to be a good thing for us developers. All we need is for Intel and AMD to hurry up and make more of them... :-)

Peter
A: 

One of my hardware-oriented professors tells us (well, preaches) that this is a massively important area of computer science. More so, it'll be addressed either by the OS (I've noticed Apple is hitting this hard; MS probably is as well), or the coder will need to think about parallel execution (threading, etc...).

Quite a neat area of CS. :)

Rev316
A: 

As an indie game developer I'm actually very excited about it. Several games go CPU bound during active moments, and almost all modern 3D games are very taxing on the hardware. Multicore has been the law of the land for video for the past several years, with some Nvidia cards nowadays having over 200 cores.

Writing shaders for these cards is a pleasure, and I can't wait to see what comes out of more and more machines being multi-proc.

I think this need will spawn better threading support over time. We still have crazy schemes like Apache's MPM worker model, where you get a mix of several processes and threads at the same time. I'd like to see better adoption of things like green threads, where they all appear to be in the same process but are actually distributed over cores. But of course someone will have to have some breakthrough idea with shared memory to pull that off.

Near term: it's not a big deal unless you're crushing your processor. Long term: better get comfy with locks :)

Trey Stout
All threads run on multiple cores. A process is a container for memory etc.; a thread is where the code is being executed. All processes have 1 thread by default. You can have 2 processes and that still counts as multi-threading; usually the threads don't need to interact with each other in these cases, like in web apps, but that's just as good as a single process with 2 threads (better, as you gain memory isolation for free).
gbjbaanb
I do think that GFX cards will be the future of MP though - you write your (mostly) single threaded program, and where you need some crunching, you pass the data to a subroutine on the gfx "coprocessor" that flits through it and returns the result to you.
gbjbaanb
A: 

Dataflow programming shows some promise for a relatively easy solution to the multicore problem.

As wikipedia says, though, it requires a fairly major paradigm shift, which seems to prevent its easy adoption by the programming community.

Underflow
+11  A: 

I do not agree with the current accepted answer.

The most important aspect of multicore machines is that the CPU and main memory are far apart. This means that unless the application is "embarrassingly parallel" or easy to parallelize, it is highly likely to be memory bound rather than CPU bound. A floating point multiplication takes about 4 clock cycles, while a fetch from main memory takes hundreds of clock cycles. Exploiting cache locality therefore becomes important.
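The effect is visible even from an interpreted language, though interpreter overhead blunts it; in C the gap is dramatic. A sketch: walk the same ~16 MB array once sequentially and once with 4 KB jumps between accesses (timings vary by machine, and the point is the ratio, not the absolute numbers):

```python
import array
import time

ROWS, COLS = 4096, 1024                    # ~16 MB of ints: bigger than most caches
data = array.array("i", [1]) * (ROWS * COLS)

def timed_sum(indices):
    """Sum data[i] over the given index order, returning (sum, seconds)."""
    start = time.perf_counter()
    s = sum(data[i] for i in indices)
    return s, time.perf_counter() - start

row_major = (r * COLS + c for r in range(ROWS) for c in range(COLS))
col_major = (r * COLS + c for c in range(COLS) for r in range(ROWS))

s1, t1 = timed_sum(row_major)              # walks memory sequentially
s2, t2 = timed_sum(col_major)              # jumps ~4 KB between accesses
print(s1 == s2, f"sequential {t1:.3f}s vs strided {t2:.3f}s")
```

Same data, same arithmetic, same answer; only the order of memory accesses differs, which is exactly the cache-locality point.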

For difficult-to-parallelize applications, if the performance achieved on a single core is sufficient (the majority of applications belong to this class), there is no need to parallelize. But if it is not (or your competitor's application is much more responsive because they parallelized), then you would do better to refactor your application to better exploit parallelism and cache locality. Vaguely, the refactored application would consist of relatively independent (or less communicative) submodules which run in parallel (see this example, for one).

See http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html for a good overview on multicore and the way things are heading. The major points they say are:

  • Clock speed is no longer increasing as it used to. It is more cost effective to manufacture a larger number of slower, simpler cores than a small number of fast processors.
  • Memory is (increasingly) far from the CPU.
  • In a few years, there will be 1000s of cores in web servers and 100s on desktops. So plan to scale your application (probably auto-scale) to 100s or 1000s of cores. This means you should create several independent tasks.
  • Threads are difficult to work with, therefore better to work with "tasks".
Amit Kumar
One note -- clock speed may not be increasing, but the amount that can be done in one clock cycle is.
rlbond
@rlbond Probably you mean more can be done by using more pipeline stages. But ILP (instruction-level parallelism) has diminishing returns too. More can be done using multiple cores, not one.
Amit Kumar
About "memory wall", see http://www.csl.cornell.edu/~sam/papers/cf04.pdf
Amit Kumar
+2  A: 

No way! I'm a Clojure programmer! :D

Rayne
A: 

The thing I've been thinking about is: aren't most divide-and-conquer algorithms massively parallelizable? Every split should be able to run in two separate threads...
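Mostly yes, with the caveat that each split costs something. A sketch of one level of the split, using Python's standard library (real implementations stop subdividing once chunks get small, because a worker per split quickly costs more than it saves):

```python
from concurrent.futures import ProcessPoolExecutor
from heapq import merge
import random

def parallel_sort(data):
    """One level of divide-and-conquer: sort each half on its own core,
    then do the sequential conquer step (the merge) in the parent."""
    mid = len(data) // 2
    with ProcessPoolExecutor(max_workers=2) as pool:
        left, right = pool.map(sorted, (data[:mid], data[mid:]))
    return list(merge(left, right))

if __name__ == "__main__":
    xs = [random.randrange(10_000) for _ in range(100_000)]
    print(parallel_sort(xs) == sorted(xs))  # True
```

Recursing further hands you the classic trade-off: more parallelism per level, but the combine steps and the per-worker overhead eat into the gain.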

Anyway, I'm concerned when I need to be concerned. When my program starts getting slow, then I'll look for ways to speed it up. Unfortunately, this is a problem in my line of work.

Mark
A: 

Day-to-day I don't think much about multi-core programming, but it's always on my radar.

The biggest problem I've always had with parallel processing is determining what should be parallelized. It's easy to spin off a thread to process a file in the background, but can the file processing itself be parallelized?
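Whether the processing itself parallelizes usually comes down to whether the records in the file are independent. When they are, the split can be a few lines; a Python sketch (the demo file and the per-line work are made-up stand-ins):

```python
from multiprocessing import Pool

def word_count(line):
    """Per-record work; this fans out only because records are independent."""
    return len(line.split())

if __name__ == "__main__":
    with open("demo.txt", "w") as f:                  # stand-in input file
        f.write("one\ntwo words\nthree more words\n")
    with open("demo.txt") as f:
        lines = f.readlines()                         # the IO stays serial
    with Pool() as pool:                              # the CPU work does not
        counts = pool.map(word_count, lines)
    print(counts)                                     # [1, 2, 3]
```

When records are not independent (say, each line's result depends on the previous line's), this split stops working, which is exactly the architectural question the answer raises.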

I think the questions of what can and should be parallelized are answered with complex architectural decisions layered on top of the already complex architectural decisions of the application in general. My belief is that this complexity will be solved either by the OS or by the programming language. The traditional thread model of parallelization found in C and its descendants is not the final answer.

byamabe
+3  A: 

I think this is a great question. So, I've begun a series of blog posts about it here.

Dmckee's answer is correct in the narrowest sense. Let me rephrase in my own words here, implicitly including some of the comments:

There is no value in parallelizing operations that are not CPU bound. There is little value in parallelizing operations that are only CPU bound for short periods of time, say, less than a few hundred milliseconds. Indeed, doing so will most likely cause a program to be more complex, and buggy. Learning how to implement fine grained parallelism is complicated and doing it well is difficult.

That is true as far as it goes, but I believe the answer is richer for a broader set of programs. Indeed, there are many reasons to use multi-threaded, and thus implicitly multi-core, techniques in your production applications. For example, it is a huge benefit to your users to move disk and network I/O operations off your user interface thread.

This has nothing to do with increasing the throughput of compute bound operations, and everything to do with keeping a program's user interface responsive. Note, you don't need a graphical UI here: command line programs, services, and server based applications can benefit from this as well.

I completely agree that taking a CPU bound operation and parallelizing it can often be a complex task, requiring knowledge of fine grained synchronization, CPU caching, CPU instruction pipelines, etc. Indeed, this can be classically 'hard'.

But, I would argue that the need to do this is rare; there are just not that many problems that need this kind of fine grained parallelism. Yes, they do exist, and you may deal with this every day, but I would argue that in the day-to-day life of most developers, this is pretty rare.

Even so, there are good reasons to learn the fundamentals of multi-threaded, and thus multi-core development.

  1. It can make your program more responsive from a user perspective by moving longer operations off the message loop thread.
  2. Even for things that are not CPU bound, it can often make sense to do them in parallel.
  3. It can break up complex single threaded state machines into simpler, more procedural code.

Indeed, the OS already does a lot for you here, and you can use libraries that are multi-core enabled (like Intel's stuff). But operating systems and libraries are not magic; I argue that it is valuable for most developers to learn the basics of multi-threaded programming. This will let you write better software that your users are happier with.

Of course, not every program should be multi-threaded or multi-core enabled. It is just fine for some things to be implemented in a simple single threaded manner. So don’t take this as advice that every program should be multi-threaded; use your own good judgment here. But it can often be a valuable technique and very beneficial in many regards. As mentioned above, I plan on blogging about this a bit starting here. Feel free to follow along and post comments there as you feel inclined.

Foredecker
A: 

No. I feel that multicore will make a significant difference in certain areas of programming but will barely affect others. After a while, the areas it does affect will absorb and encapsulate it, and the hype will barely touch the rest.

BubbaT