views:

706

answers:

8

Hi folks,

The other day a colleague of mine stated that using static classes can cause performance issues on multi-core systems, because the static instance cannot be shared between the processor caches. Is that right? Are there any benchmarks around that prove this statement? The statement was made in the context of a .NET (C#) development discussion, but it sounds to me like a language- and environment-independent problem.

Thanks for your comments.

+2  A: 

If you don't use any kind of locking or synchronization, then static vs. non-static won't have any influence on your performance.

If you're using synchronization, then you could run into a problem if all threads need to acquire the same lock - but that's a side effect of all threads sharing one lock, not a direct result of the methods being static.
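To illustrate (a minimal, hypothetical C# sketch - the classes and names are mine, not from this discussion): both versions below serialize their callers on a single shared lock, so they contend identically whether the state lives in a static class or in one shared instance.

    // Hypothetical sketch: the contention comes from sharing one lock,
    // not from anything being static.
    static class StaticCounter
    {
        private static readonly object Gate = new object();
        private static long _count;

        public static void Increment()
        {
            lock (Gate) { _count++; }   // every thread queues on the same lock
        }
    }

    class InstanceCounter
    {
        private readonly object _gate = new object();
        private long _count;

        public void Increment()
        {
            lock (_gate) { _count++; }  // identical contention if all threads share one instance
        }
    }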

Joachim Sauer
+13  A: 

I would push your colleague for data or at least references.

The thing is, if you've got shared data, you've got shared data. Whether that's exposed through static classes, a singleton, whatever, isn't terribly important. If you don't need the shared data in the first place, I expect you wouldn't have a static class anyway.

Besides all of this, in any given application there's likely to be a much bigger bottleneck than processor caches for shared data in static classes.

As ever, write the most sensible, readable, maintainable code first - then work out if you have a performance bottleneck and act accordingly.

Jon Skeet
+3  A: 

"[a] static instance cannot be shared between the processor caches. Is that right?"

That statement doesn't make much sense to me. The point of each processor's dedicated cache is that it contains a private copy of a small patch of memory, so that if the processor is running an algorithm that only needs to access that particular memory region, it doesn't have to keep going back to external memory. If we're talking about the static fields inside a static class, the memory for those fields may all fit into a contiguous chunk of memory that will in turn fit into a single processor's (or core's) dedicated cache. But each processor has its own cached copy - it's not "shared". That's the point of caches.

If an algorithm's working set is bigger than a cache then it will defeat that cache. Meaning that as the algorithm runs, it repeatedly causes the processor to pull data from external memory, because all the necessary pieces won't fit in the cache at once. But this is a general problem that doesn't apply specifically to static classes.
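A rough way to see this (a hypothetical micro-benchmark, not from the original discussion; the sizes are illustrative and machine-dependent): the two runs below perform the same number of additions, but the large array's working set won't fit in the cache, so it typically runs measurably slower.

    using System;
    using System.Diagnostics;

    class WorkingSetDemo
    {
        static long Sum(int[] data, int passes)
        {
            long sum = 0;
            for (int p = 0; p < passes; p++)
                for (int i = 0; i < data.Length; i++)
                    sum += data[i];
            return sum;
        }

        static void Main()
        {
            int[] small = new int[8 * 1024];          // ~32 KB, fits comfortably in cache
            int[] large = new int[32 * 1024 * 1024];  // ~128 MB, defeats the cache

            // Same total number of additions (32M) in both runs.
            Stopwatch sw = Stopwatch.StartNew();
            long s1 = Sum(small, 4096);
            Console.WriteLine("small working set: {0} ms (sum {1})", sw.ElapsedMilliseconds, s1);

            sw = Stopwatch.StartNew();
            long s2 = Sum(large, 1);
            Console.WriteLine("large working set: {0} ms (sum {1})", sw.ElapsedMilliseconds, s2);
        }
    }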

I wonder if your colleague was actually talking not about performance but about the need to apply correct locking if multiple threads are reading/writing the same data?

Daniel Earwicker
That makes much sense to me. He was indeed talking about performance.
Prensen
So what did *you* think he meant? :) What does it mean to share an instance between caches?
Daniel Earwicker
Well, he meant - if I understood him right - that the static instance of that class can only live in one processor cache at a time. But that makes no sense to me.
Prensen
+2  A: 

In any "virtual machine" controlled language (.NET, Java, etc) this control is likely delegated to the underlying OS and likely further down to the BIOS and other scheduling controls. That being said, in the two biggies, .NET and Java, static vs. non-static is a memory issue, not a CPU issue.

Reiterating saua's point: the impact on the CPU comes from the synchronization and thread control, not from the access to the static information.

The problem with CPU cache management is not limited to static methods. Only one CPU can update a given memory address at a time. An object in your virtual machine, and specifically a field in your object, occupies such a memory address. Thus, even if I have a mutable object Foo, calling setBar(true) on Foo will only be allowed on a single CPU at a time.

All that being said, the point of .NET and Java is that you shouldn't spend your time sweating these problems until you can prove that you have a problem - and I doubt you will.

bangroot
Ok. Thanks a lot!
Prensen
if you think the BIOS is involved in scheduling and cache management, you have to reread a lot...
Javier
Yeah, the BIOS has nothing to do with anything here. Once the OS is loaded, it is in full control and is the only true process that is running; everything else is merely an abstraction that exists within its boundaries.
ApplePieIsGood
LOL - good catch. I guess MMU would be what I'm thinking of?
bangroot
A: 

Even if it were true, I suspect you have plenty of better ways to improve performance. When it comes down to changing static to instance for the sake of processor caching, you'll know you're really pushing the envelope.

Greg Dean
+3  A: 

If multiple threads are writing to that data, you'll have cache thrashing (a write on one CPU invalidates that cache line in the other CPUs' caches). Your friend is technically correct, but there's a good chance it's not your primary bottleneck, so it doesn't matter.

If multiple threads are reading the data, your friend is flat-out wrong.
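To make that concrete (a hypothetical sketch of the write case, often called "false sharing"; the names, sizes, and iteration counts are illustrative, not from this answer): two threads hammering adjacent counters share a cache line and invalidate each other's copy on every write, while spreading the counters a cache line apart usually makes the same work noticeably faster.

    using System;
    using System.Diagnostics;
    using System.Threading;

    class CacheThrashDemo
    {
        const int Iterations = 100000000;

        static void Hammer(long[] counters, int index)
        {
            for (int i = 0; i < Iterations; i++)
                counters[index]++;   // each write invalidates the line in the other core's cache
        }

        static void Run(int indexA, int indexB, string label)
        {
            long[] counters = new long[32];
            Stopwatch sw = Stopwatch.StartNew();
            Thread a = new Thread(() => Hammer(counters, indexA));
            Thread b = new Thread(() => Hammer(counters, indexB));
            a.Start(); b.Start();
            a.Join(); b.Join();
            Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
        }

        static void Main()
        {
            Run(0, 1, "adjacent counters (same cache line)");
            Run(0, 16, "padded counters (128 bytes apart)");
        }
    }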

Tom
Yes, very true. And as Jon Skeet pointed out: why on earth would you worry about this? You'd better be sure your code is not just CPU bound, but bound by this cache effect within the CPU, before you decide to eliminate a useful static class. And then, at that point, you might want to use C.
PeterAllenWebb
Why would C alleviate the problem? The issue is related to the memory architecture; the language used to represent the shared memory will be irrelevant by the time it's converted down to the ISA and further into microcode.
ApplePieIsGood
C gives you explicit control over your entire memory layout, including the alignment and padding concerns behind cache thrashing. This absolutely does not guarantee that you'll do any better with a C solution, but it does increase your performance ceiling if you can do a better job than the VM/JIT.
Tom
+1  A: 
  1. If you share mutable data between threads, you need either a lock or a lock-free algorithm (the latter are seldom available and sometimes hard to use, unfortunately).
  2. Having a few widely used, lock-arbitrated resources can lead to bottlenecks.
  3. Static data is similar to a single-instance resource.

Therefore:

  • If many threads access static data and you use a lock to arbitrate, your threads are going to fight for access.

When designing a highly multithreaded app, try to use many fine-grained locks. Split your data so that a thread can grab one piece and run with it; ideally, no other thread will need to wait for it, because they're all busy with their own pieces of data.
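As a concrete illustration of that advice (a hypothetical lock-striping sketch; the pattern and stripe count are my choice, not from this answer): instead of one lock guarding everything, hash each key onto one of several locks, so threads working on unrelated keys rarely block each other.

    // Hypothetical sketch of lock striping: unrelated keys usually map
    // to different locks, so threads seldom contend.
    class StripedCounters
    {
        private const int Stripes = 16;   // illustrative; tune to the contention you measure
        private readonly object[] _locks = new object[Stripes];
        private readonly int[] _counts = new int[Stripes];

        public StripedCounters()
        {
            for (int i = 0; i < Stripes; i++)
                _locks[i] = new object();
        }

        public void Increment(int key)
        {
            int stripe = (key & int.MaxValue) % Stripes;  // pick a lock by key
            lock (_locks[stripe])
            {
                _counts[stripe]++;   // only threads that hash to this stripe contend here
            }
        }
    }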

Javier
Also consider that as the number of threads grows, so does the cost of context switches, and at some point that will threaten to overtake the cache-dirtying inefficiency of the CPU/memory subsystem. It's just too subjective to speculate without a more concrete understanding of the app.
ApplePieIsGood
an O(1) scheduler shouldn't care about the number of threads. i think most *nix systems these days qualify
Javier
+1  A: 

The x86 architecture implements cache snooping to keep the data caches in sync on writes, should they happen to cache the same data... Not all architectures do that in hardware; some depend on software to make sure that the case never occurs.

Brian Knoblauch