cpu-cache

Can I force cache coherency on a multicore x86 CPU?

The other week, I wrote a little thread class and a one-way message pipe to allow communication between threads (two pipes per thread, obviously, for bidirectional communication). Everything worked fine on my Athlon 64 X2, but I was wondering if I'd run into any problems if both threads were looking at the same variable and the local ca...

L1 memory cache on Intel x86 processors

I am trying to profile and optimize algorithms and I would like to understand the specific impact of the caches on various processors. For recent Intel x86 processors (e.g. Q9300), it is very hard to find detailed information about cache structure. In particular, most web sites (including Intel.com) that post processor specs do not inc...

Cache efficient code

This may sound like a subjective question, but what I am looking for is specific instances that you have encountered. How do you make code cache-effective/cache-friendly (more cache hits, as few cache misses as possible), from both perspectives: data cache and program cache (instruction cache)? I.e., what all things in...

Cache memories in Multicore CPUs

Hello, I have a few questions regarding cache memories used in multicore CPUs or multiprocessor systems. (Although not directly related to programming, it has many repercussions when one writes software for multicore processors/multiprocessor systems, hence asking here!) 1.) In a multiprocessor system or a multicore processor (Intel Q...

read CPU cache contents

Hi, is there any way to read the CPU cache contents? The architecture is ARM. I'm invalidating a range of addresses and then want to check whether it was actually invalidated. Although I can read and write the range of addresses with and without invalidating and check the invalidation that way, I want to know whether it is possible to ...

Is it possible to lock some data in CPU cache?

I have a problem... I'm writing data into an array in a while loop, and I'm doing it really frequently. It seems that this writing is now a bottleneck in the code, and I presume it's caused by the writes to memory. The array is not really large (something like 300 elements). The question is: is it possible to do it ...

CPU cache flush

I am interested in forcing a CPU cache flush in Windows (for benchmarking reasons, I want to emulate starting with no data in the CPU cache), preferably with a basic C implementation or Win32 call. Is there a known way to do this with a system call or even something as sneaky as doing, say, a large memcopy? Intel i686 platform (P4 and up is okay as...

Design code to fit in CPU Cache?

When writing simulations my buddy says he likes to try to write the program small enough to fit into cache. Does this have any real meaning? I understand that cache is faster than RAM and the main memory. Is it possible to specify that you want the program to run from cache or at least load the variables into cache? We are writing si...

C++ cache aware programming

Hi! Is there a way in C++ to determine the CPU's cache size? I have an algorithm that processes a lot of data and I'd like to break this data down into chunks such that they fit into the cache. Is this possible? Can you give me any other hints on programming with cache size in mind (especially in regard to multithreaded/multicore data p...

CPU Registers and Cache Coherence

What's the relation between CPU registers and CPU cache when it comes to cache coherence protocols such as MESI? If a certain value is stored in the CPU's cache, and is also stored in a register, then what will happen if the cache line is marked as "dirty"? To my understanding there is no guarantee that the register will update its ...

Invalidating the CPU's cache

When my program performs a load operation with acquire semantics, a store operation with release semantics, or perhaps a full fence, it invalidates the CPU's cache. My question is this: which part of the cache is actually invalidated? Only the cache line that held the variable on which I used acquire/release? Or perhaps the entire cache is in...

Cache bandwidth per tick for modern CPUs

Hello. What is the speed of cache access for modern CPUs? How many bytes can be read from or written to memory per processor clock tick by Intel P4, Core2, Core i7, AMD? Please answer with both theoretical (width of ld/sd unit with its throughput in uOPs/tick) and practical numbers (even memcpy speed tests, or STREAM benchmark), if any....

Does the Java Memory Model (JSR-133) imply that entering a monitor flushes the CPU data cache(s)?

There is something that bugs me about the Java memory model (if I even understand everything correctly). If there are two threads A and B, there are no guarantees that B will ever see a value written by A, unless both A and B synchronize on the same monitor. For any system architecture that guarantees cache coherency between threads, the...

Does this code fill the CPU cache?

I have two ways to program the same functionality. Method 1: doTheWork(int action) { for(int i = 0; i < 1000000000; ++i) { doAction(action); } } Method 2: doTheWork(int action) { switch(action) { case 1: for(int i = 0; i < 1000000000; ++i) { doAction<1>(); } ...

How do you profile a .net application taking into account the effect of the CPU cache?

All the .NET profilers I know don't take into account the effect of the CPU cache. Given that reading a field from the CPU cache can be 100 times faster than reading it from main memory, it can be a big factor. (I just had to explain this in an answer.) I have seen too many people spend a long time speeding up loops that a profiler sa...

How to clear CPU L1 and L2 cache

I'm running a benchmark on a Xeon server, and I repeat the executions 2-3 times. I'd like to erase the cache contents in L1 and L2 between the repeated runs. Can you suggest any methods for doing so? ...

Are CPU registers and CPU cache different?

Are CPU registers and CPU cache different? ...