I would like to programmatically disable hardware prefetching.

According to the Intel articles "Optimizing Application Performance on Intel® Core™ Microarchitecture Using Hardware-Implemented Prefetchers" and "How to Choose between Hardware and Software Prefetch on 32-Bit Intel® Architecture", I need to update an MSR to disable hardware prefetching.

Here is a relevant snippet:

"DPL Prefetch and L2 Streaming Prefetch settings can also be changed programmatically by writing a device driver utility for changing the bits in the IA32_MISC_ENABLE register – MSR 0x1A0. Such a utility offers the ability to enable or disable prefetch mechanisms without requiring any server downtime.

The table below shows the bits in the IA32_MISC_ENABLE MSR that have to be changed in order to control the DPL and L2 Streaming Prefetch:

Prefetcher Type                              MSR (0x1A0) Bit   Value
DPL (Hardware Prefetch)                      Bit 9             0 = Enable, 1 = Disable
L2 Streamer (Adjacent Cache Line Prefetch)   Bit 19            0 = Enable, 1 = Disable"

I tried using http://etallen.com/msr.html but this did not work. I also tried using wrmsr in asm/msr.h directly but that segfaults. I tried doing this in a kernel module ... and killed the machine.

BTW, I am using kernel 2.6.18-92.el5, and it has MSR support built into the kernel:

$ grep -i msr /boot/config-$(uname -r)
CONFIG_X86_MSR=y
...

Any help would be greatly appreciated.

+3  A: 

From the Intel reference:
This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) will be generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.

...
The CPUID instruction should be used to determine whether MSRs are supported (EDX[5]=1) before using this instruction.

So your fault might be caused by executing the instruction outside privilege level 0 (user-space code cannot run it), by a CPU that doesn't support MSRs, or by a wrong MSR address.
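
The CPUID check from the Intel text can be done from user space. Here is a minimal sketch, assuming GCC or Clang on x86: the `<cpuid.h>` header and its `__get_cpuid` helper come from the compiler, not from the Intel reference, and the function names are only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header (assumed toolchain) */
#endif

/* CPUID leaf 1 reports MSR support in EDX bit 5. */
bool edx_reports_msr(uint32_t edx)
{
    return ((edx >> 5) & 1u) != 0;
}

/* Returns true if the running CPU advertises RDMSR/WRMSR support. */
bool cpu_has_msr(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;            /* CPUID leaf 1 is not available */
    return edx_reports_msr(edx);
#else
    return false;                /* not an x86 CPU */
#endif
}
```

Note that this only tells you whether MSRs exist. It does not make WRMSR legal outside privilege level 0.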

There are lots of examples of using MSRs in the kernel source.

For a single CPU, arch/i386/kernel/cpu/intel.c demonstrates disabling prefetch for the Xeon in the function:

static void __cpuinit Intel_errata_workarounds(struct cpuinfo_x86 *c)

The rdmsr macro takes the MSR number and two variables that receive the low and high 32-bit words (it is a macro, so the variables are passed by name, not as pointers).
The wrmsr macro takes the MSR number, the low 32-bit word value, and the high 32-bit word value.

On multi-core or SMP systems, use the variants that take the CPU number as the first argument:
void rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
void wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h);

Chris
It seems that my kernel (2.6.18-92.el5) does not have rdmsr_on_cpu or wrmsr_on_cpu in msr.h. Was this added in 2.6.19?
Carlos
I forget exactly, but that sounds about right.
Chris
It was right after 2.6.18 was chosen for Debian. According to lkml.org, the patch was introduced in January 2007: http://lkml.org/lkml/2007/1/18/91
Chris
+5  A: 

You can enable or disable the hardware prefetchers using msr-tools http://www.kernel.org/pub/linux/utils/cpu/msr-tools/.

The following enables the hardware prefetcher (by clearing bit 9):

[root@... msr-tools-1.2]# ./wrmsr -p 0 0x1a0 0x60628e2089 
[root@... msr-tools-1.2]# ./rdmsr 0x1a0 
60628e2089

The following disables the hardware prefetcher (by setting bit 9):

[root@... msr-tools-1.2]# ./wrmsr -p 0 0x1a0 0x60628e2289 
[root@... msr-tools-1.2]# ./rdmsr 0x1a0 
60628e2289

Programmatically, you can do this as root by opening /dev/cpu/N/msr (N being the CPU number) and using pwrite to write to the msr "file" at the 0x1a0 offset.
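
The pwrite approach could look roughly like the C sketch below. It assumes the msr driver exposes one device node per CPU at /dev/cpu/N/msr (the path msr-tools uses) and that the process runs as root. The helper names `with_hw_prefetch` and `set_hw_prefetch` are made up for illustration.

```c
#define _XOPEN_SOURCE 700   /* for pread/pwrite */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_MISC_ENABLE     0x1A0
#define HW_PREFETCH_DISABLE  (1ULL << 9)   /* bit 9: 1 = prefetcher disabled */

/* Pure helper: return `value` with the disable bit cleared (enable != 0)
 * or set (enable == 0). All other bits are left untouched. */
uint64_t with_hw_prefetch(uint64_t value, int enable)
{
    return enable ? (value & ~HW_PREFETCH_DISABLE)
                  : (value | HW_PREFETCH_DISABLE);
}

/* Read-modify-write IA32_MISC_ENABLE on one CPU through the msr driver.
 * Returns 0 on success, -1 on any failure. */
int set_hw_prefetch(int cpu, int enable)
{
    char path[64];
    uint64_t value;
    ssize_t n;
    int fd;

    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;               /* e.g. not root, or msr driver missing */

    /* The file offset selects the MSR; each access transfers 8 bytes. */
    n = pread(fd, &value, sizeof value, IA32_MISC_ENABLE);
    if (n != (ssize_t)sizeof value) {
        close(fd);
        return -1;
    }

    value = with_hw_prefetch(value, enable);
    n = pwrite(fd, &value, sizeof value, IA32_MISC_ENABLE);
    close(fd);
    return n == (ssize_t)sizeof value ? 0 : -1;
}
```

Reading the current value first and changing only bit 9 avoids hard-coding a full register value such as 0x60628e2289, which is specific to one machine. On a multi-core system you would call `set_hw_prefetch` once per CPU number.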

Carlos
A: 

Hi,
I did exactly what was suggested, using msr-tools and the wrmsr and rdmsr commands. I set bit 9 in 0x1A0 and verified it with rdmsr.

The problem is that there is no difference in the L1 data cache hit ratio (PAPI_L1_DCH/PAPI_L1_DCA) when measured using PAPI.
I even set all of these prefetcher disable bits:

  • 39 (IP Prefetcher Disable)
  • 37 (DCU Prefetcher Disable)
  • 19 (Adjacent Cache Line Prefetch Disable)
  • 9 (Hardware Prefetcher Disable)

root@phani-laptop:~# rdmsr -X 0x1A0
1364972489
root@phani-laptop:~# wrmsr -p0 0x1A0 0xB3649F2689
root@phani-laptop:~# wrmsr -p1 0x1A0 0xB3649F2689
root@phani-laptop:~# rdmsr -X 0x1A0
B3649F2689
root@phani-laptop:~# rdmsr -p1 -X 0x1A0
B3649F2689
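
As a sanity check on the values above, the four disable bits combine into the mask 0xA000080200, and OR-ing that into the original 0x1364972489 gives the 0xB3649F2689 that was written. Below is a small sketch of that arithmetic in C. The enum and function names are invented for illustration, and the bit positions are the ones listed in this answer.

```c
#include <stdint.h>

/* Disable bits in IA32_MISC_ENABLE (0x1A0), as listed in this answer. */
enum {
    BIT_HW_PREFETCH  = 9,    /* Hardware Prefetcher Disable */
    BIT_ADJ_LINE     = 19,   /* Adjacent Cache Line Prefetch Disable */
    BIT_DCU_PREFETCH = 37,   /* DCU Prefetcher Disable */
    BIT_IP_PREFETCH  = 39    /* IP Prefetcher Disable */
};

/* Mask with all four prefetcher disable bits set. */
uint64_t all_prefetch_disable_mask(void)
{
    return (1ULL << BIT_HW_PREFETCH)  |
           (1ULL << BIT_ADJ_LINE)     |
           (1ULL << BIT_DCU_PREFETCH) |
           (1ULL << BIT_IP_PREFETCH);
}

/* Register value with every prefetcher disabled, other bits unchanged. */
uint64_t disable_all_prefetchers(uint64_t misc_enable)
{
    return misc_enable | all_prefetch_disable_mask();
}
```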

But according to PAPI, the L1 cache hit ratio is the same before and after disabling the prefetchers.

Any ideas on why this is happening?

Thanks,
Phani.

P.S:

  1. Processor: Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
  2. OS: Linux phani-laptop 2.6.32.11+drm33.2-perfctr #1 SMP Mon May 17 12:59:49 IST 2010 i686 GNU/Linux
Phani Deepak
It's an **L2** prefetcher, meaning that it will pull in lines from memory into L2, not L1. Your L1 hit rate should therefore not change, only the L2 hit rate.
Wim
A: 

Hi, I actually have the same problem as Phani Deepak. I change the value of register 0x1A0, but I see no change in the L1 misses when measured with PAPI. I have a similar processor to his.

Thanks.

Felipe
Have a look at Carlos's own answer to his question. It seems to work for him.
Peter G.