I have a multi-threaded app running under Linux 2.6.30 on an 8-core PPC processor. I want to determine which CPU a thread is running on when it is launched. The obvious answer is to read the PIR (processor ID) special-purpose register. The PIR is accessed using the mfspr instruction, so I try to read it using the following asm in my C program:

asm(" mfspr %0, 286 " : "=r" (cpu_no));

The problem is that reading this SPR is privileged: even when the app is run as root it faults with an illegal-instruction error, since root privilege does not put the process in the processor's supervisor mode. The same instruction works fine in a bare-metal app.

While it would be possible to create a driver that executes this instruction in kernel space, by the time the answer got back to the thread, the thread might have moved to a different core.

From a Linux user-level process, is there any way to get the ID of the core that the current thread is running on?

+1  A: 

Will pthread_getaffinity_np or sched_getcpu suffice?
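
A minimal sketch of the sched_getcpu route (glibc 2.6 or later; needs _GNU_SOURCE):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* asks the kernel which core the calling thread is on right now */
    int cpu = sched_getcpu();
    if (cpu == -1)
        perror("sched_getcpu");
    else
        printf("running on CPU %d\n", cpu);
    return 0;
}

Note the answer can be stale by the time you use it - the scheduler is free to migrate the thread between the call and the printf.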

Duck
If my understanding is correct, pthread_getaffinity_np returns a bitmask of the cores on which the thread is allowed to run. I want to know which core it is actually running on.
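That is, something like the sketch below (print_allowed_cores is just an illustrative name) only reports where the thread *may* run, not where it is:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

/* prints the affinity mask: the cores this thread is allowed to use */
void print_allowed_cores(void)
{
    cpu_set_t set;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) == 0) {
        int cpu;
        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &set))
                printf("may run on CPU %d\n", cpu);
    }
}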
Russ
I was thinking you could set affinity before you launch the thread. sched_getcpu might be closer to what you really want.
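For the pin-before-launch option, roughly this (spawn_pinned is just an illustrative name; it sets the affinity in the thread attributes so the thread starts on the requested core):

#define _GNU_SOURCE
#include <pthread.h>

/* create a thread that is pinned to a single core from the start */
int spawn_pinned(pthread_t *tid, int cpu, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    cpu_set_t set;
    pthread_attr_init(&attr);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    return pthread_create(tid, &attr, fn, arg);
}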
Duck
The fundamental issue I am trying to figure out is why the app uses only 12% of the CPU (as reported by top). I installed LTTng, and for 41 of 42 threads it reports the same CPU ID (the ID varies from run to run, but it looks like 41 threads are all bunched up on one core). However, the control flow viewer in LTTV shows 8 threads in run mode concurrently, which is what I would hope happens.
Russ
I'd like to have Linux figure out where the threads should go rather than placing them manually. First I want to know (definitively) whether everything is on one core, or whether the threads are properly distributed across cores.
Russ
D'oh - read the whole answer!! sched_getcpu is *exactly* what I am looking for. Thanks!!
Russ
As a follow-up - Linux is correctly distributing the threads across multiple cores; LTTng is attaching an incorrect core ID to the events it stores in its trace database. So the cause of the poor processor utilization is not a thread-scheduling problem - but I'm still in the dark as to what the performance problem is.
Russ
Good to hear one problem is down. You don't have 40 threads waiting for the same mutex, right?
Duck
When running the same app on my desktop machine (quad-core i7) under Ubuntu or Windows I get ~75 to 80% utilization. On the Freescale part I have more cores but lower utilization. The 40 threads share 20 data areas which are individually locked, so in theory each thread is, on average, waiting for just one other thread to get out of the way. I'm going to try limiting the app to just 4 cores and see if I get the same throughput as on the Intel quad-core. In any case, an interesting problem - it's just really hard to see what's going on, because any logging or printing grossly changes the behavior.
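For that test, taskset -c 0-3 ./app from the shell should do it, or roughly the equivalent in code early in main (limit_to_four_cores is just an illustrative name; the mask is inherited by threads created afterwards):

#define _GNU_SOURCE
#include <sched.h>

/* restrict the whole process to cores 0-3; pid 0 means "this process" */
int limit_to_four_cores(void)
{
    cpu_set_t set;
    int cpu;
    CPU_ZERO(&set);
    for (cpu = 0; cpu < 4; cpu++)
        CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}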
Russ