I have a multithreaded C++ application that runs on Windows, Mac and a few Linux flavours. To make a long story short: in order for it to run at maximum efficiency, I have to be able to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already correctly detect the number of logical processors/cores on all three of these platforms. To detect the number of physical processors/cores correctly, I'll have to detect whether hyper-threading is supported AND active. My question therefore is: is there a way to detect whether hyper-threading is supported AND ENABLED? If so, how exactly?

A: 

I don't know that all three expose the information in the same way, but if you can safely assume that the NT kernel will report device information according to the POSIX standard (which NT supposedly has support for), then you could work off that standard.

However, differences in device management are often cited as one of the stumbling blocks to cross-platform development. At best I would implement this as three strands of logic; I wouldn't try to write one piece of code to handle all platforms evenly.

OK, all that's assuming C++. For ASM, I presume you'll only be running on x86 or amd64 CPUs? You'll still need two branch paths, one for each architecture, and you'll need to test Intel separately from AMD (IIRC), but by and large you just check the CPUID. Is that what you're trying to find? The CPUID from ASM on Intel/AMD family CPUs?

drachenstern
A: 

Windows-only solution described here:

http://msdn.microsoft.com/en-us/library/ms683194

For Linux, read the /proc/cpuinfo file. I am not running Linux now, so I can't give you more detail. You can count the physical and logical processor instances. If the logical count is twice the physical count, then you have HT enabled (true only for x86).

rados
+1  A: 

Do you know Boost? Assuming you are using C++, I would do it this way:

#include <iostream>
#include <boost/thread.hpp>

int main()
{
    std::cout << boost::thread::hardware_concurrency();
    return 0;
}
brubelsabs
This is a very simple solution, but it does not differentiate hardware threads, a.k.a. hyper-threads, from physical CPUs or cores, which I think is the point of this question.
jcoffland
Yes, you are right, I missed this detail. So should I delete my post?
brubelsabs
+1  A: 

The way I understand the question, you are asking how to detect the number of CPU cores vs. CPU threads, which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs, and an Intel P4 with hyper-threads will be reported exactly the same way, even though 2 hyper-threads vs. 2 CPU cores is a very different thing performance-wise.

I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know, and I could be wrong, AMD does not yet have CPU threads but they have provided a way to detect them that I assume will work on future AMD processors which may have CPU threads.

In short here are the steps using the CPUID instruction:

  1. Detect CPU vendor using CPUID function 0
  2. Check for HTT bit 28 in CPU features EDX from CPUID function 1
  3. Get the logical core count from EBX[23:16] from CPUID function 1
  4. Get actual non-threaded CPU core count
    1. If vendor == 'GenuineIntel' this is 1 plus EAX[31:26] from CPUID function 4
    2. If vendor == 'AuthenticAMD' this is 1 plus ECX[7:0] from CPUID function 0x80000008

It sounds difficult, but here is a (hopefully) platform-independent C++ program that does the trick:

#include <cstring>
#include <iostream>
#include <string>

#ifdef _WIN32
#include <intrin.h> // __cpuidex
#endif

using namespace std;


// Runs CPUID function i with ECX = 0 and stores EAX, EBX, ECX, EDX in regs
void cpuID(unsigned i, unsigned regs[4]) {
#ifdef _WIN32
  // __cpuidex sets ECX explicitly, which CPUID function 4 needs
  __cpuidex((int *)regs, (int)i, 0);

#else
  asm volatile
    ("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
     : "a" (i), "c" (0));
  // ECX is set to zero for CPUID function 4
#endif
}


int main(int argc, char *argv[]) {
  unsigned regs[4];

  // Get vendor (memcpy avoids aliasing a char array through unsigned *)
  char vendor[12];
  cpuID(0, regs);
  memcpy(vendor + 0, &regs[1], 4); // EBX
  memcpy(vendor + 4, &regs[3], 4); // EDX
  memcpy(vendor + 8, &regs[2], 4); // ECX
  string cpuVendor = string(vendor, 12);

  // Get CPU features
  cpuID(1, regs);
  unsigned cpuFeatures = regs[3]; // EDX

  // Detect hyper-threads  
  bool hyperThreads = false;
  if (cpuFeatures & (1u << 28)) { // HTT bit, checked for any vendor so the AMD branch below is reachable
    // Logical core count per CPU
    cpuID(1, regs);
    unsigned logical = (regs[1] >> 16) & 0xff; // EBX[23:16]
    cout << " logical cpus: " << logical << endl;
    unsigned cores = logical;

    if (cpuVendor == "GenuineIntel") {
      // Get DCP cache info
      cpuID(4, regs);
      cores = ((regs[0] >> 26) & 0x3f) + 1; // EAX[31:26] + 1

    } else if (cpuVendor == "AuthenticAMD") {
      // Get NC: Number of CPU cores - 1
      cpuID(0x80000008, regs);
      cores = ((unsigned)(regs[2] & 0xff)) + 1; // ECX[7:0] + 1
    }

    cout << "    cpu cores: " << cores << endl;

    if (cores < logical) hyperThreads = true;
  }

  cout << "hyper-threads: " << (hyperThreads ? "true" : "false") << endl;

  return 0;
}

I haven't actually tested this on Windows or OSX yet, but it should work, as the CPUID instruction is valid on i686 machines. Obviously, this won't work for PowerPC, but then they don't have hyper-threads either.

Here is the output on a few different Intel machines:

Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz:

 logical cpus: 2
    cpu cores: 2
hyper-threads: false

Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:

 logical cpus: 4
    cpu cores: 4
hyper-threads: false

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz:

 logical cpus: 16
    cpu cores: 8
hyper-threads: true

Intel(R) Pentium(R) 4 CPU 3.00GHz:

 logical cpus: 2
    cpu cores: 1
hyper-threads: true
jcoffland