cpu-architecture

Critical sections with multicore processors

With a single-core processor, where all your threads run on a single CPU, the idea of implementing a critical section using an atomic test-and-set operation on some mutex (or semaphore, etc.) in memory seems straightforward enough; because your processor is executing a test-and-set from one spot in your program, it necessari...
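For reference, a test-and-set lock of the kind the question describes could be sketched in C11 roughly as follows. This is a minimal sketch, not the asker's code: the names `tas_lock`, `tas_try_lock`, `tas_lock_acquire`, and `tas_lock_release` are hypothetical, and it assumes a C11 compiler that provides `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: the flag is set while the lock is held.
   Initialize with: tas_lock l = { ATOMIC_FLAG_INIT }; */
typedef struct {
    atomic_flag held;
} tas_lock;

/* Atomic test-and-set: sets the flag and returns its previous value.
   A previous value of false means this caller acquired the lock. */
bool tas_try_lock(tas_lock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the test-and-set succeeds. */
void tas_lock_acquire(tas_lock *l) {
    while (!tas_try_lock(l)) {
        /* busy-wait; a real lock would usually back off or yield here */
    }
}

/* Clear the flag so another thread's test-and-set can succeed. */
void tas_lock_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire and release orderings are one reasonable choice for a lock; whether and how the hardware keeps the flag consistent across cores is exactly what the question asks about.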

How does a hardware trap in a three-past-the-end pointer happen even if the pointer is never dereferenced?

In his November 1, 2005 C++ column, Herb Sutter writes ... int A[17]; int* endA = A + 17; for( int* ptr = A; ptr < endA; ptr += 5 ) { // ... } [O]n some CPU architectures, including current ones, the aforementioned code can cause a hardware trap to occur at the point where the three-past-the-end pointer is created, whethe...
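In the quoted loop, `ptr += 5` eventually forms `A + 20`, which is more than one element past the end of the array; C and C++ leave that undefined even if the pointer is never dereferenced. One way to sidestep the issue, sketched here under the assumption that stepping by index is acceptable, is to never form the out-of-range pointer at all. The function name `sum_every_fifth` is hypothetical.

```c
#include <stddef.h>

/* Visit every fifth element by index, so no pointer beyond
   one-past-the-end of the array is ever computed. */
long sum_every_fifth(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i += 5) {
        total += a[i];  /* a + i is formed only while i < n */
    }
    return total;
}
```

With a 17-element array this touches indices 0, 5, 10, and 15; the index `i` may reach 20, but that is plain integer arithmetic, not pointer arithmetic.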

Can you recommend any good book about low level programming?

Disclaimer: I'm not a programmer, and I'm really acting as a proxy to ask this question here :) In the old days, there was Peter Norton's Programmer's Guide to IBM PC. This book served as an invaluable source of information for every person who wanted to write all things low-level. What book nowadays can be recommended as a source of knowledge ...

Finding prime factors of large numbers using specially-crafted CPUs

My understanding is that many public key cryptographic algorithms these days depend on large prime numbers to make up the keys, and it is the difficulty in factoring the product of two primes that makes the encryption hard to break. It is also my understanding that one of the reasons that factoring such large numbers is so difficult, is ...

Double(s) across different cpu architectures?

Is it OK to send double floating point values over the network (adjusted for correct byte order, of course) and use them interchangeably on different CPU architectures, specifically i386, MIPS (a couple of different cores), and PowerPC (e300, e500)? No extremely old hardware. Using gcc 4.2.1 as compiler with -Os for all architectures. Supposedl...
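A byte-order-independent encoding of the kind the question alludes to might look like the sketch below. It rests on two assumptions that the question itself is asking about: that `double` is a 64-bit IEEE 754 binary64 value on every platform involved, and that its in-memory byte order matches that of a 64-bit integer. The function names `double_to_wire` and `double_from_wire` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is 64 bits wide (IEEE 754 binary64). */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Write the bit pattern of value into out[] in big-endian (network) order. */
void double_to_wire(double value, unsigned char out[8]) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);  /* reinterpret without aliasing issues */
    for (int i = 0; i < 8; i++) {
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
    }
}

/* Rebuild a double from 8 big-endian bytes. */
double double_from_wire(const unsigned char in[8]) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++) {
        bits = (bits << 8) | in[i];
    }
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the shifts operate on the integer value rather than on memory, the same source produces the same wire bytes on little-endian and big-endian hosts, as long as the two assumptions above hold.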

How to determine SSE prefetch instruction size?

I am working with code which contains inline assembly for SSE prefetch instructions. A preprocessor constant determines whether the instructions for 32-, 64- or 128-byte prefetches are used. The application is used on a wide variety of platforms, and so far I have had to investigate in each case which is the best option for the given CPU....

Porting 32 bit C++ code to 64 bit - is it worth it? Why?

I am aware of some of the obvious gains of the x64 architecture (a larger addressable memory space, etc.)... but: What if my program has no real need to run in native 64-bit mode? Should I port it anyway? Are there any foreseeable deadlines for ending 32-bit support? Would my application run faster / better / more securely as native x64 code?...

CPU numbering on a hyperthreading-enabled system

Hi, I am trying to find out how an OS (Windows, Linux) assigns numbers to logical CPUs in a hyperthreading-enabled environment. Do both OSs first serially assign numbers to the physical CPUs and then start numbering the logical CPUs, or is some other rule followed? E.g., in a 2-physical-CPU system with hyperthreading, d...

System Architecture

How do I determine whether the currently running Mac OS X system is a 32-bit or 64-bit machine? ...

How are Windows Portable Executables portable across machine architectures?

Are Windows Portable Executables really portable across machine architectures? If so, how does it work? If not, then what does "Portable Executable" mean, or which part of the executable is portable? Thanks, Siva Chandran ...

Design code to fit in CPU Cache?

When writing simulations, my buddy says he likes to try to write the program small enough to fit into cache. Does this have any real meaning? I understand that cache is faster than main memory (RAM). Is it possible to specify that you want the program to run from cache, or at least load the variables into cache? We are writing si...

C programming and error_code variable efficiency

Most code I have ever read uses an int for standard error handling (return values from functions and such). But I am wondering if there is any benefit to be had from using a uint8_t: will a compiler -- read: most C compilers on most architectures -- produce instructions using the immediate address mode -- i.e., embed the 1-byte integer into...

Preserving the Execution pipeline

Return values are frequently checked for errors. But the code that will continue to execute may be specified in different ways. if(!ret) { doNoErrorCode(); } exit(1); or if(ret) { exit(1); } doNoErrorCode(); One way heavyweight CPUs can speculate about the branches taken in near proximity/locality using simple statistics - I...

Intel has just unveiled a new 48-core CPU. What will this move to many cores imply for us programmers?

Intel has just unveiled a new 48-core CPU. More than just the number of cores, this new architecture seems to introduce a lot of interesting features, such as this one: Things get interesting here - Intel is saying that they have removed hardware cache coherency which effectively means each "tile" will be completely separate in what...

Compiling for both Intel and PPC CPUs on OSX

I have a MacBook Pro with a 64-bit Intel Core 2 Duo processor, and I'm using gcc (i686-apple-darwin9-gcc-4.0.1) to compile executables which I can run ok on my own machine. Recently someone tried to run my application on a PowerBook G4 and got a 'Bad CPU type in executable' error, which I think is because their CPU is PPC rather than Int...

struct size optimization

I have a structure I would like to optimize the footprint of. typedef struct dbentry_s { struct dbentry_s* t_next; struct dbentry_s* a_next; char *t; char *a; unsigned char feild_m; unsigned char feild_s; unsigned char feild_other; } dbentry; As I understand it, the compiler creates structures in memory as you de...
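How member order affects padding can be checked directly with `sizeof` and `offsetof`. The sketch below uses two hypothetical structs (`padded_example` and `reordered_example`, not the `dbentry` from the question) holding the same members in different orders; the exact sizes depend on the target ABI, so the code only reports the difference rather than assuming a number.

```c
#include <stddef.h>

/* A char before a pointer typically forces padding so the pointer is aligned,
   and the trailing char typically forces tail padding as well. */
struct padded_example {
    char  tag;
    void *ptr;
    char  flag;
};

/* Same members, widest first: the two chars can share the space after the pointer. */
struct reordered_example {
    void *ptr;
    char  tag;
    char  flag;
};

/* Bytes saved by reordering on this target (0 if the ABI adds no padding). */
size_t bytes_saved_by_reordering(void) {
    return sizeof(struct padded_example) - sizeof(struct reordered_example);
}

/* Where the pointer lands in the padded layout; anything above 1 is padding after tag. */
size_t padded_ptr_offset(void) {
    return offsetof(struct padded_example, ptr);
}
```

Note that the struct in the question already lists its pointers before its `unsigned char` fields, which is the arrangement this sketch arrives at.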

CPU Registers and Cache Coherence

What's the relation between CPU registers and CPU cache when it comes to cache coherence protocols such as MESI? If a certain value is stored in the CPU's cache, and is also stored in a register, then what will happen if the cache line is marked as "dirty"? To my understanding there is no guarantee that the register will update its ...

Detect CPU Architecture (32-bit / 64-bit) runtime in Objective C (Mac OS X)

I'm currently writing a Cocoa application which needs to execute some (console) applications which are optimized for 32 and 64 bit. Because of this I would like to detect what CPU architecture the application is running on so I can start the correct console application. So in short: how do I detect if the application is running on a 64 bi...

Right way to detect CPU architecture?

I'm attempting to detect the right CPU architecture for installing either an x86 MSI or an x64 MSI file. If I'm right, for the MSI I need the OS CPU architecture. I'm not totally sure if my way is right because I can't test it. What do you think? private static string GetOSArchitecture() { string arch = System.Environment.Get...

How are instructions differentiated from data?

Hi, While reading an ARM core document, I had this question. How does the CPU differentiate the data read from the data bus, whether to execute it as an instruction or as data that it can operate upon? Refer to the excerpt from the document - "Data enters the processor core through the Data bus. The data may be an instruction to execu...