cpu-architecture

Why is my C++ app faster than my C app (using the same library) on a Core i7

I have a library written in C and two applications that use it, one written in C++ and one in C. It is a communication library, and one of the API calls looks like this:

    int source_send( source_t* source, const char* data );

In the C app, the code does something like this:

    source_t* source = source_create();
    for( int i = 0; i < count; ++i ) ...

Do you expect future CPU generations not to be cache coherent?

I'm designing a program and I found that assuming implicit cache coherency makes the design much easier. For example, my single-writer (always the same thread), multiple-reader (always other threads) scenarios don't use any mutexes. That's not a problem on current Intel CPUs. But I want this program to generate income for at leas...
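Cache coherence keeps each single memory location consistent across cores, but it does not by itself order writes to *different* locations, and the compiler is free to reorder plain stores. One portable way to express a single-writer, multiple-reader hand-off without a mutex is a C11 release/acquire pair. This is a minimal sketch assuming the value is published once; `struct snapshot`, `publish` and `try_read` are hypothetical names for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared record; the struct and field names are illustrative. */
struct snapshot {
    int value;
    double timestamp;
};

static struct snapshot shared_data;   /* plain data, written only by the writer thread */
static atomic_bool ready = false;     /* publication flag */

/* Writer side (one-time publication): write the plain data first, then set
   the flag with release ordering so those writes are visible to any reader
   that sees the flag. */
void publish(int value, double timestamp)
{
    shared_data.value = value;
    shared_data.timestamp = timestamp;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load that observes `true` also observes every
   write the writer made before its release store. Returns false if nothing
   has been published yet. */
bool try_read(struct snapshot *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = shared_data;
    return true;
}
```

This only covers a value that is written once. If the writer keeps updating the record while readers copy it, a reader could see a half-updated copy, so repeated updates need more, for example a sequence lock or swapping an atomic pointer to immutable copies.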

How does a register machine differ from a stack machine?

How does a register machine differ from a stack machine? ...
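A toy comparison may help: the sketch below evaluates (2 + 3) * 4 on two hypothetical mini-machines. The stack machine's instructions name no operands (they use the top of the stack), while the register machine's instructions name their source and destination registers explicitly. The opcodes and structs are invented for illustration and do no bounds checking.

```c
#include <stddef.h>

/* Toy stack machine: operands are implicit (top of stack). */
enum stack_op { PUSH, ADD, MUL };
struct stack_insn { enum stack_op op; int imm; };

int run_stack(const struct stack_insn *prog, size_t n)
{
    int stack[16];
    int sp = 0;                               /* number of items on the stack */
    for (size_t i = 0; i < n; ++i) {
        switch (prog[i].op) {
        case PUSH: stack[sp++] = prog[i].imm; break;
        case ADD:  sp--; stack[sp - 1] += stack[sp]; break;  /* pop two, push sum */
        case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;  /* pop two, push product */
        }
    }
    return stack[sp - 1];
}

/* Toy register machine: every instruction names its operands explicitly. */
enum reg_op { LOADI, RADD, RMUL };
struct reg_insn { enum reg_op op; int rd, rs, rt, imm; };

int run_reg(const struct reg_insn *prog, size_t n, int result_reg)
{
    int r[8] = {0};
    for (size_t i = 0; i < n; ++i) {
        const struct reg_insn *in = &prog[i];
        switch (in->op) {
        case LOADI: r[in->rd] = in->imm; break;               /* rd = imm */
        case RADD:  r[in->rd] = r[in->rs] + r[in->rt]; break; /* rd = rs + rt */
        case RMUL:  r[in->rd] = r[in->rs] * r[in->rt]; break; /* rd = rs * rt */
        }
    }
    return r[result_reg];
}
```

The stack program is `PUSH 2, PUSH 3, ADD, PUSH 4, MUL`; the register program is `LOADI r1,2; LOADI r2,3; LOADI r3,4; RADD r4,r1,r2; RMUL r4,r4,r3`. The stack encoding needs no operand fields, so instructions are smaller; the register encoding can keep values in named registers and reuse them without pushing them again.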

Cache bandwidth per tick for modern CPUs

Hello, what is the cache access speed of modern CPUs? How many bytes can be read from or written to memory per processor clock tick on Intel P4, Core 2, Core i7, and AMD? Please answer with both theoretical numbers (width of the ld/sd unit and its throughput in uops/tick) and practical numbers (even memcpy speed tests or the STREAM benchmark), if any....

Load half word and load byte in a single cycle datapath

There was a problem about implementing a load byte instruction in a single-cycle datapath without changing the data memory, and the solution was roughly the following. This is actually quite a realistic question; most memory systems are entirely word-based, and individual bytes are typically only dealt with i...
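A common way to do this (not necessarily the solution the question refers to) is to leave the data memory word-based and add logic after its output: the two low address bits select which byte or halfword of the fetched word to keep, and the result is then sign- or zero-extended before write-back. The C functions below model that logic as a sketch; they assume little-endian byte numbering (byte 0 is bits 7..0), and the function names are hypothetical.

```c
#include <stdint.h>

/* LB: pick byte `offset` (0..3) out of a 32-bit memory word and
   sign-extend it. Assumes little-endian numbering (byte 0 = bits 7..0). */
int32_t load_byte(uint32_t word, unsigned offset)
{
    uint32_t b = (word >> (8u * (offset & 3u))) & 0xFFu;  /* shift the byte down, mask */
    return (int32_t)(b ^ 0x80u) - 0x80;                    /* sign-extend from bit 7 */
}

/* LBU: same byte, zero-extended. */
uint32_t load_byte_unsigned(uint32_t word, unsigned offset)
{
    return (word >> (8u * (offset & 3u))) & 0xFFu;
}

/* LH: aligned halfword at offset 0 or 2, sign-extended from bit 15. */
int32_t load_half(uint32_t word, unsigned offset)
{
    uint32_t h = (word >> (8u * (offset & 2u))) & 0xFFFFu;
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}
```

In hardware, the shift corresponds to a multiplexer driven by address bits 1..0, and the XOR-and-subtract trick corresponds to copying bit 7 (or bit 15) into the upper bits.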

understanding memory address

I am having trouble with memory addressing in MIPS. It says that addressing is word-aligned, but in the text below I don't understand why it looks at the two least significant bits of the address. Why? Can someone give me an example to clarify or illustrate the point made here? So is it saying that valid halfword addresses are all wh...
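MIPS addresses are byte addresses, and a word occupies 4 bytes, so a word-aligned address is a multiple of 4: its two least significant bits are 00. Likewise a halfword-aligned address is a multiple of 2 (bit 0 is 0). A minimal sketch of those checks, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes (1, 2 or 4) is aligned when the address is a
   multiple of size, i.e. the low log2(size) bits are all zero. For words
   the two least significant bits must be 00; for halfwords only bit 0
   must be 0. */
bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0;
}

/* Split a byte address into the word it falls in and the byte within
   that word, as a word-organized memory would. */
uint32_t word_index(uint32_t addr)  { return addr >> 2; }
uint32_t byte_offset(uint32_t addr) { return addr & 3u; }
```

For example, 0x1000 and 0x1004 are valid word addresses; 0x1002 is a valid halfword address but not a valid word address; 0x1003 is only valid for a byte access.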

P6 Architecture - Register renaming aside, does the limited user registers result in more ops spent spilling/loading?

I'm studying JIT design with regard to dynamic-language VM implementation. I haven't done much assembly since the 8086/8088 days, just a little here or there, so be nice if I'm out of sorts. As I understand it, the x86 (IA-32) architecture still has the same basic limited register set today that it always did, but the internal register...

MIPS data layout calculation

I am self-studying the computer architecture course offered at Michigan University. I do not understand the memory layout for d ( http://www.flickr.com/photos/45412920@N03/4442695706/ ). Maybe I did not understand this ( http://www.flickr.com/photos/45412920@N03/4441916461/sizes/l/ ) well. ...

branch prediction

Consider the following sequence of actual outcomes for a single static branch. T means the branch is taken; N means the branch is not taken. For this question, assume that this is the only branch in the program.

T T T N T N T T T N T N T T T N T N

Assume a two-level branch predictor that uses one bit of branch history, i.e., a one-bit B...
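The question's predictor description is cut off, so the simulator below fills in the missing details with labeled assumptions: the single history bit (the branch's last outcome) selects one of two 2-bit saturating counters, both counters start at a caller-chosen value, and the history starts at a caller-chosen value. Treat it as a sketch for checking hand traces, not as the configuration the question specifies.

```c
#include <stddef.h>

/* Assumed two-level predictor: a 1-bit history register (the last outcome
   of this branch) selects one of two 2-bit saturating counters.
   Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
   `outcomes` is a string of 'T'/'N' characters (other characters are
   skipped). Returns the number of mispredictions. */
int count_mispredictions(const char *outcomes, int init_counter, int init_history)
{
    int counter[2] = { init_counter, init_counter };
    int history = init_history;               /* 1 = last outcome was taken */
    int misses = 0;
    for (size_t i = 0; outcomes[i] != '\0'; ++i) {
        if (outcomes[i] != 'T' && outcomes[i] != 'N')
            continue;                         /* skip separators such as spaces */
        int taken = outcomes[i] == 'T';
        int predicted = counter[history] >= 2;
        if (predicted != taken)
            ++misses;
        /* Train the counter that made the prediction, saturating at 0 and 3. */
        if (taken && counter[history] < 3)
            ++counter[history];
        else if (!taken && counter[history] > 0)
            --counter[history];
        history = taken;                      /* shift the outcome into the 1-bit history */
    }
    return misses;
}
```

Under these assumptions (counters starting at 1, "weakly not taken", and history starting at N), the 18-outcome sequence gives 10 mispredictions. The counter selected after an N always sees T and quickly saturates, while the counter selected after a T sees a mixed T, T, N, N pattern, which is where the mispredictions come from.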

Difference between "machine hardware" and "hardware platform"

My Linux machine reports the following "uname -a" output:

[root@tom i386]# uname -a
Linux tom 2.6.9-89.ELsmp #1 SMP Mon Apr 20 10:34:33 EDT 2009 i686 i686 i386 GNU/Linux
[root@tom i386]#

According to the uname man page, the entries "i686 i686 i386" denote:

machine hardware name (i686)
processor type (i686)
hardware platform (i386)

Additi...
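Of those three columns, only the machine hardware name comes directly from the POSIX `uname()` call: it is the `machine` field of `struct utsname`. The processor type and hardware platform are not fields of that struct, so how the `uname` command fills them in depends on the platform and on the coreutils build (distributions may patch it). A small sketch that prints what the system call itself reports; `print_uname` is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Print the fields POSIX uname() provides. `uname -m` corresponds to the
   `machine` field; there is no struct field for `-p` or `-i`. */
int print_uname(void)
{
    struct utsname u;
    if (uname(&u) != 0) {
        perror("uname");
        return -1;
    }
    printf("sysname:  %s\n", u.sysname);
    printf("nodename: %s\n", u.nodename);
    printf("release:  %s\n", u.release);
    printf("version:  %s\n", u.version);
    printf("machine:  %s\n", u.machine);
    return 0;
}
```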

Detecting architecture at compile time from MASM/MASM64

How can I detect at compile time, from an ASM source file, whether the target architecture is I386 or AMD64? I am using masm (ml.exe) / masm64 (ml64.exe) to assemble file32.asm and file64.asm. It would be nice to create a single file, file.asm, which would include either file32.asm or file64.asm, depending on the architecture. Ideally, I would l...

question about jump in MIPS

What does PC[GPRLEN-1..28] mean here? Where do these 4 bits come from? ...
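In the MIPS `J`/`JAL` description, the subscript is a bit range: bits GPRLEN-1 down to 28 of the PC (on 32-bit MIPS, GPRLEN is 32, so bits 31..28). The instruction supplies a 26-bit `instr_index`, which is shifted left by 2 because instructions are word-aligned, giving 28 bits; the remaining upper bits are kept from the PC, and the PC used is that of the delay-slot instruction (PC + 4). On a 32-bit machine that leaves 32 - 26 - 2 = 4 bits. A sketch of the 32-bit calculation, with a hypothetical function name:

```c
#include <stdint.h>

/* 32-bit MIPS J/JAL target: upper 4 bits of (PC + 4), the delay-slot
   address, concatenated with the 26-bit instr_index and two zero bits. */
uint32_t jump_target(uint32_t pc, uint32_t instruction)
{
    uint32_t instr_index = instruction & 0x03FFFFFFu;  /* low 26 bits */
    uint32_t region = (pc + 4u) & 0xF0000000u;         /* bits 31..28 of PC + 4 */
    return region | (instr_index << 2);
}
```

Because the top 4 bits come from the PC, a plain `J` can only reach targets inside the current 256 MB region.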

How can I dual boot my iPhone or iPad to run a very simple custom OS?

I am an experienced C/C++ programmer and have worked with assembly and many other programming languages, and I want to start a project as a learning exercise. I want to try to run a simple custom OS on the iPhone or iPad. What knowledge would I need to do this, how does the iPhone or iPad bootloader load the OS, and how could I modify it...

Where in the Fetch-Execute cycle is a value via an address mode decoded

I'm currently building a small CPU interpreter that supports several addressing modes, including register-deferred and displacement. It uses the classic IF-ID-EX-MEM-WB RISC pipeline. In what stage of the pipeline is the value for an address-mode operand decoded? For example: addw r9, (r2), 8(r3) In what stage are (r2) and 8(...

Cycles/byte calculations

Hi! In crypto communities it is common to measure algorithm performance in cycles/byte. My question is: which parameters of the CPU architecture affect this number, apart from the clock speed, of course? :) ...
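The metric itself is simple arithmetic: convert the elapsed time into cycles using the clock frequency, then divide by the number of bytes processed. A minimal sketch, assuming the core ran at one fixed, known frequency for the whole measurement (frequency scaling would break that assumption); the function name is hypothetical.

```c
/* cycles = seconds * clock_hz, so cycles/byte = seconds * clock_hz / bytes. */
double cycles_per_byte(double seconds, double clock_hz, double bytes)
{
    return seconds * clock_hz / bytes;
}
```

For example, processing 1.5e9 bytes in 1 second on a 3 GHz core works out to 2 cycles/byte. Because the measurement is expressed in cycles, the clock rate is largely factored out when comparing implementations.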

Why are there only four registers?

Why are there only four registers in the most common CPU (x86)? Wouldn't there be a huge increase in speed if more registers were added? When will more registers be added? ...

cache memory performance

Hello, I just have a general question about cache memory. How could a program perform badly on a cache-based system, given that the cache stores the data requested from main memory as well as data from the addresses around the one that was copied from main memory? ...
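A classic example of cache-unfriendly code is walking a row-major array column by column: the cache fetches whole lines of neighboring addresses, but the program's next access is far away, so most of each fetched line goes unused. The sketch below (hypothetical function names) computes the same sum both ways; only the access order differs, and on large matrices the column-order version is typically much slower.

```c
#include <stddef.h>

/* Sum an n x n matrix stored row-major in a flat array, visiting it in
   memory order: consecutive iterations read consecutive ints, so every
   cache line that is fetched gets fully used. */
long sum_row_order(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            sum += a[i * n + j];
    return sum;
}

/* Same sum, column by column: consecutive reads are n * sizeof(int) bytes
   apart, so for large n each read tends to touch a different cache line,
   and a fetched line may be evicted before the rest of it is used. */
long sum_col_order(const int *a, size_t n)
{
    long sum = 0;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            sum += a[i * n + j];
    return sum;
}
```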

What kind of data processing problems would CUDA help with?

Hi, I've worked on many data-matching problems, and very often they boil down to running many instances of CPU-intensive algorithms, such as Hamming or edit distance, quickly and in parallel. Is this the kind of thing that CUDA would be useful for? What kinds of data processing problems have you solved with it? Is there really an upl...
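For reference, here is the core of a Hamming distance computation as a plain C sketch (function names are hypothetical). Each string pair, and each character position within a pair, is compared independently of all the others; that lack of dependencies is what makes such workloads candidates for data-parallel hardware. The GPU mapping itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hamming distance between two equal-length strings: the number of
   positions at which the characters differ. */
size_t hamming_str(const char *a, const char *b, size_t len)
{
    size_t d = 0;
    for (size_t i = 0; i < len; ++i)
        d += a[i] != b[i];
    return d;
}

/* Bit-level variant for 64-bit fingerprints: XOR leaves a 1 wherever the
   bits differ, then count the set bits. */
unsigned hamming_u64(uint64_t x, uint64_t y)
{
    uint64_t v = x ^ y;
    unsigned count = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}
```

Edit distance, by contrast, uses a dynamic-programming table whose cells depend on their neighbors, so parallelizing a single comparison is harder than parallelizing many independent comparisons.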

Cache consistency & spawning a thread

Background

I've been reading through various books and articles to learn about processor caches, cache consistency, and memory barriers in the context of concurrent execution. So far, though, I have been unable to determine whether a common coding practice of mine is safe in the strictest sense.

Assumptions

The following pseudo-code i...

Is adding a new CPU to qemu a huge task?

Since qemu can emulate hardware, and CPUs in particular as I understand it, would it be a huge undertaking to add a new architecture? What I had in mind was the Xbox 360 CPU, called Xenon. I think there are unofficial specs. Would adding the Xenon be something one moderately skilled programmer could do? ...