views:

55

answers:

5

Why can't we move data directly from one memory location to another?

Pardon me if this is a dumb question, but I think this holds true, at least for the processors I've encountered (8085, 8086 and 80386).

I am not really looking for a way to move the data (for example, using MOVS and the like), but for the reason behind this anomaly.

+1  A: 

Most CPU varieties don't allow memory-to-memory moves. Normally the CPU can access only one memory location at a time, which means you need a temporary spot to store the value when moving it (a general purpose register, usually). If you think about it, moving directly from one memory location to another would require that the CPU be able to access two different spots in RAM simultaneously - that means two full memory controllers at least, and even then, the chances they'd "play nice" enough to access the same RAM would be pretty bad. The chip designers might have been able to pull some tricks to allow direct copies from one RAM chip to another, but that would be a pretty special-application kind of feature that would just add cost and complexity to solve a very uncommon problem.

You might be able to use some special DMA hardware to make it look to your program like memory is being moved without that temporary storage, at least from the perspective of your CPU.

Carl Norum
Doesn't require two simultaneous accesses. You have to read a location before you write it someplace else. Yes, you could issue a read, and *hold* the read memory location "open" while you wrote to the target, but there is zero advantage to doing that. The temporary buffer you suggested is the answer used in all practical circumstances.
Ira Baxter
You could use dual-port RAM and two buses, but the overhead would be just too big for this to be worthwhile.
starblue
It doesn't need to be simultaneous reads and writes in the same cycle (and there are various ways to solve that). I read this as: why is the memory-to-memory move missing from some processors? Why require more than one instruction and a register to perform a copy?
dwelch
+2  A: 

What about MOVS? It moves an 8/16/32-bit value from the location addressed by esi to the location addressed by edi.
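As a rough illustration only, here is a C sketch of what one 32-bit MOVS step does as seen by the programmer, assuming the direction flag is clear so both pointers move upward. The `Regs` struct, the `movsd_step` name and the flat `mem` array are hypothetical stand-ins, not anything the hardware exposes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the two pointer registers; values are offsets into mem. */
typedef struct {
    uint32_t esi;  /* source offset */
    uint32_t edi;  /* destination offset */
} Regs;

/* One 32-bit MOVS step, direction flag assumed clear: load, store, advance. */
static void movsd_step(uint8_t *mem, size_t mem_size, Regs *r)
{
    /* Both 4-byte accesses must stay inside the modelled memory. */
    assert((size_t)r->esi + 4 <= mem_size && (size_t)r->edi + 4 <= mem_size);

    uint32_t tmp;                            /* temporary holding spot for the value */
    memcpy(&tmp, mem + r->esi, sizeof tmp);  /* read:  tmp   <- [esi] */
    memcpy(mem + r->edi, &tmp, sizeof tmp);  /* write: [edi] <- tmp   */
    r->esi += 4;                             /* both pointers advance by the operand size */
    r->edi += 4;
}
```

Even in this model the value has to sit somewhere between the read and the write; the instruction only hides that spot from the programmer.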

Michael Williamson
Interesting. Where does it store the temporary data?
wRAR
+1, x86 is a strange and bizarre beast.
Carl Norum
@wRAR: I suspect it depends on the processor. If I had to guess, I'd say that modern processors break the instruction down into micro-ops that move the value from memory ([esi]) into a physical register, then move the value from the physical register into memory ([edi]), and then update edi/esi. The writer of the x86 code doesn't have to worry about the intermediate register, since the programmer only sees the architectural registers, which are distinct from the physical registers (the processor maintains a map to indicate which physical registers correspond to which architectural registers).
Michael Williamson
@wRAR: it stores the temporary data in an internal register you can't see. Lots of transistors means lots of internal resources.
Ira Baxter
+1  A: 

You have one set of address lines, one set of data lines, and a few control lines between the CPU and RAM. You can't physically move directly from memory to memory without a second set of address lines and a whole bunch of complicated logic inside the RAM. Therefore, we have to store it temporarily in a register.

You could make an instruction that does the load and store together and looks like one instruction to the programmer, but there are other considerations like instruction size, non-duplication of effective address calculation logic, pipelining, etc. that make it desirable to keep it simpler.
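A toy C model of the single-bus picture above, purely as a sketch: the `Bus` struct, its `transactions` counter and the function names are made up for illustration. Each access presents exactly one address, so a "move" costs one read cycle plus one write cycle, with the value parked inside the CPU in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-bus memory: every access carries exactly one address. */
typedef struct {
    uint8_t *cells;
    size_t size;
    unsigned transactions;  /* number of bus cycles issued so far */
} Bus;

static uint8_t bus_read(Bus *b, size_t addr)
{
    assert(addr < b->size);
    b->transactions++;
    return b->cells[addr];
}

static void bus_write(Bus *b, size_t addr, uint8_t value)
{
    assert(addr < b->size);
    b->transactions++;
    b->cells[addr] = value;
}

/* A memory-to-memory move is still a read cycle followed by a write cycle. */
static void move_byte(Bus *b, size_t dst, size_t src)
{
    uint8_t latch = bus_read(b, src);  /* the value waits inside the CPU */
    bus_write(b, dst, latch);
}
```

Calling `move_byte` once leaves `transactions` at 2, whether the latch is a programmer-visible register or an internal one.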

Karl Bielefeldt
+3  A: 

The basic reason is that most instruction sets allow one register operand and one memory operand, and sticking to this format makes designing the instruction decoder easier. It also makes the execution engine inside the CPU easier, because the instruction can typically issue a memory operation to just one memory location, and at most one register block read or write.

To do a memory-to-memory instruction directly requires two memory locations to be designated. This is awkward given a register/memory instruction format. Given the performance of the machines, there is little justification for modifying the instruction format just for this.

A hack used by more modern CPUs is to provide some type of block-move instruction, in which the source and destination addresses are held in registers (for the x86 these are ESI and EDI respectively). Then an instruction can just designate two registers (or, in the case of the x86, the instruction simply knows which registers to use). That solves the instruction decoding problem.

The instruction execution problem is a little harder, but people have lots of transistors. Organizing a read indirect from one register, a write indirect through another, and an increment of both is awkward in silicon, but that just chews up some transistors. Now you can have an instruction that moves from memory to memory, just as you asked. One of the other posters noted that for the x86 there are instructions (MOVSB, MOVSW, MOVSD, ...) that do exactly this, one memory byte/word/... at a time.

Moving a block of memory would be ideal because the CPU can generate high-bandwidth reads and writes. The x86 does this with a REP (repeat) prefix on MOVS to move a larger block.

But if a single instruction can do this, you have the problem that it might take a long time to execute (how long to move 1 GB? --> millions of clock cycles!) and that ruins the interrupt response rate of the CPU.

The x86 solves this by allowing REP MOVS to be interrupted, with the PC being set back to the beginning of the instruction. Because the registers are updated appropriately as the move progresses, you can interrupt and restart the REP MOVS instruction, getting both a fast block move and high interrupt response rates. More transistors down the tube.
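The restart trick can be sketched in C, under the assumption that all progress lives in three counters standing in for ESI, EDI and ECX. The `RepState` struct, the `rep_movsb_slice` name and the `budget` parameter (how many bytes get copied before a pretend interrupt arrives) are hypothetical; real hardware has no such parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the registers a REP byte move keeps updating. */
typedef struct {
    size_t esi;  /* next source index */
    size_t edi;  /* next destination index */
    size_t ecx;  /* bytes still to copy */
} RepState;

/* Copies at most `budget` bytes, then returns as if an interrupt had arrived.
   Returns 1 when the whole move is done, 0 when it must be issued again. */
static int rep_movsb_slice(uint8_t *mem, RepState *s, size_t budget)
{
    assert(budget > 0);
    while (s->ecx > 0 && budget > 0) {
        uint8_t tmp = mem[s->esi];  /* load one byte */
        mem[s->edi] = tmp;          /* store it */
        s->esi++;                   /* progress is recorded in the state itself, */
        s->edi++;                   /* so re-running the same call resumes       */
        s->ecx--;                   /* exactly where the last one stopped        */
        budget--;
    }
    return s->ecx == 0;
}
```

Re-issuing the same call with the same `RepState` until it returns 1 mirrors restarting the same instruction after each interrupt: nothing outside the three counters has to be saved.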

The RISC guys figured out that all this complexity for a block move instruction was mostly not worth it. You can code a dumb loop (even on the x86):

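copy: MOV   EAX,[ESI]   ; load one dword from the source
      ADD   ESI,4       ; advance the source pointer
      MOV   [EDI],EAX   ; store the dword to the destination
      ADD   EDI,4       ; advance the destination pointer
      DEC   ECX         ; one fewer dword left to copy
      JNE   copy        ; loop until ECX reaches zero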

which does the same basic thing as REP MOVS. Modern CPUs (x86 and others) execute this so fast (superscalar, etc.) that the bus is utilized just as well as with the custom move instruction, but now you don't need all those wasted transistors (or the corresponding heat).

Ira Baxter
Thanks for the additional insight. My question actually arose from the 8-bit 8085, but I couldn't create a new tag.
loxxy
So the excuse for the MOVS instructions on early x86s was that they didn't have all that superscalar execution, and the loop I wrote above takes a long time. Having a funny instruction efficiently executed by the CPU thus got you bandwidth. That trick just isn't needed in big CPUs. And for an 8085, where the idea is to produce the tiniest chip possible, you don't have the spare transistors. What you do in that case is code a block move loop carefully and call it as a subroutine. Not fast, but you already made that choice by using an 8085 anyway.
Ira Baxter
Nailed it on this one: basically not worth the complexity. And for microcontrollers you don't want to burn logic and power when you are fast enough already with the simple one-memory-access-per-clock, one-simple-operation-per-clock kind of thing. I know the 8085 was the target, but the ARM lets you do things like ldr r0,[r1],#4 and str r0,[r2],#4; inside a loop, that is a simple one-word-at-a-time copy with the pointer registers (r1 and r2) incrementing by 4 and not requiring extra add instructions. Or better, use ldm/stm to move usually 128 bits per instruction.
dwelch
The extra add instructions for the x86 version are necessary to code. A superscalar x86 will execute them effectively in parallel with the loads or stores, so they're effectively "free", but that's the point of the RISC guys: that's a better use of transistors.
Ira Baxter
+1  A: 

Memory-memory machines turn out to be slower in general than load-store machines. This was deduced/figured out/invented by the RISC researchers around 1980. So the older architectures (VAX, IBM System/360) tend to be memory-memory architectures; newer machines do load-store.

Another interesting variant is stack machines; they seem to always be around as a minority.

Paul Nathan