Are writes of machine-word size (or smaller) serialized? Only one native opcode is needed to copy a register's contents to RAM.
Two CPUs may issue the instruction at the same time, but doesn't the RAM controller have to process each command it receives individually? So maybe to the CPUs it looks simultaneous, but the RAM controller decides whose command is processed first.
There is nothing that prevents you from doing this at a low level. Aligned, word-sized RAM writes are atomic, however, so the memory controller will execute two seemingly simultaneous writes from different cores sequentially.
They shouldn't, because if different values were written, the final contents of that location would be unspecified: they would depend on which write happened to land last.
Isn't that native opcode more likely to be writing to the on-CPU cache than directly to RAM?
They can try, but hardware will be the ultimate determinant of what happens.
Writing an aligned, word-sized value to RAM is atomic. If two CPUs try to write to the same location at the same time, the memory controller (or, on cached systems, the cache-coherency hardware) will decide on some order for the writes. While one CPU's write is in progress, the other CPU stalls for as many cycles as necessary until the first write completes; then its own write overwrites the first value. Which value survives depends on timing, and that is what's known as a race condition.
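To make the "some order, last writer wins" point concrete, here is a toy model (not real hardware; the function name apply_in_order is hypothetical) of a memory controller that applies whole-word writes one at a time. Trying every possible arbitration order shows that the final word is always exactly one of the written values, never a blend of the two, but which one depends on the order.

```python
from itertools import permutations


def apply_in_order(ram, writes):
    """Apply whole-word writes one at a time, as a serializing controller would."""
    for _cpu, addr, value in writes:
        ram[addr] = value  # each write replaces the entire word; no partial overlap
    return ram


ADDR = 0x1000
writes = [("cpu0", ADDR, 0xAAAAAAAA), ("cpu1", ADDR, 0x55555555)]

outcomes = set()
for order in permutations(writes):  # every order the arbiter might pick
    ram = {ADDR: 0}
    apply_in_order(ram, order)
    outcomes.add(ram[ADDR])

print(sorted(hex(v) for v in outcomes))  # ['0x55555555', '0xaaaaaaaa']
```

Two possible outcomes, one per ordering, is exactly the race: the program is well defined at the hardware level but its result is not.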
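Writes smaller than the native word size are not guaranteed to be atomic on every architecture. On a CPU with no byte-store instruction (early DEC Alpha processors, for example), the CPU must read the old word from memory into a register, insert the new bytes into the register, and then write the whole word back to memory; if another CPU concurrently writes a neighboring byte in the same word, one of the two updates can be lost. x86 and ARM have native byte and halfword stores, so this particular problem does not arise there.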
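The read-modify-write hazard can be simulated on a hypothetical word-only machine (4-byte little-endian words, no byte-store instruction). Each store below is a generator that pauses between its load and its store, so a tiny scheduler can interleave two CPUs. All names are illustrative, and real byte-store hardware on x86 or ARM does not behave this way.

```python
def byte_store_steps(mem, byte_addr, value):
    """Software byte store on a hypothetical word-only machine.

    This is a generator: the first next() performs the load, the second
    performs the modify and the store, so CPUs can be interleaved.
    """
    word_index, offset = divmod(byte_addr, 4)
    shift = offset * 8
    word = mem[word_index]                 # 1. load the whole word into a "register"
    yield                                  # another CPU may run here
    word &= ~(0xFF << shift) & 0xFFFFFFFF  # 2. clear the target byte
    word |= (value & 0xFF) << shift        # 3. insert the new byte
    mem[word_index] = word                 # 4. store the whole word back


def run(schedule, mem, *cpus):
    """Advance each CPU's store one step, in the order given by schedule."""
    for i in schedule:
        next(cpus[i], None)  # the default swallows StopIteration on the final step
    return mem


def two_byte_stores(schedule):
    mem = [0x00000000]
    cpu0 = byte_store_steps(mem, 0, 0x11)  # CPU 0 writes byte 0
    cpu1 = byte_store_steps(mem, 1, 0x22)  # CPU 1 writes byte 1 of the same word
    return run(schedule, mem, cpu0, cpu1)[0]


sequential = two_byte_stores([0, 0, 1, 1])   # CPU 0 finishes before CPU 1 starts
interleaved = two_byte_stores([0, 1, 0, 1])  # both load before either stores
print(hex(sequential), hex(interleaved))     # 0x2211 0x2200
```

In the interleaved schedule CPU 1 writes back a stale copy of the word, silently erasing CPU 0's byte even though the two CPUs never touched the same byte.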
You should never write code that depends on this ordering: if multiple CPUs are trying to write to the same memory location simultaneously without synchronization (a lock or an atomic operation), you're doing something wrong.
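As a minimal sketch of the synchronized alternative, the Python example below has several threads update one shared variable but serializes each read-modify-write with threading.Lock, so the final count is deterministic. In C or C++ the equivalent would be a mutex or an atomic increment; Python is used here only for brevity, and the names counter and worker are illustrative.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def worker(increments):
    global counter
    for _ in range(increments):
        with counter_lock:  # only one thread performs the read-modify-write at a time
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every writer before reading the result

print(counter)  # 40000
```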
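Another important consideration is cache coherency. Each CPU core typically has its own cache, so if a CPU writes data to its cache, the other CPUs must be made aware of the change before they read that value. Hardware coherency protocols (MESI and its variants) handle this, typically by invalidating other caches' copies of a line before a core is allowed to write to it.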
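A heavily simplified write-invalidate scheme can be modeled as below. This is an assumption-laden toy, not MESI: caches are write-through, there are no line states or cache lines larger than one address, and the Bus and Cache classes are hypothetical. It only shows why a write must invalidate other cores' copies: without the invalidate_others call, CPU 1 would keep returning its stale value.

```python
class Bus:
    """Toy interconnect: shared RAM plus a list of attached caches."""

    def __init__(self, ram):
        self.ram = ram
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def invalidate_others(self, writer, addr):
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)  # drop any stale copy of this address


class Cache:
    """Per-CPU cache; write-through to keep the model small."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.ram[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate_others(self, addr)  # other copies become invalid first
        self.lines[addr] = value
        self.bus.ram[addr] = value              # write-through for simplicity


bus = Bus(ram={0x40: 1})
cpu0, cpu1 = Cache(bus), Cache(bus)
before = cpu1.read(0x40)  # 1, now cached by CPU 1
cpu0.write(0x40, 2)       # invalidates CPU 1's copy
after = cpu1.read(0x40)   # 2, re-fetched after the miss
print(before, after)      # 1 2
```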