I'm reviewing some code and feel suspicious of the technique being used.

In a Linux environment, there are two processes that attach multiple shared memory segments. The first process periodically loads a new set of files to be shared and writes the shared memory id (shmid) into a location in the "master" shared memory segment. The second process continually reads this "master" location and uses the shmid to attach the other shared segments.
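Roughly, the pattern looks like this (an illustrative sketch, not the actual code; the struct and function names are made up and error handling is omitted):

#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

struct master {
    int current_shmid;                     /* slot the reader polls */
};

/* First process: create and fill a new segment, then publish its shmid. */
void publish_new_segment(struct master *m, size_t size)
{
    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    char *p = shmat(shmid, NULL, 0);
    memset(p, 0, size);                    /* stand-in for loading the new files */
    m->current_shmid = shmid;              /* the write in question */
}

/* Second process: read the slot and attach whatever segment it names. */
void *attach_current_segment(const struct master *m)
{
    return shmat(m->current_shmid, NULL, SHM_RDONLY);
}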

On a multi-CPU host, it seems to me it might be implementation-dependent what happens if one process tries to read the memory while it's being written by the other. But perhaps hardware-level bus locking prevents mangled bits on the wire? It wouldn't matter if the reading process got a very-soon-to-be-changed value; it would only matter if the read were corrupted to something that was neither the old value nor the new value. This is an edge case: only 32 bits are being written and read.

Googling for shmat stuff hasn't led me to anything that's definitive in this area.

I suspect strongly it's not safe or sane, and what I'd really like is some pointers to articles that describe the problems in detail.

+2  A: 

I can't believe you're asking this. No, it's not necessarily safe. At the very least, this will depend on whether the compiler produces code that will atomically set the shared memory location when you set the shmid.

Now, I don't know Linux, but I suspect that a shmid is 16 to 64 bits. That means it's at least possible that all platforms would have some instruction that could write this value atomically. But you can't depend on the compiler doing this without being asked somehow.

Details of memory implementation are among the most platform-specific things there are!

BTW, it may not matter in your case, but in general, you have to worry about locking, even on a single CPU system. In general, some device could write to the shared memory.

John Saunders
A: 

Legal? I suppose. Depends on your "jurisdiction". Safe and sane? Almost certainly not.

Edit: I'll update this with more information.

You might want to take a look at this Wikipedia page, particularly the section on "Coordinating access to resources". The Wikipedia discussion essentially describes a check-then-act failure: non-locked access to shared resources can, even when the individual accesses are atomic, invalidate the assumption behind a conditional check. In the window between checking whether it CAN modify the resource and actually modifying it, the resource can be modified externally, so the conclusion drawn from the check no longer holds.
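To make that concrete, here is a minimal sketch of a check-then-act race (illustrative only; 'slots' and 'taker' are made-up names, and threads stand in for the two processes):

#include <pthread.h>
#include <stdio.h>

static volatile int slots = 1;      /* one "free slot" shared by both threads */

static void *taker(void *arg)
{
    (void)arg;
    if (slots > 0) {                /* check: looks like a slot is free       */
        /* another thread may take the slot right here                        */
        slots = slots - 1;          /* act: the decision above may be stale   */
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, taker, NULL);
    pthread_create(&b, NULL, taker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("slots = %d (can be -1 when the race is lost)\n", slots);
    return 0;
}

Run it enough times and slots can end up at -1, because the check and the act are not one atomic step.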

McWafflestix
+12  A: 

It is legal -- as in the OS won't stop you from doing it.

But is it smart? No, you should have some type of synchronization.

There wouldn't be "mangled bits on the wire". They will come out either as ones or zeros. But there's nothing to say that all your bits will be written out before another process tries to read them. And there are NO guarantees on how fast they'll be written vs how fast they'll be read.

You should always assume there is absolutely NO relationship between the actions of 2 processes (or threads for that matter).

Hardware-level bus locking does not happen unless you get it right. It can be harder than expected to make your compiler / library / OS / CPU get it right. Synchronization primitives are written to make sure it happens right.

Locking will make it safe, and it's not that hard to do. So just do it.


@unknown - The question has changed somewhat since my answer was posted. However, the behavior you describe is definitely platform (hardware, OS, library, and compiler) dependent.

Without giving the compiler specific instructions, you are actually not guaranteed to have the 32 bits written out in one shot. Imagine a situation where the 32-bit word is not aligned on a word boundary. Such an unaligned access is acceptable on x86, where the CPU turns it into a series of smaller aligned accesses.

An interrupt can occur between those operations. If a context switch happens in the middle, some of the bits are written and some aren't. Bang, You're Dead.

Also, let's think about 16-bit CPUs or 64-bit CPUs. Both are still popular and don't necessarily work the way you think.

So you actually can have a situation where "some other CPU core picks up a word-sized value 1/2 written to". You should write your code as if this type of thing is expected to happen whenever you are not using synchronization.

Now, there are ways to perform your writes to make sure that you get a whole word written out. Those methods fall under the category of synchronization, and creating synchronization primitives is the type of thing that's best left to the library, compiler, OS, and hardware designers, especially if you are interested in portability (which you should be, even if you never port your code).
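For example, here is a minimal sketch of what asking the compiler explicitly can look like, using C11 <stdatomic.h> (the struct and field names are hypothetical; GCC's older __sync/__atomic builtins can express the same thing):

#include <stdatomic.h>

struct master {
    _Atomic int current_shmid;             /* atomic, naturally aligned slot */
};

/* Writer: the compiler must now emit a single atomic word-sized store. */
void publish(struct master *m, int new_shmid)
{
    atomic_store(&m->current_shmid, new_shmid);
}

/* Reader: likewise a single atomic word-sized load. */
int poll_shmid(struct master *m)
{
    return atomic_load(&m->current_shmid);
}

The default sequentially consistent ordering also gives you the memory barriers discussed in other answers.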

James Caccese
Actually this is *wrong*. You say "... all your bits written... NO guarantees ...". That's incorrect. There is a guarantee: as the poster remarked, he "writes ... shmid" (a shmid is a machine-word-sized value), and he goes on to say "only 32 bits are being written and read." He is guaranteed to have synchronized reads and writes of those 32-bit quantities; the CPU will **NOT** read/write, say, 5 bits and have some other CPU core pick up a word-sized value 1/2 written to. Think about it: that would mean the cores read and write bit by bit, which is not the case.
RandomNickName42
@unknown - my response has been updated
James Caccese
@RandomNickName42: As James has pointed out, on x86 hardware you do not need to have your data properly aligned. If it isn't, then the store of the value need not be atomic and can result in "half written" values. I think "Bang, You're Dead" summed it up nicely :o)
ScaryAardvark
I see, yeah, I don't know if he added that afterwards to be clearer, but I see what you're talking about now ;)
RandomNickName42
+7  A: 

You need locking somewhere. If not at the code level, then at the hardware memory cache and bus.

You are probably OK on a post-PentiumPro Intel CPU. From what I just read, later Intel CPUs no longer lock the bus for the LOCK prefix; instead the cache-coherency protocols make sure that the data is consistent between all CPUs. So if the code writes data that doesn't cross a cache-line boundary, it will work. The order of memory writes that cross cache lines isn't guaranteed, so multi-word writes are risky.

If you are using anything other than x86 or x86_64 then you are not OK. Many non-Intel CPUs (and perhaps Intel Itanium) gain performance by requiring explicit memory-barrier or cache-management instructions, and if you do not use them (via custom ASM code, compiler intrinsics, or libraries) then writes to memory that go through the cache are not guaranteed to ever become visible to another CPU, or to occur in any particular order.

So just because something works on your Core2 system doesn't mean that your code is correct. If you want to check portability, try your code also on other SMP architectures like PPC (an older MacPro or a Cell blade) or an Itanium or an IBM Power or ARM. The Alpha was a great CPU for revealing bad SMP code, but I doubt you can find one.

Zan Lynx
You *do* need the LOCK prefix for stuff like atomic increments. OTOH, I don't think ordering is guaranteed; lfence/sfence/mfence instructions are there for a reason, and Linux has rmb()/wmb()/mb() *everywhere*. Memory barriers are cheap. Mutexes are also pretty cheap. Just use them.
tc.
+1  A: 

I actually believe this should be completely safe (but it depends on the exact implementation). Assuming the "master" segment is basically an array, as long as the shmid can be written atomically (if it's 32 bits then it's probably okay), and the second process is just reading, you should be okay. Locking is only needed when both processes are writing, or when the values being written cannot be written atomically. You will never get corrupted (half-written) values. Of course, there may be some strange architectures that can't handle this, but on x86/x64 it should be okay (and probably also ARM, PowerPC, and other common architectures).

Zifre
-1 you still need a memory barrier, even if writes are atomic.
tc.
+10  A: 

The problem's actually worse than some of the people have discussed. Zifre is right that on current x86 CPUs individual aligned memory writes are atomic, but atomicity is not the whole story: the writes are only guaranteed to appear in program order on the core that issued them; other cores may not see the writes in the same order.

In other words if you do

a = 1;
b = 2;

on CPU 2 you might see location 'b' modified before location 'a' is. Also, if you're writing a value that's larger than the native word size (32 bits on a 32-bit processor), the writes are not atomic, so the high 32 bits of a 64-bit write will hit the bus at a different time from the low 32 bits of the write. This can complicate things immensely.

Use a memory barrier and you'll be ok.
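For illustration, here is the same pair of stores with explicit barriers added (a sketch using C11 fences; 'ready' is an extra flag introduced for the sketch, and any equivalent barrier primitive would do):

#include <stdatomic.h>

int a, b;                                  /* ordinary shared data */
atomic_int ready;                          /* publication flag     */

void writer(void)
{
    a = 1;
    b = 2;
    atomic_thread_fence(memory_order_release);             /* stores above ...     */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

void reader(void)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);          /* ... are visible here */
        /* now it is safe to read a and b */
    }
}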

Larry Osterman
+1  A: 

I agree that it might work, so it might be safe, but not sane. The main question is whether this low-level sharing is really needed. I am not an expert on Linux, but I would consider using, for instance, a FIFO queue for the master shared memory segment, so that the OS does the locking work for you. Producers and consumers usually need queues for synchronization anyway.
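For instance, a sketch of the FIFO idea using a named pipe (the path is made up and error handling is omitted; writes of at most PIPE_BUF bytes to a pipe are atomic, so the 4-byte shmid cannot be torn):

#include <fcntl.h>
#include <unistd.h>

/* Producer: push each new shmid through the FIFO. */
void send_shmid(int shmid)
{
    int fd = open("/tmp/shmid_fifo", O_WRONLY);   /* mkfifo() assumed done once */
    write(fd, &shmid, sizeof shmid);
    close(fd);
}

/* Consumer: block until a new shmid arrives, then attach it. */
int receive_shmid(void)
{
    int shmid = -1;
    int fd = open("/tmp/shmid_fifo", O_RDONLY);
    read(fd, &shmid, sizeof shmid);
    close(fd);
    return shmid;
}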

weismat
A: 

If the shmid has some type other than volatile sig_atomic_t then you can be pretty sure that separate threads will get in trouble even on the very same CPU. If the type is volatile sig_atomic_t then you can't be quite as sure, but you still might get lucky because multithreading can do more interleaving than signals can do.

If the shmid crosses a cache-line boundary (partly in one cache line and partly in another), then while the writing CPU is writing you may well find a reading CPU reading part of the new value and part of the old value.

This is exactly why instructions like "compare and swap" were invented.
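For illustration only (not something this particular design necessarily needs), a compare-and-swap updates a word only if it still holds the value that was last observed; with GCC's builtin it is a one-liner:

/* Returns non-zero if *slot still held 'expected' and now holds 'new_shmid'. */
int try_replace_shmid(volatile int *slot, int expected, int new_shmid)
{
    return __sync_bool_compare_and_swap(slot, expected, new_shmid);
}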

Windows programmer
This is Linux, not Windows, thus this is IPC and not multithreading.
weismat
This is Linux therefore pthreads exist. But even if pthreads don't exist, I assure you that two processes aren't going to execute in the same thread.
Windows programmer
A: 

Sounds like you need a reader-writer lock: http://en.wikipedia.org/wiki/Readers-writer_lock
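A sketch of what that could look like here, using a process-shared pthread rwlock stored in the master segment (the struct layout and function names are hypothetical):

#include <pthread.h>

struct master {
    pthread_rwlock_t lock;
    int current_shmid;
};

void master_init(struct master *m)          /* run once, by the writer */
{
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&m->lock, &attr);
    pthread_rwlockattr_destroy(&attr);
}

void set_shmid(struct master *m, int shmid) /* writer */
{
    pthread_rwlock_wrlock(&m->lock);
    m->current_shmid = shmid;
    pthread_rwlock_unlock(&m->lock);
}

int get_shmid(struct master *m)             /* readers */
{
    pthread_rwlock_rdlock(&m->lock);
    int shmid = m->current_shmid;
    pthread_rwlock_unlock(&m->lock);
    return shmid;
}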

Mark Allanson
+3  A: 

Two processes, two threads, two CPUs, two cores: all require special attention when sharing data through memory.

This IBM article provides an excellent overview of your options.

Anatomy of Linux synchronization methods: Kernel atomics, spinlocks, and mutexes, by M. Tim Jones, Consultant Engineer, Emulex

http://www.ibm.com/developerworks/linux/library/l-linux-synchronization.html

DanM
A: 

The answer is - it's absolutely safe to do reads and writes simultaneously.

It is clear that the shm mechanism provides bare-bones tools for the user. All access control must be taken care of by the programmer. Locking and synchronization are kindly provided by the kernel; this means the user has fewer worries about race conditions. Note that this model provides only a symmetric way of sharing data between processes. If a process wishes to notify another process that new data has been inserted to the shared memory, it will have to use signals, message queues, pipes, sockets, or other types of IPC.

From Shared Memory in Linux article.

The latest Linux shm implementation just uses copy_to_user and copy_from_user calls, which are synchronised with the memory bus internally.

Thevs
"The latest Linux shm implementation just uses copy_to_user and copy_from_user calls". This is just wrong. One of the advantages of shm is you don't need to copy the data into kernelspace and back out again.
user9876
Yes, you don't need to. The kernel does. And IT uses copy_to_user, not you.
Thevs
Hehe, got down-voted on an absolutely correct answer by some moron...
Thevs
-1. That article definitely sounds wrong. Shared memory is pointless if you need to copy.
tc.
@tc: Are you familiar with kernel internals? Especially with copy_to_user and copy_from_user kernel functions? I suggest reading first before downvoting.
Thevs
+1  A: 

Read Memory Ordering in Modern Microprocessors, Part I and Part II

They give the background to why this is theoretically unsafe.

Here's a potential race:

  • Process A (on CPU core A) writes to a new shared memory region
  • Process A puts that shared memory ID into a shared 32-bit variable (that is 32-bit aligned - any compiler will try to align like this if you let it).
  • Process B (on CPU core B) reads the variable. Assuming 32-bit size and 32-bit alignment, it shouldn't get garbage in practice.
  • Process B tries to read from the shared memory region. Now, there is no guarantee that it'll see the data A wrote, because you missed out the memory barrier. (In practice, there probably happened to be memory barriers on CPU B in the library code that maps the shared memory segment; the problem is that process A didn't use a memory barrier.)

Also, it's not clear how you can safely free the shared memory region with this design.

With the latest kernel and libc, you can put a pthreads mutex into a shared memory region. (This does need a recent version with NPTL - I'm using Debian 5.0 "lenny" and it works fine). A simple lock around the shared variable would mean you don't have to worry about arcane memory barrier issues.
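A minimal sketch of that approach, assuming the mutex and the shmid slot live together in the master segment (field and function names are made up):

#include <pthread.h>

struct master {
    pthread_mutex_t lock;                  /* lives inside the shared segment */
    int current_shmid;
};

void master_init(struct master *m)         /* run once, by one process */
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&m->lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

void set_shmid(struct master *m, int shmid)
{
    pthread_mutex_lock(&m->lock);
    m->current_shmid = shmid;
    pthread_mutex_unlock(&m->lock);
}

int get_shmid(struct master *m)
{
    pthread_mutex_lock(&m->lock);
    int shmid = m->current_shmid;
    pthread_mutex_unlock(&m->lock);
    return shmid;
}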

user9876
+1  A: 
RandomNickName42