On the few occasions I have needed to implement my own message queue, I have used one semaphore and one mutex (or a second semaphore) per queue. I have only dealt with queues between threads, so this probably doesn't apply if you want a queue between two processes.
The semaphore counts the number of messages in the queue and gives the OS a mechanism for suspending a thread until a new message arrives.
The mutex is used to protect the overall queue structure.
So, it might look a bit like this (very much pseudocode):
DataQueueRx( Queue*, WORD*, timeout? )
{
    if ( WaitOnSemaphore( Queue->sema, timeout? ) timed out ) //get token
        return failure; //no message arrived in time
    LockMutex( Queue->mutex )
    {
        //remove the oldest message from the queue and copy it to WORD
    }
    UnlockMutex( Queue->mutex )
}
DataQueueTx( Queue*, WORD )
{
    LockMutex( Queue->mutex )
    {
        //insert the new WORD msg into the queue
        ReleaseSemaphore( Queue->sema ); //increment semaphore count
    }
    UnlockMutex( Queue->mutex )
}
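For anyone who wants something compilable, here is a minimal sketch of the same idea in C. It assumes POSIX threads and unnamed POSIX semaphores (so Linux rather than Windows), a fixed-capacity ring buffer, and a 16-bit message type; the names Queue, word_t, QUEUE_CAPACITY, queue_init, data_queue_rx and data_queue_tx are all made up for illustration, and cleanup is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define QUEUE_CAPACITY 64u            /* hypothetical fixed size */

typedef uint16_t word_t;              /* stands in for WORD (assumption) */

typedef struct {
    word_t          buf[QUEUE_CAPACITY];
    unsigned        head;             /* index of the oldest message */
    unsigned        count;            /* messages currently stored */
    sem_t           sema;             /* counts queued messages */
    pthread_mutex_t lock;             /* protects buf, head and count */
} Queue;

bool queue_init(Queue *q)
{
    q->head = 0;
    q->count = 0;
    if (sem_init(&q->sema, 0, 0) != 0)
        return false;
    if (pthread_mutex_init(&q->lock, NULL) != 0) {
        sem_destroy(&q->sema);
        return false;
    }
    return true;
}

/* Waits up to timeout_ms for a message; false on timeout or error. */
bool data_queue_rx(Queue *q, word_t *out, long timeout_ms)
{
    /* sem_timedwait takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Get a token; retry if a signal interrupts the wait. */
    while (sem_timedwait(&q->sema, &deadline) != 0) {
        if (errno != EINTR)
            return false;             /* ETIMEDOUT or another error */
    }

    pthread_mutex_lock(&q->lock);
    assert(q->count > 0);             /* holding a token implies a message */
    *out = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return true;
}

/* Appends msg; false if the queue is full. */
bool data_queue_tx(Queue *q, word_t msg)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->buf[(q->head + q->count) % QUEUE_CAPACITY] = msg;
        q->count++;
        sem_post(&q->sema);           /* increment semaphore count */
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

A consumer thread would typically sit in a loop calling data_queue_rx with whatever timeout suits it, while any number of producer threads call data_queue_tx. The sender here simply fails when the buffer is full; blocking the sender instead would need a second semaphore counting free slots.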
However, perhaps this isn't very "lightweight". It also assumes that a queue is never destroyed while in use. I also suspect that a queue carrying only WORD-sized messages leaves room for some optimizations.
If you are looking for lock-free code, then I suggest spending a day or two reading through these articles by Sutter.
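To give a flavour of what lock-free can look like for a WORD-sized queue, here is a rough sketch of a single-producer/single-consumer ring buffer using C11 atomics. It is not taken from those articles, and it only holds up under the assumption that exactly one thread sends and exactly one thread receives. SpscRing, RING_SIZE, spsc_init, spsc_push and spsc_pop are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u                /* hypothetical; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0, "RING_SIZE must be a power of two");

typedef struct {
    uint16_t    slots[RING_SIZE];
    atomic_uint head;                 /* written only by the consumer */
    atomic_uint tail;                 /* written only by the producer */
} SpscRing;

void spsc_init(SpscRing *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side only. Returns false if the ring is full. */
bool spsc_push(SpscRing *r, uint16_t msg)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return false;                 /* full */
    r->slots[tail % RING_SIZE] = msg;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side only. Returns false if the ring is empty. */
bool spsc_pop(SpscRing *r, uint16_t *out)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                 /* empty */
    *out = r->slots[head % RING_SIZE];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}
```

Note that neither call blocks, so the receiver has to poll or be woken some other way (for example by a semaphore as above). With more than one producer or consumer this sketch is not safe.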
Good luck!