Here's how to write a lock-free queue in C++:
http://www.ddj.com/hpc-high-performance-computing/210604448
But when you say "thread A must not block", are you sure that's the requirement? Windows is not a real-time operating system (and neither is Linux, in normal use). If you want thread A to be able to use all available system memory for the queue, then it needs to allocate memory (or wait while someone else does). The OS itself cannot provide timing guarantees any better than those you'd have if both reader and writer took an in-process lock (i.e. a mutex that isn't shared between processes) in order to manipulate the list. And in the worst case, adding a message will have to go to the OS to get memory.
In short, there's a reason those queues you don't like have a fixed capacity: it's so that they don't have to allocate memory in the supposedly low-latency thread.
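To make the fixed-capacity point concrete, here is a minimal sketch of a bounded single-producer/single-consumer ring buffer, assuming exactly one writer thread and one reader thread. `SpscRing`, `try_push`, `try_pop` and `ring_demo` are hypothetical names, not from any library. All the storage lives inside the object, so `try_push` never touches the allocator; the price is that it returns `false` when the buffer is full, and the writer has to decide what to do then.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical bounded single-producer/single-consumer queue.
// Storage is allocated once, inside the object; pushing never allocates.
template <typename T, std::size_t Capacity>
class SpscRing {
public:
    // Writer thread only. Returns false instead of growing when full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % kSlots;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                              // full
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }

    // Reader thread only. Returns std::nullopt when empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;                       // empty
        T value = slots_[tail];
        tail_.store((tail + 1) % kSlots, std::memory_order_release);  // hand the slot back
        return value;
    }

private:
    static constexpr std::size_t kSlots = Capacity + 1;  // one slot stays empty so full != empty
    std::array<T, kSlots> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot the writer will fill
    std::atomic<std::size_t> tail_{0};  // next slot the reader will drain
};

// Hypothetical demo: one writer thread, the calling thread as reader.
long long ring_demo(int count) {
    SpscRing<int, 8> ring;
    std::thread writer([&] {
        for (int i = 1; i <= count; ++i)
            while (!ring.try_push(i))
                std::this_thread::yield();  // full: the demo just retries
    });
    long long sum = 0;
    for (int expected = 1; expected <= count;) {
        if (std::optional<int> v = ring.try_pop()) {
            assert(*v == expected);  // FIFO order is preserved
            sum += *v;
            ++expected;
        } else {
            std::this_thread::yield();  // empty: wait for the writer
        }
    }
    writer.join();
    return sum;
}
```

The demo retries with `std::this_thread::yield()` when the ring is full only to keep the example short; a writer that truly must not block would apply its own policy (drop, report, or spill elsewhere) when `try_push` fails.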
So the lock-free code will generally be less block-y, but because of the memory allocation it isn't guaranteed never to block. And performance with a mutex shouldn't be all that shabby unless you have a truly huge stream of events to process (like, you're writing a network driver and the messages are incoming Ethernet packets).
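The allocation point is easy to see in an unbounded lock-free queue. Below is a minimal sketch of a single-producer/single-consumer linked-list queue, assuming one writer and one reader; it uses a dummy node plus a "divider" pointer separating consumed from unconsumed nodes, and the writer frees the consumed ones. This layout is an illustrative assumption, not necessarily the code in the linked article, and `SpscList`, `push`, `try_pop` and `list_demo` are hypothetical names. No lock is ever taken, yet every `push` calls `new`, which is exactly where the writer can end up waiting on the heap and, ultimately, the OS.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical unbounded single-producer/single-consumer queue.
// Nodes before `divider_` have been consumed; the writer frees them.
template <typename T>
class SpscList {
public:
    SpscList() : first_(new Node{T{}, nullptr}), divider_(first_), last_(first_) {}

    ~SpscList() {  // assumes both threads have finished with the queue
        while (first_) {
            Node* old = first_;
            first_ = first_->next;
            delete old;
        }
    }

    SpscList(const SpscList&) = delete;
    SpscList& operator=(const SpscList&) = delete;

    // Writer thread only. The `new` below is the allocation that can make
    // the "lock-free" writer wait on the allocator (and the OS).
    void push(const T& value) {
        Node* node = new Node{value, nullptr};
        last_.load(std::memory_order_relaxed)->next = node;
        last_.store(node, std::memory_order_release);  // publish to the reader

        // Reclaim nodes the reader has already moved past.
        Node* divider = divider_.load(std::memory_order_acquire);
        while (first_ != divider) {
            Node* old = first_;
            first_ = first_->next;
            delete old;
        }
    }

    // Reader thread only. Returns false when there is nothing to read.
    bool try_pop(T& out) {
        Node* divider = divider_.load(std::memory_order_relaxed);
        if (divider == last_.load(std::memory_order_acquire))
            return false;
        Node* next = divider->next;
        out = next->value;
        divider_.store(next, std::memory_order_release);  // old divider may now be freed
        return true;
    }

private:
    struct Node {
        T value;
        Node* next;
    };

    Node* first_;                 // writer-only: oldest node not yet freed
    std::atomic<Node*> divider_;  // last node the reader consumed (starts as the dummy)
    std::atomic<Node*> last_;     // newest node
};

// Hypothetical demo: one writer thread, the calling thread as reader.
long long list_demo(int count) {
    SpscList<int> queue;
    std::thread writer([&] {
        for (int i = 1; i <= count; ++i)
            queue.push(i);  // never fails, but allocates every time
    });
    long long sum = 0;
    int value = 0;
    for (int expected = 1; expected <= count;) {
        if (queue.try_pop(value)) {
            assert(value == expected);  // FIFO order is preserved
            sum += value;
            ++expected;
        } else {
            std::this_thread::yield();
        }
    }
    writer.join();
    return sum;
}
```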
So, in pseudo-code, the first thing I'd try would be:
Writer:
    allocate message and fill it in
    acquire lock
    append node to intrusive list
    signal condition variable
    release lock

Reader:
    for(;;)
        acquire lock
        for(;;)
            if there's a node
                remove it
                break
            else
                wait on condition variable
            endif
        endfor
        release lock
        process message
        free message
    endfor
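Translated into standard C++11, the pseudo-code above might look like the following minimal sketch. It assumes a single hypothetical `Message` type whose `next` pointer makes the list intrusive; `MessageQueue`, `push`, `pop` and `queue_demo` are illustrative names. `std::condition_variable::wait` with a predicate plays the role of the inner `for(;;)` loop, re-checking for a node after every wakeup (including spurious ones).

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical message type; `next` makes the list intrusive, so appending
// needs no allocation beyond the message itself.
struct Message {
    int payload = 0;
    Message* next = nullptr;
};

class MessageQueue {
public:
    // Writer: the caller has already allocated and filled in `msg`.
    void push(Message* msg) {
        std::lock_guard<std::mutex> lock(mutex_);  // acquire lock
        msg->next = nullptr;
        if (tail_) tail_->next = msg; else head_ = msg;  // append to the list
        tail_ = msg;
        ready_.notify_one();  // signal while holding the lock, as in the pseudo-code
    }                         // release lock

    // Reader: blocks until a node is available, then unlinks it.
    std::unique_ptr<Message> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return head_ != nullptr; });  // "wait on condition variable"
        Message* msg = head_;  // "remove it"
        head_ = msg->next;
        if (!head_) tail_ = nullptr;
        return std::unique_ptr<Message>(msg);  // lock is released on return
    }

    ~MessageQueue() {  // free anything never consumed
        while (head_) {
            Message* m = head_;
            head_ = m->next;
            delete m;
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    Message* head_ = nullptr;
    Message* tail_ = nullptr;
};

// Hypothetical demo: one writer thread, the calling thread as reader.
long long queue_demo(int count) {
    MessageQueue queue;
    std::thread writer([&] {
        for (int i = 1; i <= count; ++i) {
            Message* msg = new Message;  // allocate and fill in outside the lock
            msg->payload = i;
            queue.push(msg);
        }
    });
    long long sum = 0;
    for (int i = 1; i <= count; ++i) {
        std::unique_ptr<Message> msg = queue.pop();
        assert(msg->payload == i);  // FIFO order is preserved
        sum += msg->payload;        // "process message"
    }                               // unique_ptr does "free message"
    writer.join();
    return sum;
}
```

Note that the writer allocates before taking the lock, so the critical section is only a few pointer assignments; that's what keeps the mutex version competitive in the common case.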
Only if this proves to introduce unacceptable delays in the writer thread would I go to lock-free code (unless I happened to have a suitable queue already lying around).