Today is a good day, so I'm building a lock-free allocator for a parallel, traits-based language in C and a tiny bit of assembly. But then I'm between contracts at the moment, so I can afford to play with things I can only understand on a good day.
Professionally I use C++ or Java, depending mostly on what the rest of the team I'm with is happiest with and on which libraries are appropriate.
This is the one area in which garbage collection helps significantly. If you are moving an existing system to exploit multiple cores and the ownership of objects can move between threads, a system written in a garbage-collected language is much easier to convert than one using RAII or a simple reference counting implementation (without CAS).
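To make the reference counting point concrete, here is a minimal C11 sketch of the kind of counter a non-gc system tends to need once objects can change hands between threads. The names (`shared_obj`, `obj_new`, `obj_try_retain`, `obj_release`) are made up for illustration and don't come from any particular library. A plain `refs++` / `refs--` on an ordinary integer would be a data race in the same situation, which is roughly why retrofitting this onto an existing code base is painful.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared object; all names here are illustrative only. */
typedef struct shared_obj {
    atomic_uint refs;   /* number of live owners */
    int payload;
} shared_obj;

/* Allocate an object that starts with one owner (the caller). */
static shared_obj *obj_new(int payload) {
    shared_obj *o = malloc(sizeof *o);
    if (!o) return NULL;
    atomic_init(&o->refs, 1);
    o->payload = payload;
    return o;
}

/* Take a new reference only if the count has not already dropped to zero.
 * Assumption: the caller guarantees the memory itself is still valid,
 * e.g. because it already holds a reference. */
static bool obj_try_retain(shared_obj *o) {
    unsigned old = atomic_load_explicit(&o->refs, memory_order_relaxed);
    while (old != 0) {
        /* CAS: bump the count only if nobody changed it since we read it.
         * On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(&o->refs, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Drop one reference; returns true if this call freed the object. */
static bool obj_release(shared_obj *o) {
    /* fetch_sub returns the previous value, so 1 means we were the last owner. */
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1) {
        free(o);
        return true;
    }
    return false;
}
```

A thread handing an object to another thread would call `obj_try_retain` before publishing the pointer, and each side would call `obj_release` when done. Whichever release brings the count to zero frees the object. With a gc none of this bookkeeping has to be added to every hand-off.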