This can actually be quite a normal finding, depending on what you're doing and what you're using as an alternative to locks.
Essentially, constructs such as ReentrantLock have logic built into them that knows when to back off when a thread realistically can't acquire the lock: typically, the waiting thread is suspended ("parked") rather than left spinning. This reduces the amount of CPU burnt just in the logic of repeatedly trying to acquire the lock, which is what happens with simpler locking constructs such as a busy-wait spinlock.
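To make that concrete, here's a minimal sketch (the class names are just illustrative): a naive spinlock that busy-waits on an atomic flag, next to the equivalent usage of ReentrantLock, which queues and parks contended threads instead:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Naive spinlock: a thread that fails to acquire just keeps hammering
// the flag, burning CPU until it wins.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        while (!locked.compareAndSet(false, true)) {
            // busy-wait: no back-off, no suspension
        }
    }

    void unlock() {
        locked.set(false);
    }
}

// ReentrantLock, by contrast, suspends ("parks") threads that can't get
// the lock, so waiters consume essentially no CPU while they queue.
class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count;

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }
}
```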
As an example, have a look at the graph I've hurriedly put up here. It shows the throughput of threads continually accessing random elements of an array, using different constructs as the locking mechanism. The X axis is the number of threads; the Y axis is throughput. The blue line is a ReentrantLock; the yellow, green and brown lines use variants of a spinlock. Notice how with low numbers of threads the spinlock gives higher throughput, as you might expect, but as the number of threads ramps up, ReentrantLock's back-off logic kicks in and it ends up doing better, while under high contention the spinlocks just sit there burning CPU.
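For the record, the benchmark was roughly of the following shape (this isn't the exact harness behind the graph; the array size, timing window and thread counts here are arbitrary):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockThroughputBench {
    static final int[] DATA = new int[1024];

    // Run nThreads for a fixed window, each one repeatedly locking,
    // touching a random array element, and unlocking. Returns total ops.
    static long run(int nThreads, Lock lock) throws InterruptedException {
        LongAdder ops = new LongAdder();
        AtomicBoolean running = new AtomicBoolean(true);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                while (running.get()) {
                    int idx = ThreadLocalRandom.current().nextInt(DATA.length);
                    lock.lock();
                    try {
                        DATA[idx]++; // the "work": touch a random element
                    } finally {
                        lock.unlock();
                    }
                    ops.increment();
                }
            });
            threads[i].start();
        }
        Thread.sleep(2000); // fixed measurement window
        running.set(false);
        for (Thread t : threads) t.join();
        return ops.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        // A spinlock variant implementing Lock can be plugged in the same way.
        for (int n = 1; n <= 16; n *= 2) {
            System.out.printf("%2d threads: %,d ops%n", n, run(n, new ReentrantLock()));
        }
    }
}
```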
By the way, this was really a trial run done on a dual-processor machine; I also ran it in the Amazon cloud (effectively an 8-way Xeon) but I've, ahem, mislaid the file. I'll either find it or re-run the experiment soon and post an update. As I recall, you get an essentially similar pattern.
Update: whether it's in locking code or not, one phenomenon that can happen on some multiprocessor architectures is that as multiple processors do a high volume of memory accesses, you can end up flooding the memory bus, so that in effect the processors slow each other down. (It's a bit like Ethernet: the more machines you add to the network, the more chance of collisions as they send data.)
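If you want to see that effect in isolation, a sketch along these lines can show it (the sizes, timings and stride are arbitrary): each thread streams through its own private array, so there's no lock or data contention at all, yet on bandwidth-limited machines per-thread throughput still drops as threads are added:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class MemoryBandwidthDemo {
    // 8M ints = 32 MB per thread: comfortably larger than typical caches.
    static final int SIZE = 8 * 1024 * 1024;

    public static void main(String[] args) throws InterruptedException {
        for (int n = 1; n <= Runtime.getRuntime().availableProcessors(); n *= 2) {
            System.out.printf("%d thread(s): %.1f M elements/sec per thread%n", n, run(n));
        }
    }

    static double run(int nThreads) throws InterruptedException {
        long[] passes = new long[nThreads];
        AtomicBoolean running = new AtomicBoolean(true);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                int[] data = new int[SIZE]; // private to this thread: no sharing, no locks
                long sum = 0, count = 0;
                while (running.get()) {
                    // Stride of 16 ints = 64 bytes, i.e. one cache line per access.
                    for (int j = 0; j < SIZE; j += 16) sum += data[j];
                    count++;
                }
                passes[id] = count;
                if (sum == 42) System.out.print(""); // stop the JIT eliding the loop
            });
            threads[i].start();
        }
        Thread.sleep(3000); // measure for a fixed window
        running.set(false);
        for (Thread t : threads) t.join();
        long total = 0;
        for (long p : passes) total += p;
        // Elements touched per pass = SIZE / 16; window = 3 seconds.
        return total * (SIZE / 16) / 3.0 / 1e6 / nThreads;
    }
}
```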