Given that kernel developers such as Christoph Lameter (and Ingo Molnar, on the scheduler) have tuned the kernel to run well on 4096 processors, and given how much Intel itself has invested in multicore-specific tuning for both performance and energy saving, I bet the kernel is far better optimized than anything any of us could write in userspace.
The same goes for the threading library: there is currently only one, NPTL, for Linux 2.6. LinuxThreads was removed from glibc in the 2.4 release, and NPTL was developed before the 2.6 kernel release. And it's really fast.
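If you want to confirm which threading implementation a given system actually uses, glibc exposes it through confstr(_CS_GNU_LIBPTHREAD_VERSION), which Python wraps as os.confstr. The sketch below assumes a glibc-based Linux system; on other C libraries (musl, for example) or on non-Unix platforms the name is unavailable and the hypothetical helper pthread_library_version simply returns None.

```python
import os
from typing import Optional


def pthread_library_version() -> Optional[str]:
    """Return glibc's threading-library string (e.g. "NPTL 2.35") or None.

    Wraps confstr(_CS_GNU_LIBPTHREAD_VERSION). Assumption: this name is
    only defined on glibc, so other platforms fall through to None.
    """
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (AttributeError, ValueError, OSError):
        # AttributeError: os.confstr missing (non-Unix);
        # ValueError: name unknown here; OSError: the confstr call failed.
        return None


if __name__ == "__main__":
    version = pthread_library_version()
    print(version if version else "not glibc, or value unavailable")
```

On a current glibc system this should report an NPTL string; seeing anything mentioning LinuxThreads would mean a very old C library.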
Just avoid running an old kernel: the latest release from your distribution, or from kernel.org, is best. Before deploying in production, measure the performance difference, and consider whether it is worth the additional support cost (if any).
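As one rough way to "measure the performance difference" between kernels, the sketch below records the running kernel release and times starting and joining trivial threads. It is a hypothetical microbenchmark, not a substitute for measuring your real workload: Python's threading.Thread maps to pthreads on Linux, but interpreter overhead is included in every sample, so only compare numbers taken on the same hardware with the same Python build. The names thread_create_join_seconds and _noop, and the default n and repeats values, are illustrative assumptions.

```python
import os
import statistics
import threading
import time


def _noop() -> None:
    """Thread body that does nothing, so only start/join cost is timed."""


def thread_create_join_seconds(n: int = 200, repeats: int = 5) -> float:
    """Median wall-clock seconds to start and join n do-nothing threads."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n):
            t = threading.Thread(target=_noop)
            t.start()
            t.join()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one-off scheduling noise.
    return statistics.median(samples)


if __name__ == "__main__":
    n = 200
    # Print the kernel release next to the timing so that runs on
    # different kernels can be compared side by side.
    print("kernel:", os.uname().release)
    per_thread_us = thread_create_join_seconds(n) / n * 1e6
    print(f"start+join per thread: {per_thread_us:.1f} us")
```

Run it under the old and the new kernel on the same machine; a difference that does not show up in your actual application's measurements probably does not justify extra support cost.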