A few years ago, we studied this particular question for a client/server setup where both client and server ran on the same machine. At the time we were using sockets (UDP) even when client and server were on the same machine. For us, "best" turned out to be shared memory with named semaphores to synchronize it. I mainly studied pipes versus a raw shared memory implementation, testing the pipes both with overlapped I/O and with I/O completion ports.
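To give a feel for the shared-memory-plus-named-semaphores approach, here is a minimal client-side sketch on Win32: one shared buffer and one semaphore per direction acting as a "data ready" signal. The object names (Local\DemoShm and so on) and the exact handshake are my illustration, not our actual code, and error handling is omitted:

    #include <windows.h>
    #include <cstring>
    #include <cstdio>

    int main()
    {
        const DWORD kBufSize = 4096;  // we used 4K buffers

        // One shared buffer, visible to both processes by name.
        HANDLE hMap = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                         PAGE_READWRITE, 0, kBufSize,
                                         L"Local\\DemoShm");
        char* buf = static_cast<char*>(
            MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, kBufSize));

        // Two named semaphores signal "data ready", one per direction.
        HANDLE hReq  = CreateSemaphoreW(nullptr, 0, 1, L"Local\\DemoReq");
        HANDLE hResp = CreateSemaphoreW(nullptr, 0, 1, L"Local\\DemoResp");

        // Client side of one round trip: write the request into the shared
        // buffer, signal the server, then block until the reply is ready.
        // The server does the mirror image: wait on hReq, read/write the
        // buffer, release hResp.
        std::memcpy(buf, "ping", 5);
        ReleaseSemaphore(hReq, 1, nullptr);
        WaitForSingleObject(hResp, INFINITE);
        std::printf("server said: %s\n", buf);

        UnmapViewOfFile(buf);
        CloseHandle(hMap);
        CloseHandle(hReq);
        CloseHandle(hResp);
        return 0;
    }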
I tested with a wide variety of data sizes. At the low end, where client and server echoed a single byte back and forth, the raw shared memory implementation was the fastest by a factor of 3. When I passed 10,000 bytes back and forth, the pipe implementations and the raw shared memory implementation were all about the same speed. If I recall correctly, the shared memory implementation used 4K buffers.
For all data sizes, the shared memory test ranged between 2 and 6 times faster than sockets (the socket tests used TCP).
Between the two pipe implementations, the overlapped I/O version was about 30% faster than the I/O completion port version when passing small amounts of data. Again, with larger chunks of data, the difference was minimal.
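For comparison, the overlapped I/O pipe pattern looks roughly like this; again only a sketch with a placeholder pipe name, showing an asynchronous ReadFile that the thread collects later with GetOverlappedResult:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Open the client end of the pipe in overlapped mode.
        HANDLE hPipe = CreateFileW(L"\\\\.\\pipe\\DemoPipe",
                                   GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                                   OPEN_EXISTING, FILE_FLAG_OVERLAPPED, nullptr);

        char buf[4096];
        OVERLAPPED ov = {};
        ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);  // manual-reset

        // Start the read; with FILE_FLAG_OVERLAPPED it normally returns at
        // once with ERROR_IO_PENDING rather than blocking.
        if (!ReadFile(hPipe, buf, sizeof(buf), nullptr, &ov) &&
            GetLastError() != ERROR_IO_PENDING)
            return 1;  // a real error, not just a pending read

        // The thread is free to do other work here and blocks only when it
        // actually needs the data.
        DWORD bytesRead = 0;
        GetOverlappedResult(hPipe, &ov, &bytesRead, TRUE);  // TRUE = wait
        std::printf("read %lu bytes\n", bytesRead);

        CloseHandle(ov.hEvent);
        CloseHandle(hPipe);
        return 0;
    }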
The pipe implementation was certainly much less complex to code. But since we passed many small chunks of data back and forth, it was worth the extra complexity to implement the shared memory version with named semaphores.
Of course, this was several years ago, as mentioned, and you have no way of knowing whether I implemented all the different tests correctly. Note too that this was with a single client. The final implementation of our shared memory communication does scale very well to hundreds of "clients" running, but I do not know whether it beats a pipe implementation at that scale.