With the above example I get a round trip time of ~65μsec. If I make two fifos on the file system this goes down to ~45μsec. The extra time using localhost sockets must be because I'm hitting the network stack.
Yes, and that is to be expected.
FIFOs are a rather primitive communication method. Their state is essentially a bool variable. Reads and writes go through the same pre-allocated, fixed-size buffer, so the OS can and does optimize the operations.
Sockets are more complex. They carry a full-fledged TCP state machine. The buffering is dynamic and bidirectional (recv and send are buffered separately). That means that when you write something into a local socket, you pretty much always have some sort of dynamic memory management involved. Linux tries to avoid that as much as possible: zero-copy/single-copy tricks are implemented all over the place. Still, since the calls have to go through more code, they are slower.
In the end, considering how much more sockets do compared to FIFOs, a 20us difference is frankly a statement about how good Linux's socket performance is.
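If you want to reproduce the comparison yourself, here is a rough ping-pong sketch. It is not the code from the question: the function names `rtt_pipes` and `rtt_tcp` and the iteration count are made up for illustration, anonymous pipes stand in for the named FIFOs (on Linux they should share the same kernel pipe code), and Python's interpreter and thread switching add overhead, so expect the absolute numbers to be higher than in a C test. Only the relative gap is of interest.

```python
import os
import socket
import threading
import time

N = 2000  # round trips per measurement (arbitrary choice)


def rtt_pipes(n=N):
    """Mean round-trip time in microseconds over a pair of pipes."""
    a_r, a_w = os.pipe()  # client -> echo thread
    b_r, b_w = os.pipe()  # echo thread -> client

    def echo():
        for _ in range(n):
            os.write(b_w, os.read(a_r, 1))

    t = threading.Thread(target=echo)
    t.start()
    start = time.perf_counter()
    for _ in range(n):
        os.write(a_w, b"x")
        os.read(b_r, 1)
    elapsed = time.perf_counter() - start
    t.join()
    for fd in (a_r, a_w, b_r, b_w):
        os.close(fd)
    return elapsed / n * 1e6


def rtt_tcp(n=N):
    """Mean round-trip time in microseconds over a loopback TCP socket."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)

    def echo():
        conn, _ = srv.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            for _ in range(n):
                conn.sendall(conn.recv(1))

    t = threading.Thread(target=echo)
    t.start()
    cli = socket.create_connection(srv.getsockname())
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(n):
        cli.sendall(b"x")
        cli.recv(1)
    elapsed = time.perf_counter() - start
    t.join()
    cli.close()
    srv.close()
    return elapsed / n * 1e6


if __name__ == "__main__":
    print(f"pipes: {rtt_pipes():.1f} us per round trip")
    print(f"tcp:   {rtt_tcp():.1f} us per round trip")
```

Both functions send one byte and wait for it to come back, so each loop iteration is one full round trip. `TCP_NODELAY` is set so that Nagle's algorithm does not skew the socket numbers.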
P.S. 65us RTT = ~33us in one direction. 1s/33us =~ 30K packets per second. For network code without optimizations using a single connection, that sounds about right.