Because of the way TCP congestion control works, it's more efficient to send data all at once. TCP maintains a window of how much data it will allow to be in flight (sent but not yet acknowledged). TCP measures the acknowledgments coming back to figure out how much data it can have in flight without causing congestion (i.e., packet loss). If there isn't enough data coming from the application to fill the window, TCP can't make accurate measurements, so it will conservatively shrink the window.
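If you want to watch this happen, Linux exposes the congestion window through the `TCP_INFO` socket option. Here's a minimal sketch, assuming Linux and the stable leading layout of `struct tcp_info`; the helper name is mine, and this is a diagnostic toy, not anything the protocol requires:

```python
import socket
import struct

def snd_cwnd(sock):
    """Current congestion window, in segments, of a connected TCP
    socket. Linux-only: parses the front of struct tcp_info."""
    TCP_INFO = getattr(socket, "TCP_INFO", 11)  # 11 on Linux
    # Ask for the first 104 bytes of struct tcp_info (enough for
    # the fields below on any reasonably modern kernel).
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 104)
    # struct tcp_info begins with 8 one-byte fields, followed by
    # 32-bit counters; tcpi_snd_cwnd is the 19th such counter.
    fields = struct.unpack("8B24I", raw)
    return fields[8 + 18]  # tcpi_snd_cwnd
```

Sampling this before and after a burst of writes makes the window's growth (or shrinkage) visible.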
If you only have a few small headers and your calls to `send` are in rapid succession, the operating system will typically buffer the data for you and send it all in one packet. In that case, TCP congestion control isn't really an issue. However, each call to `send` still involves a context switch from user mode to kernel mode, which incurs CPU overhead. In other words, you're still better off buffering in your application.
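To make that concrete, here's a sketch of the buffering approach, using a hypothetical `send_response` helper (the name and header format are just for illustration): assemble the pieces in user space, then hand them to the kernel with a single `sendall()`.

```python
import socket

def send_response(sock, status, headers, body):
    # Build the whole response in user space first...
    parts = [b"HTTP/1.1 " + status + b"\r\n"]
    for name, value in headers.items():
        parts.append(name + b": " + value + b"\r\n")
    parts.append(b"\r\n")
    parts.append(body)
    # ...then cross into the kernel once, instead of once per piece.
    sock.sendall(b"".join(parts))

# e.g. send_response(sock, b"200 OK", {b"Content-Length": b"2"}, b"hi")
```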
There is (at least) one case where you're better off without buffering: when your buffering costs more than the context-switch overhead it saves. If you write a complicated buffer in Python, that might very well be the case. A buffer written in CPython is going to be quite a bit slower than the finely optimized buffering in the kernel. It's quite possible that buffering would cost you more than it buys you.
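There's also a middle ground worth knowing about: scatter-gather I/O. On most Unix systems, `socket.sendmsg()` takes a list of buffers and lets the kernel do the gathering, so you keep the single-syscall benefit without a user-space copy or a hand-rolled buffer. A rough sketch, assuming a blocking socket (like `send()`, `sendmsg()` may accept fewer bytes than you offered):

```python
import socket

def send_pieces(sock, pieces):
    # One syscall; the kernel gathers the buffers itself, so there
    # is no user-space join/copy.
    sent = sock.sendmsg(pieces)
    total = sum(len(p) for p in pieces)
    if sent < total:
        # Partial write: fall back to sendall() for the remainder.
        sock.sendall(b"".join(pieces)[sent:])
```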
When in doubt, measure.
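For example, a throwaway benchmark over loopback is usually enough to see which side of the line you're on. The sketch below compares many small sends against one buffered send; the chunk size and count are arbitrary, so substitute something shaped like your real traffic:

```python
import socket
import threading
import time

CHUNK = b"x" * 64   # header-sized piece; size chosen arbitrarily
N = 10_000          # pieces per trial

def drain(conn):
    # Read and discard everything the sender writes so sends don't stall.
    while conn.recv(65536):
        pass

def timed(label, fn):
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - t0:.4f}s")

# Loopback TCP pair: listener + connected sender, with a drain thread.
listener = socket.create_server(("127.0.0.1", 0))  # Python 3.8+
sender = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()
threading.Thread(target=drain, args=(conn,), daemon=True).start()

def many_small():
    # One syscall (and user/kernel transition) per piece.
    for _ in range(N):
        sender.sendall(CHUNK)

def one_big():
    # Join once in user space, then a single syscall.
    sender.sendall(b"".join([CHUNK] * N))

timed("many small sends", many_small)
timed("one buffered send", one_big)
sender.close()
```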
One word of caution, though: premature optimization is the root of all evil. The difference in efficiency here is pretty small. If you haven't already established that this is a bottleneck for your application, go with whatever makes your life easier. You can always change it later.