After 8 months of working on this problem, 3 of them with Microsoft, here is the solution. The short answer is that the server side (the side sending the large file) needed to use the following for a binding:
<customBinding>
  <binding name="custom_tcp">
    <binaryMessageEncoding />
    <tcpTransport connectionBufferSize="256192"
                  maxOutputDelay="00:00:30"
                  transferMode="Streamed" />
  </binding>
</customBinding>
The key here is the connectionBufferSize attribute. Several other attributes may need to be set as well (maxReceivedMessageSize, etc.), but connectionBufferSize was the culprit.
No code had to be changed on the server side.
No code had to be changed on the client side.
No configuration had to be changed on the client side.
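If you self-host and build your bindings in code rather than in configuration, the sketch below is roughly the programmatic equivalent of the binding above. The contract, service class, and endpoint address are placeholders invented for illustration; only the binding-element values mirror the configuration.

using System;
using System.IO;
using System.ServiceModel;
using System.ServiceModel.Channels;

[ServiceContract]
public interface IFileTransferService            // hypothetical contract, for illustration
{
    [OperationContract]
    Stream GetFile(string name);
}

public class FileTransferService : IFileTransferService
{
    public Stream GetFile(string name) { return File.OpenRead(name); }
}

class Program
{
    static Binding CreateLargeTransferBinding()
    {
        var encoding = new BinaryMessageEncodingBindingElement();
        var transport = new TcpTransportBindingElement
        {
            // The key setting: a bigger transport-level buffer means larger
            // writes to the socket, so high latency hurts far less.
            ConnectionBufferSize = 256192,
            MaxOutputDelay = TimeSpan.FromSeconds(30),
            TransferMode = TransferMode.Streamed
        };
        return new CustomBinding(encoding, transport);
    }

    static void Main()
    {
        using (var host = new ServiceHost(typeof(FileTransferService)))
        {
            // Hypothetical endpoint address.
            host.AddServiceEndpoint(typeof(IFileTransferService),
                                    CreateLargeTransferBinding(),
                                    "net.tcp://localhost:8000/FileTransfer");
            host.Open();
            Console.WriteLine("Listening; press Enter to stop.");
            Console.ReadLine();
        }
    }
}

Note that the order of the binding elements matters: the message encoding element comes first and the transport element last.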
Here is the long answer:
I suspected all along that net.tcp over WCF was slow because it was sending small chunks of information very frequently rather than larger chunks less often, and that this made it perform poorly over high-latency networks (the internet). This turned out to be true, but it was a long road to get there.
There are several attributes on the netTcpBinding that sound promising: maxBufferSize is the most obvious, and maxBytesPerRead and others sound hopeful as well. In addition to those, it is possible to create more complicated streams than the one in the original question - you can specify the buffer size there as well - on both the client and the server side. The problem is that none of this has any impact. Once you use a netTcpBinding, you are hosed.
The reason is that adjusting maxBufferSize on a netTcpBinding only adjusts the buffer at the protocol layer; nothing you can do to a netTcpBinding will ever adjust the underlying transport layer. This is why we failed for so long to make headway.
The custom binding solves the problem because increasing the connectionBufferSize on the transport layer increases the amount of information sent at once, and thus the transfer is much less susceptible to latency.
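To make that concrete: connectionBufferSize only exists on the transport binding element, so the only way to reach it is to stop using netTcpBinding directly. One illustrative way to see this in code (not what we actually deployed, and with placeholder values) is to clone a netTcpBinding's elements into a CustomBinding and set the property on the transport element:

using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

static class BindingFactory
{
    public static Binding FromNetTcp()
    {
        // Security disabled here purely to keep the sketch short.
        var netTcp = new NetTcpBinding(SecurityMode.None)
        {
            TransferMode = TransferMode.Streamed,
            MaxReceivedMessageSize = long.MaxValue   // adjust to your needs
        };

        // netTcpBinding exposes no connectionBufferSize of its own; the
        // setting lives on the transport binding element underneath.
        BindingElementCollection elements = netTcp.CreateBindingElements();
        TcpTransportBindingElement transport = elements.Find<TcpTransportBindingElement>();
        transport.ConnectionBufferSize = 256192;
        transport.MaxOutputDelay = TimeSpan.FromSeconds(30);

        return new CustomBinding(elements);
    }
}

The result is no longer a netTcpBinding at all - which is exactly the point.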
In solving this problem, I did notice that maxBufferSize and maxBytesPerRead did have a performance impact over low latency networks (and locally). Microsoft tells me that maxBufferSize and connectionBufferSize are independent and that all combinations of their values (equal to one another, maxBufferSize larger than connectionBufferSize, maxBufferSize smaller than connectionBufferSize) are valid. We are having success with a maxBufferSize and maxBytesPerRead of 65536 bytes. Again, though, this had very little impact on high-latency network performance (the original problem).
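If it helps to see that independence in code, the two settings live on different binding elements of the custom binding; the values below are simply the ones mentioned above, not a recommendation:

using System.ServiceModel;
using System.ServiceModel.Channels;

static class BufferSettings
{
    public static CustomBinding Create()
    {
        var encoding = new BinaryMessageEncodingBindingElement();
        encoding.ReaderQuotas.MaxBytesPerRead = 65536;   // reader quota (protocol layer)

        var transport = new TcpTransportBindingElement
        {
            TransferMode = TransferMode.Streamed,
            MaxBufferSize = 65536,          // protocol-layer buffer
            ConnectionBufferSize = 256192   // transport-layer buffer, independent of the above
        };

        return new CustomBinding(encoding, transport);
    }
}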
If you are wondering what maxOutputDelay is for, it is the amount of time that is allotted to fill the connection buffer before the framework throws an IO exception. Because we increased the buffer size, we also increased the amount of time allotted to fill the buffer.
With this solution, our performance increased about 400% and is now slightly better than IIS. There are several other factors that affect the relative and absolute performance of IIS over HTTP versus WCF over net.tcp (and WCF over HTTP, for that matter), but this was our experience.