It just occurred to me (this is a wild guess and probably not likely), but maybe you're hitting a delayed-ACK problem because your send buffer is smaller than the data you're writing. Nagle may have nothing to do with it.
Does the receiving side send any data back immediately? If not, your peer will delay its ACK for up to 200ms, waiting to piggyback it on outgoing data to make better use of bandwidth.
When the send buffer on the socket is smaller than the data, the call to write will block until the ACK has been received and all the data has been sent.
For example, if your send buffer is 8192 bytes, you send 8193 bytes, and your peer sends no data back, then your write will block for 200ms (or however long your peer's implementation delays the ACK), effectively making it look like Nagle is killing you even when it's disabled.
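A quick way to check whether you're in that situation is to compare the size of your writes against the socket's send buffer. This is only a rough sketch in Python (the function name exceeds_send_buffer is made up for illustration, and the same SO_SNDBUF query is available through getsockopt in any sockets API). Keep in mind that the reported value is platform-specific, so treat it as a hint rather than an exact threshold.

```python
import socket


def exceeds_send_buffer(sock, payload_len):
    """Return True if payload_len is larger than the socket's SO_SNDBUF.

    Hypothetical helper: it only compares sizes, it does not prove that a
    given write will actually stall on a delayed ACK.
    """
    sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return payload_len > sndbuf


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        size = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        print("SO_SNDBUF reported by the OS:", size)
        print("8193-byte write exceeds it:", exceeds_send_buffer(s, 8193))
```

If this reports True for your typical write size, the delayed-ACK theory is at least worth testing.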
If this is the case, you could either increase the send buffer size or have your peer always send you back a null byte to force the ACK to be sent immediately.
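Both workarounds could look roughly like this. Again, this is a sketch in Python under my own assumptions: raise_send_buffer and recv_and_ack are hypothetical names, and the one-byte reply is an application-level convention that both ends would have to agree on. The OS may also clamp or adjust the buffer size you request, so the code reads the value back instead of assuming it took effect.

```python
import socket


def raise_send_buffer(sock, wanted):
    """Ask for a send buffer of at least `wanted` bytes; return what the OS reports."""
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    if current < wanted:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, wanted)
    # Read the value back: the OS is free to clamp or round the request.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def recv_and_ack(conn, nbytes):
    """Receiver side: read exactly nbytes, then reply with a single null byte.

    The reply gives the peer's TCP stack outgoing data to carry the ACK on,
    instead of letting the ACK sit on the delayed-ACK timer.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending all data")
        chunks.append(chunk)
        remaining -= len(chunk)
    conn.sendall(b"\x00")
    return b"".join(chunks)
```

On the sending side you would call raise_send_buffer once after creating the socket, and read (and discard) the null byte after each message if you go with the reply approach.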
Otherwise, I would try playing around with NTttcp_x86 a bit to model your application's send/receive patterns and see if something else is going on.