If the file handle and volume have write caching enabled, the file operation may complete with just a memory copy to cache, to be flushed lazily later. Since there is no actual IO taking place at that point, there's no reason for the operation to be carried out asynchronously in that case.
Internally, each IO operation is represented by an IRP (IO request packet). The kernel creates it and hands it to the filesystem, from where it passes down through layered drivers until the request becomes an actual disk controller command. That driver issues the command, marks the IRP as pending and returns control of the thread. If the handle was opened for overlapped IO, the kernel gives control back to your program immediately. Otherwise, the kernel will wait for the IRP to complete before returning.
Not all IO operations make it all the way to the disk, however. The filesystem may determine that the write should be cached, and not written until later. There is even a special path for operations that can be satisfied entirely using the cache, called fast IO. Even if you make an asynchronous request, fast IO is always synchronous because it's just copying data into and out of cache.
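To make the three outcomes concrete, here is a toy model of the decision described above. It is only a sketch: `issue_io`, `Result` and the `cacheable` flag are hypothetical names invented for illustration, not Windows APIs, and the real kernel logic is far more involved.

```python
from dataclasses import dataclass


@dataclass
class Result:
    path: str        # "fast_io" or "irp"
    completed: bool  # True if the operation has finished when the call returns
    blocked: bool    # True if the calling thread had to wait for the device


def issue_io(cacheable: bool, overlapped: bool) -> Result:
    """Toy model of the dispatch described above (hypothetical, not a real API)."""
    if cacheable:
        # Fast IO: just a memory copy to or from the cache, so it is finished
        # by the time the call returns, overlapped handle or not.
        return Result("fast_io", completed=True, blocked=False)
    if overlapped:
        # The IRP is marked pending and control returns to the caller at once.
        return Result("irp", completed=False, blocked=False)
    # Non-overlapped handle: the kernel waits for the IRP before returning.
    return Result("irp", completed=True, blocked=True)
```

In Win32 terms, the pending case is the one where an overlapped `WriteFile` or `ReadFile` returns FALSE and `GetLastError()` reports `ERROR_IO_PENDING`. In the fast IO case the call simply returns TRUE, so code using overlapped handles has to cope with both.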
Process Monitor, with advanced output enabled, shows which path each operation took (the operation names appear as IRP_MJ_* or FASTIO_*) and will show blank in the status field while an IRP is pending.
There is a limit to how much data is allowed to be outstanding in the write cache. Once it fills up, write operations will no longer complete immediately. Try writing a lot of data at once, with many operations.
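One rough way to try this is to time each write in a long run of them and look for the point where latencies jump. The sketch below makes some assumptions: it uses ordinary blocking writes rather than overlapped ones, so it only shows when the cache stops absorbing writes, and the helper names, chunk size and count are arbitrary choices of mine.

```python
import os
import tempfile
import time


def time_writes(path, chunk_size=1 << 20, count=64):
    """Write `count` chunks of `chunk_size` bytes; return per-write latencies in seconds."""
    chunk = b"\0" * chunk_size
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    latencies = []
    fd = os.open(path, flags)
    try:
        for _ in range(count):
            start = time.perf_counter()
            written = 0
            while written < chunk_size:
                # os.write may write fewer bytes than asked, so loop until done.
                written += os.write(fd, chunk[written:])
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def run_demo(chunk_size=1 << 20, count=64):
    """Run time_writes against a scratch file that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        return time_writes(os.path.join(tmp, "cache_test.bin"), chunk_size, count)
```

With the defaults this writes only 64 MiB, which will most likely fit in the cache. Raise `count` until the total is large relative to free memory, then compare the earliest latencies with the slowest ones, for example `max(run_demo(count=4096))`.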