With write() on a descriptor that has O_NONBLOCK set, the call will accept (that is, copy to an in-kernel buffer) all, some, or none of the data you passed to it. If some bytes were accepted, write()'s return value tells you how many; if none could be accepted, write() returns -1 and sets errno to EAGAIN or EWOULDBLOCK (the same value on many systems, but portable code should check for both). The number of bytes that write() accepts depends on how much space is available in its in-kernel buffer at that moment. After write() returns, it's your responsibility to remember how many of your bytes it accepted, and then call select() (or poll() or some other mechanism) so that you will be notified when more space is available in the buffer. When that happens (i.e. at some time in the future) you can call write() again to pass the remaining bytes to the buffer.
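The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names set_nonblocking() and write_all_nonblocking() are made up for this example, and for brevity it waits in poll() right inside the helper, whereas a real event-driven app would return to its main loop and resume when the descriptor becomes writable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK for fd. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: push all len bytes of buf into the non-blocking
 * descriptor fd, remembering how many bytes each write() accepted.
 * Returns 0 once everything was accepted, -1 on a real error. */
int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t sent = 0; /* bytes the kernel has accepted so far */

    assert(buf != NULL || len == 0);

    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n > 0) {
            /* Some (maybe all) of the remaining bytes were accepted. */
            sent += (size_t)n;
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* No room in the kernel buffer: wait until fd is writable. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
            if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
                return -1;
        } else if (n == -1 && errno == EINTR) {
            continue; /* interrupted by a signal, just retry */
        } else {
            return -1; /* unexpected zero-byte write or a real error */
        }
    }
    return 0;
}
```

You would call set_nonblocking(fd) once after opening the descriptor, then write_all_nonblocking(fd, data, data_len) whenever you have data to send.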
aio_write(), on the other hand, will "take ownership" of the data you pass in to the function (so you must leave the buffer alone until the operation completes), and notify you later on when it has finished writing out the data. With aio_write(), you don't have to worry about the call accepting only part of your data buffer; it will either queue the entire request, or error out. That will make your app's I/O logic a bit simpler in that respect; however I think asynchronous I/O has its own set of complicating factors so it might not always be a win. (I haven't used aio_*() myself, so I can't give details)
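For comparison, here is a rough sketch of what the POSIX aio_write() flow looks like, using only the standard aio_write(), aio_error(), aio_suspend() and aio_return() calls. Treat it as an illustration under assumptions rather than tested production code: the helper names aio_write_and_wait() and demo_aio_write() are invented for this example, it simply blocks until the request finishes (which defeats the point in a real app, where you would poll aio_error() or ask for a completion notification), and on older glibc versions you may need to link with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue one asynchronous write of len bytes at
 * file offset 0, then wait for it to complete. Returns the number of
 * bytes written, or -1 with errno set. */
ssize_t aio_write_and_wait(int fd, const char *data, size_t len)
{
    struct aiocb cb;
    const struct aiocb *const list[1] = { &cb };

    assert(data != NULL);

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)data; /* must stay valid and untouched until done */
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    /* Either the whole request is queued, or we get -1 right away. */
    if (aio_write(&cb) == -1)
        return -1;

    /* Sleep until the request is no longer in progress. */
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    int err = aio_error(&cb);    /* 0 on success, else an errno value */
    ssize_t n = aio_return(&cb); /* collect the result exactly once */
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}

/* Hypothetical demo: write "hello" to a temporary file asynchronously
 * and return the byte count reported for the request. */
ssize_t demo_aio_write(void)
{
    char path[] = "/tmp/aio_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    ssize_t n = aio_write_and_wait(fd, "hello", 5);
    close(fd);
    unlink(path);
    return n;
}
```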
As for why the write() function doesn't seem to be taking more time as the length of data written increases... that's because a non-blocking write() only copies none, some, or all of the data you pass to it into an in-kernel buffer, and then returns immediately; it doesn't wait for the data to actually get onto the disk. Copying a (relatively small) sequence of bytes from your app's buffer to an in-kernel buffer is fast, and the number of bytes copied can never be greater than the amount of empty space currently available in the in-kernel buffer, so the work done per write() call stays bounded no matter how much data you pass in.
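One way to see that bound for yourself is to hand a single non-blocking write() far more data than the kernel buffer can hold and look at the return value. The sketch below does that with a pipe standing in for the descriptor, since a pipe has a fixed-size kernel buffer (64 KiB by default on typical Linux systems, though that is an assumption about your platform). The function name accepted_by_one_write() is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an empty pipe, make its write end
 * non-blocking, attempt ONE write() of `request` bytes, and return how
 * many bytes that single call accepted (or -1 on error). Nobody reads
 * from the pipe, so the result is capped by the pipe's kernel buffer. */
long accepted_by_one_write(size_t request)
{
    int fds[2];
    long accepted = -1;

    assert(request > 0);

    if (pipe(fds) == -1)
        return -1;

    char *buf = calloc(request, 1); /* zero-filled dummy payload */
    int flags = fcntl(fds[1], F_GETFL, 0);

    if (buf != NULL && flags != -1 &&
        fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != -1) {
        /* Returns immediately with a partial count when the request
         * is larger than the free space in the pipe buffer. */
        ssize_t n = write(fds[1], buf, request);
        accepted = (long)n;
    }

    free(buf);
    close(fds[0]);
    close(fds[1]);
    return accepted;
}
```

Calling accepted_by_one_write(1 << 20) should report a number well below the 1 MiB requested, which is why the time spent inside each write() doesn't grow with the size of the buffer you pass in.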