views: 63

answers: 2
I'm trying to understand how asynchronous file operations are emulated using threads. I've found next to nothing to read about the subject.

Is it possible that:

  1. a process uses a thread to open a regular file (on an HDD).
  2. the parent gets the file descriptor from the thread; it may now terminate the thread.
  3. the parent uses the file descriptor with a new thread, reading X bytes from the file.
  4. the parent gets the file descriptor back, along with the current seek position.
  5. the parent may repeat these operations without having to open or seek every time it wishes to "continue" reading a new chunk of the file?

This is just a wild guess of mine; I would appreciate it if anybody could shed some light on how this is emulated efficiently.

UPDATE: By "efficient" I actually mean that I don't want the thread to wait from the moment the file is opened. Think of an HTTP non-blocking daemon which serves a client a huge file: you want to use a thread to read chunks of the file without blocking the daemon, but you don't want to keep that thread busy while waiting for the actual transfer to take place; you want to use the thread for other blocking operations of other clients.

A: 

When you open/create a file, fire up a thread. Now store that thread id/pointer as your file handle.

Basically the thread will do nothing except sit in a loop waiting for an "event"; a semaphore works well here. When you want to do a read, you add the read command to a queue (remember to protect the queue insertion with a critical section), return a unique id, and then increment the semaphore. If the thread is asleep, it will now wake up, grab the first message off the queue, and process it. When it has completed the read, it removes the command from the queue.

To poll whether a file read has completed, you can simply check to see if it is still in the command queue. If it's not there, the command has completed.

Furthermore, if you want to allow synchronous reads as well, you can wait, after sending the message through, for an "event" to be triggered by the completion. You then check whether the unique id is in the queue; if it isn't, you return control. If it still is, you go back to a wait state until the relevant unique id has been processed.
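
A minimal sketch of this pattern, assuming POSIX threads and semaphores; all names (read_cmd, io_worker, submit_read, read_completed) are illustrative, not part of any standard API:

    #define _XOPEN_SOURCE 700   /* for pread() */
    #include <pthread.h>
    #include <semaphore.h>
    #include <unistd.h>

    typedef struct read_cmd {
        int              id;      /* unique id returned to the caller */
        int              fd;      /* file to read from */
        void            *buf;     /* destination buffer */
        size_t           len;     /* bytes to read */
        off_t            offset;  /* where to read from */
        ssize_t          result;  /* filled in by the worker */
        struct read_cmd *next;
    } read_cmd;

    static read_cmd       *queue_head = NULL;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static sem_t           queue_sem;  /* counts pending commands;
                                          sem_init(&queue_sem, 0, 0) at startup */

    /* The worker sleeps on the semaphore, wakes once per command, performs
     * a blocking pread(), then removes the command from the queue; removal
     * is what marks it "completed". Start it once with
     * pthread_create(&tid, NULL, io_worker, NULL). */
    static void *io_worker(void *arg)
    {
        (void)arg;
        for (;;) {
            sem_wait(&queue_sem);               /* sleep until work arrives */

            pthread_mutex_lock(&queue_lock);
            read_cmd *cmd = queue_head;         /* grab the first command */
            pthread_mutex_unlock(&queue_lock);

            cmd->result = pread(cmd->fd, cmd->buf, cmd->len, cmd->offset);

            pthread_mutex_lock(&queue_lock);    /* dequeue == "completed" */
            queue_head = cmd->next;
            pthread_mutex_unlock(&queue_lock);
        }
        return NULL;
    }

    /* Caller side: enqueue a read command and return its unique id. */
    static int submit_read(read_cmd *cmd)
    {
        static int next_id = 0;

        pthread_mutex_lock(&queue_lock);        /* the "critical section" add */
        cmd->id   = ++next_id;
        cmd->next = NULL;
        read_cmd **p = &queue_head;             /* append at the tail */
        while (*p != NULL)
            p = &(*p)->next;
        *p = cmd;
        pthread_mutex_unlock(&queue_lock);

        sem_post(&queue_sem);                   /* wake the worker */
        return cmd->id;
    }

    /* Polling: a read is done when its id is no longer in the queue. */
    static int read_completed(int id)
    {
        int pending = 0;
        pthread_mutex_lock(&queue_lock);
        for (read_cmd *c = queue_head; c != NULL; c = c->next)
            if (c->id == id)
                pending = 1;
        pthread_mutex_unlock(&queue_lock);
        return !pending;
    }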

Goz
If you're suggesting that one should keep the thread (which opened the file) alive for the whole duration, then what's efficient about that? I thought the idea was to be able to "stop" or "shut down" the thread while still executing further file operations via the file descriptor the parent fetched. Is that not possible?
Doori Bar
But it's not "alive" the whole time. It spends most of its time in a "wait" state, waiting for commands, and it uses no CPU resources in that state. When you wake it up, it will use some CPU as it processes the command and begins the synchronous read.
Goz
I'm sorry, I was too vague. By "efficient" I was actually referring to the idea of not wasting the thread waiting on this file operation. For example, with an HTTP non-blocking daemon which serves a large file to a client, the transfer of the file might end within an hour or two, but the file-reading operation could end within a minute if it supported async I/O. Does what I mean by "efficient" make more sense now? The point is to keep the "waiting" threads free for other operations to use.
Doori Bar
@Martin York: If I have 1000 clients which download large files, it means I have 1000 threads which are "waiting" - from my POV, that's the bad idea I would like to avoid. My goal is not to spawn threads based on need, but to use existing threads for multiple operations. If each file open keeps a thread dedicated for the whole transfer, then how exactly does this "waiting" thread help me?
Doori Bar
Well, if you use one thread then you can have only one I/O op in progress at any time. It would be entirely possible to send an "open" command to the thread. You could then also hold an extra thread that wakes up when there is a new I/O op in the command queue and dispatches it to a thread in a thread pool; if no thread is currently available, it has to wait until one is. This way you can minimise the number of threads and scale better. You'd REALLY be better off, as pointed out, using the underlying async I/O architecture available, though... Boost can help here.
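
A minimal sketch of this pool variant, assuming POSIX threads; it reuses io_worker and queue_sem from the sketch in the answer above. Note that with more than one worker, io_worker must claim (dequeue) its command under the lock before doing the read, and completion must then be signalled with a per-command done flag rather than queue membership:

    #include <pthread.h>
    #include <semaphore.h>

    #define POOL_SIZE 4                  /* illustrative pool size */

    extern void *io_worker(void *arg);   /* worker loop from the sketch above,
                                            adapted as noted in the lead-in */
    extern sem_t queue_sem;              /* counts pending commands */

    /* All workers sleep on the same semaphore; whichever thread is free
     * picks up the next I/O op, so no separate dispatcher thread is needed. */
    void start_pool(void)
    {
        sem_init(&queue_sem, 0, 0);      /* no pending commands yet */
        for (int i = 0; i < POOL_SIZE; i++) {
            pthread_t t;
            pthread_create(&t, NULL, io_worker, NULL);
            pthread_detach(t);           /* workers run for the daemon's lifetime */
        }
    }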
Goz
+1  A: 

To understand asynchronous I/O better, it may be helpful to think in terms of overlapped operation: the number of pending operations (operations that have been started but not yet completed) can exceed one at any given time.

A diagram that explains asynchronous I/O can be found here: http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx

If you are using the asynchronous I/O capabilities provided by the underlying Operating System, then it is possible to asynchronously read from multiple files without spawning an equal number of threads.

If your underlying Operating System does not provide asynchronous I/O, or if you decide not to use it (in other words, you wish to emulate asynchronous operation using only blocking I/O, the regular Read/Write provided by the Operating System), then it is necessary to spawn as many threads as there are simultaneous I/O operations. This is because a thread that makes a blocking I/O call cannot continue executing until the operation finishes, so starting another blocking I/O operation requires issuing it from another thread that is not already occupied.
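
One concrete form of such OS-provided asynchronous I/O on POSIX systems is POSIX AIO (<aio.h>); on Windows, the analogue is OVERLAPPED I/O. A minimal sketch, assuming a POSIX system (link with -lrt on Linux), keeping two reads pending at once from a single thread; the file names are illustrative and error checks are elided:

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf1[4096], buf2[4096];
        struct aiocb cb1, cb2;
        memset(&cb1, 0, sizeof cb1);
        memset(&cb2, 0, sizeof cb2);

        cb1.aio_fildes = open("file1.dat", O_RDONLY);
        cb2.aio_fildes = open("file2.dat", O_RDONLY);
        cb1.aio_buf = buf1; cb1.aio_nbytes = sizeof buf1; cb1.aio_offset = 0;
        cb2.aio_buf = buf2; cb2.aio_nbytes = sizeof buf2; cb2.aio_offset = 0;

        aio_read(&cb1);                  /* both reads are now pending ...   */
        aio_read(&cb2);                  /* ... overlapped, no extra threads */

        /* Poll for completion; a real daemon would do other work here. */
        while (aio_error(&cb1) == EINPROGRESS || aio_error(&cb2) == EINPROGRESS)
            usleep(1000);

        printf("read %zd and %zd bytes\n", aio_return(&cb1), aio_return(&cb2));
        close(cb1.aio_fildes);
        close(cb2.aio_fildes);
        return 0;
    }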

rwong
I'm afraid I'll have to clarify one more time :) ... a client requests a large file. The daemon opens the file within a thread and reads a chunk of it into a buffer. Once this operation is done, the daemon marks the thread as "free" for other operations. Then, when the client is ready to receive the next chunk, the daemon reads the following chunk... and frees the thread for others to use again. Does it make more sense now?
Doori Bar
@Doori: thanks for your clarification. In this case, it is not necessary to use "Asynchronous I/O" (remember this term has a specific meaning in Operating Systems). Your case can be handled by simply closing the file after reading a chunk. When the client requests the next chunk, you can open the file again, read the next chunk, and close it again. This is my suggestion because an Operating System usually has a global limit on the number of file handles (asynchronous or not) that can be open simultaneously.
rwong
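
A minimal sketch of the per-chunk suggestion above; serve_chunk and its parameters are illustrative, and pread() is used here to combine the seek and the read into one call:

    #define _XOPEN_SOURCE 700   /* for pread() */
    #include <fcntl.h>
    #include <unistd.h>

    /* Reopen the file for every chunk (seek + read, then close), so no
     * file handle is held between a client's requests. */
    ssize_t serve_chunk(const char *path, off_t offset, void *buf, size_t len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        ssize_t n = pread(fd, buf, len, offset);
        close(fd);
        return n;
    }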
Thanks for the heads up regarding the file-handle limitation; I guess I'll google further about that one. But I need to emulate async I/O nevertheless, because even if I open(), read(), and close() within the daemon's scope, it would block (even if only for a short duration). Multiple "short" blocking operations would make my daemon's latency between operations very noticeable.
Doori Bar