I'm working on a build tool that launches thousands of processes (compiles, links, etc.). It also distributes executables to remote machines so that the build can be run across hundreds of slave machines. I'm implementing DLL injection to monitor the child processes of my build process so that I can see whether they opened/closed the resources I expected them to. That way I can tell if my users aren't specifying dependency information correctly.

My question is:

I've got the DLL injection working, but I'm not all that familiar with Windows programming. What would be the best/fastest way to call back to the parent build process with the millions of file I/O reports that the children will be generating? I've thought about having them write to a non-blocking socket, but I've been wondering whether pipes, shared memory, or maybe COM would be better.
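
To be concrete, here's a rough sketch of the non-blocking socket approach I've been considering (host, port, and function names are placeholders, not my actual code):

    // Illustrative only: the injected DLL connects back to the build process
    // and streams file I/O reports over a non-blocking TCP socket.
    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <string>
    #pragma comment(lib, "ws2_32.lib")

    static SOCKET ConnectToParent(const char* host, const char* port)
    {
        WSADATA wsa;
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return INVALID_SOCKET;

        addrinfo hints = {}, *res = NULL;
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0) return INVALID_SOCKET;

        SOCKET s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (s == INVALID_SOCKET ||
            connect(s, res->ai_addr, (int)res->ai_addrlen) == SOCKET_ERROR)
        {
            if (s != INVALID_SOCKET) closesocket(s);
            freeaddrinfo(res);
            return INVALID_SOCKET;
        }
        freeaddrinfo(res);

        u_long nonBlocking = 1;                   // switch to non-blocking mode
        ioctlsocket(s, FIONBIO, &nonBlocking);
        return s;
    }

    // Called from the hooked file APIs in the injected DLL.
    static void ReportFileEvent(SOCKET s, const std::string& line)
    {
        // With a non-blocking socket this can fail with WSAEWOULDBLOCK,
        // so a real version would need to buffer and retry.
        send(s, line.c_str(), (int)line.size(), 0);
    }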

+1  A: 

If you stay in the Windows world (none of your machines is Linux or anything else), named pipes are a good choice, because they are fast and can be accessed across machine boundaries. I think shared memory is out of the race, because it can't cross a machine boundary. Distributed COM lets you formulate the contract in IDL, but I think XML messages over pipes are also fine. XML messages have the benefit of being completely independent of the channel: if you need Linux later, you can switch to a TCP/IP transport and send the same XML messages.
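
For example, a minimal single-client sketch of the pipe server in the build process could look like this (pipe name and buffer sizes are made up; a real version would use overlapped I/O or one thread per pipe instance, and the injected DLL would simply open the pipe with CreateFile and write reports with WriteFile):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        HANDLE pipe = CreateNamedPipeA(
            "\\\\.\\pipe\\build_io_reports",      // example pipe name
            PIPE_ACCESS_INBOUND,                   // children write, parent reads
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            PIPE_UNLIMITED_INSTANCES,
            0, 64 * 1024,                          // out / in buffer sizes
            0, NULL);
        if (pipe == INVALID_HANDLE_VALUE) return 1;

        // Wait for one child to connect, then read its report messages.
        if (ConnectNamedPipe(pipe, NULL) || GetLastError() == ERROR_PIPE_CONNECTED)
        {
            char buf[4096];
            DWORD read = 0;
            while (ReadFile(pipe, buf, sizeof(buf) - 1, &read, NULL) && read > 0)
            {
                buf[read] = '\0';
                printf("report: %s\n", buf);       // hand off to the collator here
            }
        }
        CloseHandle(pipe);
        return 0;
    }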

Some additional techniques with limitations:

Another forgotten but hot candidate is RPC (remote procedure calls). A lot of Windows services rely on it, but I think RPC is hard to program.

If you are on the same machine and you only need to send some status information, you can register a window message via RegisterWindowMessage() and send it via SendMessage(), as in the sketch below.
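
For example (window discovery, message name, and payload layout are just assumptions):

    #include <windows.h>

    // Both sides must register the same message name (name is arbitrary).
    static const char* kStatusMsgName = "MyBuildTool_Status";

    // Child / injected DLL side: assumes the parent's window handle can be
    // found, e.g. by class name or via a handle passed in the environment.
    void NotifyParent(HWND parentWindow, WPARAM code, LPARAM detail)
    {
        UINT msg = RegisterWindowMessageA(kStatusMsgName);
        // SendMessage blocks until the parent has processed the message;
        // PostMessage would return immediately instead.
        SendMessageA(parentWindow, msg, code, detail);
    }

    // Parent side, inside the window procedure:
    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        static UINT statusMsg = RegisterWindowMessageA(kStatusMsgName);
        if (msg == statusMsg)
        {
            // wp/lp carry the status; pointers can't be passed across
            // processes this way, so keep the payload to two integers.
            return 0;
        }
        return DefWindowProcA(hwnd, msg, wp, lp);
    }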

Thomas Maierhofer
Programming RPC isn't terribly difficult -- it's just *horribly* under-documented. There have only been a few third-party books on it, and they're all out of print. MSDN has only a few articles that barely mention it in passing.
Jerry Coffin
Hmm, although my system is distributed, this particular bit of it isn't. The slave running on each machine will be responsible for collating all the file events and will just send the list of files read/written by the "job" back over the network. Windows messaging could be a winner. I don't fancy XML, since parsing that many messages could get messy. To put it in perspective, one link of a large project can generate 500k input/output messages.
Benj
A: 

Apart from all of Thomas's suggestions, you might also just use a common database to store the results. And if that is too slow, use one of the more modern (and fast) key/value databases (like Tokyo Cabinet, memcachedb, etc.).

Toad
Hmm, I only want to hold on to these messages very briefly; I think a DB solution would be a bit overkill.
Benj
+1  A: 

First, since you're apparently dealing with communication between machines, not just within one machine, I'd rule out shared memory immediately.

I'd think hard about trying to minimize the amount of data instead of worrying a lot about how fast you can send it. Instead of sending millions of file I/O reports, I'd batch together a few kilobytes of that data (or something on that order) and send a hash of that packet. With a careful choice of packet size, you should be able to reduce your data transmission to the point that you can simply use whatever method you find most convenient, rather than trying to pick the one that's the fastest.
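
For example, the batching might look something like this (the hash choice and the structure are illustrative, not prescriptive; the idea is that identical batches, which are common across repeated builds, can be recognised by their hash instead of being retransmitted in full):

    #include <string>

    // FNV-1a, just a stand-in for whatever hash you prefer.
    static unsigned long long HashBytes(const std::string& data)
    {
        unsigned long long h = 1469598103934665603ULL;
        for (size_t i = 0; i < data.size(); ++i)
        {
            h ^= (unsigned char)data[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    struct Batch
    {
        std::string payload;        // concatenated, newline-separated reports
        unsigned long long hash;    // identifies the batch for the receiver
    };

    class Batcher
    {
    public:
        explicit Batcher(size_t maxBytes) : maxBytes_(maxBytes) {}

        // Accumulates one report line; returns true and fills 'out'
        // once roughly maxBytes_ of data has been collected.
        bool Add(const std::string& reportLine, Batch* out)
        {
            buffer_ += reportLine;
            buffer_ += '\n';
            if (buffer_.size() < maxBytes_) return false;
            out->payload.swap(buffer_);
            out->hash = HashBytes(out->payload);
            buffer_.clear();
            return true;
        }

    private:
        size_t maxBytes_;
        std::string buffer_;
    };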

Jerry Coffin
I suspect you're right here; from each child process I really just want to pass back a list of all the files read/written, not the actual events themselves. Maybe that list, sent via a Windows message, would be the solution.
Benj
Maybe -- then again, sorting and hashing the list (and sending only the hash) would reduce the data a lot more...
Jerry Coffin
A: 

This sounds like a lot of overkill for the task of verifying the files used in a build. How about just scanning the build files, or capturing the output from the build tools?

Chris Becke
Hmm, not really; obviously I scan the build files already, since I'm the build tool. My tool does provide the capability to scan the output from compilers etc., but not all tools provide such handy output, and I want a general solution.
Benj