If I understood correctly, you want to have all processes reading lines from a single file. I don't recommend this; it's kinda messy, and you'll have to either a) read the same parts of the file several times, or b) use locking/mutexes or some other mechanism to avoid that. It'll get complicated and hard to debug.
I'd have a master process read the file, and assign lines to a subprocess pool. You can use shared memory to speed this up, and reduce the need for data-copying IPC; or use threads.
As for examples: I answered a question about forking and IPC and gave a code snippet showing an example function that forks and returns a pair of pipes for parent-child communication. Let me look that up (...) here it is =P http://stackoverflow.com/questions/3884103/can-popen-make-bidirectional-pipes-like-pipe-fork/3884402#3884402
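For reference, here's a minimal sketch of that kind of helper, assuming POSIX. The names (struct child, spawn_child) are mine, not from the linked answer, and error handling is abbreviated:

    /* Sketch of a fork-with-pipes helper in the spirit of the linked
     * answer. Names are illustrative; error handling is minimal. */
    #include <unistd.h>

    struct child {
        pid_t pid;
        int   up;    /* parent reads here: child -> parent */
        int   down;  /* parent writes here: parent -> child */
    };

    /* Fork a child running child_main(read_fd, write_fd); fill in the
     * parent's view of the two pipes. Returns 0 on success, -1 on error. */
    static int spawn_child(struct child *c, void (*child_main)(int, int))
    {
        int up[2], down[2];             /* [0] = read end, [1] = write end */

        if (pipe(up) < 0 || pipe(down) < 0)
            return -1;

        c->pid = fork();
        if (c->pid < 0)
            return -1;

        if (c->pid == 0) {              /* child side */
            close(up[0]);
            close(down[1]);
            child_main(down[0], up[1]); /* read from parent, write to it */
            _exit(0);
        }

        close(up[1]);                   /* parent side */
        close(down[0]);
        c->up = up[0];
        c->down = down[1];
        return 0;
    }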
edit: I kept thinking about this =P. Here's an idea:
Have a master process spawn subprocesses with something similar to what I showed in the link above.
Each process starts by sending a byte up to the master to signal it's ready, and blocking on read().
Have the master process read a line from the file into a shared memory buffer, and block on select() on its children's pipes.
When select() returns, read one of the bytes that signal readiness, and send that subprocess the offset of the line in the shared memory space (there's a sketch of the shared-memory setup below).
The master process repeats (reads a line, blocks on select(), reads a byte to consume the readiness event, etc.).
The children process the line in whatever way you need, then send a byte to the master to signal readiness once again.
(You can avoid the shared memory buffer if you want, and send the lines down the pipes, though it'll involve constant data-copying. If the processing of each line is computationally expensive, it won't really make a difference; but if the lines require little processing, it may slow you down).
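If you do go the shared-memory route, the core trick is creating the mapping before fork() so master and children see the same buffer. A minimal sketch, assuming Linux/BSD-style MAP_ANONYMOUS; all the names and sizes here are mine, just for illustration:

    /* Sketch: an anonymous shared mapping set up before fork() is
     * visible to master and child alike, so the master can copy a line
     * in and send only its offset down a pipe. Illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define SHM_SIZE 4096

    int main(void)
    {
        char *shm = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        int down[2];                          /* master -> child pipe */

        if (shm == MAP_FAILED || pipe(down) < 0)
            return 1;

        if (fork() == 0) {                    /* child */
            size_t off;
            read(down[0], &off, sizeof off);  /* get the offset only... */
            printf("child got: %s\n", shm + off); /* ...line is in shm */
            _exit(0);
        }

        /* master: put a "line" in shared memory, send just the offset */
        size_t off = 0;
        strcpy(shm + off, "a line of input");
        write(down[1], &off, sizeof off);
        wait(NULL);
        return 0;
    }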
I hope this helps!
edit 2 based on Newba's comments:
Okay, so no shared memory. Use the above model, only instead of sending down the pipe the offset of the line in the shared memory space, send the whole line. This may sound like you're wasting time when you could just read it from the file, but trust me, you're not: pipes are orders of magnitude faster than reads from regular files on a hard disk, and if you had the subprocesses read directly from the file, you'd run into the problem I pointed out at the start of this answer.
So, master process (there's a code sketch of this loop after the steps):
Spawn subprocesses using something like the function I wrote (link above) that creates pipes for bidirectional communication.
Read a line from the file into a buffer (private, local, no shared memory whatsoever).
You now have data ready to be processed. Call select() to block on all the pipes that connect you to your subprocesses.
Choose one of the pipes that has data available, read the single byte from it, and then send the line waiting in your buffer down the corresponding pipe (remember, we had two per child process: one to go up, one to go down).
Repeat from step 2, i.e. read another line.
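Put together, the master's loop might look roughly like this. It's a sketch, assuming the struct child/spawn_child() helper from the first snippet above; NCHILDREN and the buffer size are arbitrary:

    /* Sketch of the master loop: read a line, wait for a ready child,
     * hand the line over. Assumes struct child from the earlier helper. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/select.h>

    #define NCHILDREN 4

    static void run_master(struct child kids[NCHILDREN], FILE *input)
    {
        char line[4096];

        /* step 2: read a line into a private, local buffer */
        while (fgets(line, sizeof line, input)) {
            fd_set rfds;
            int i, maxfd = -1;

            /* step 3: block until at least one child signals readiness */
            FD_ZERO(&rfds);
            for (i = 0; i < NCHILDREN; i++) {
                FD_SET(kids[i].up, &rfds);
                if (kids[i].up > maxfd)
                    maxfd = kids[i].up;
            }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                break;

            /* step 4: consume one readiness byte, send the line down */
            for (i = 0; i < NCHILDREN; i++) {
                if (FD_ISSET(kids[i].up, &rfds)) {
                    char ready;
                    read(kids[i].up, &ready, 1);
                    write(kids[i].down, line, strlen(line));
                    break;            /* step 5: back to reading a line */
                }
            }
        }
    }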
Child processes (again, a sketch follows the steps):
When they start, they have a reading pipe and a writing pipe at their disposal. Send a byte down your writing pipe to signal the master process you are ready and waiting for data to process (this is the single byte we read in step 4 above).
Block on read(), waiting for the master process (which knows you are ready because of step 1) to send you data to process. Keep reading until you reach a newline (you said you were reading lines, right?). Note I'm following your model, sending a single line to each process at a time; you could send multiple lines if you wanted.
Process the data.
Return to step 1, i.e. send another byte to signal you are ready for more data.
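And the matching child side, as a sketch. Here rd/wr are the pipe ends the child got at spawn time, and process_line() is a stand-in for whatever per-line work you do:

    /* Sketch of the child loop: signal readiness, read one line,
     * process it, repeat until the master closes the pipe. */
    #include <unistd.h>

    static void process_line(const char *line)
    {
        (void)line;                   /* your per-line work goes here */
    }

    static void child_main(int rd, int wr)
    {
        char line[4096];

        for (;;) {
            /* step 1: one byte up the pipe means "I'm ready" */
            char ready = 'R';
            if (write(wr, &ready, 1) != 1)
                break;

            /* step 2: keep reading until we reach a newline */
            size_t n = 0;
            while (n < sizeof line - 1) {
                if (read(rd, &line[n], 1) != 1)
                    return;           /* pipe closed: no more work */
                if (line[n++] == '\n')
                    break;
            }
            line[n] = '\0';

            /* step 3: process the data */
            process_line(line);
        }
    }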
There you go: a simple protocol to assign tasks to as many subprocesses as you want. It may be interesting to run a test with 1 child, n children (where n is the number of cores in your computer), and more than n children, and compare performance.
Whew, that was a long answer. I really hope I helped xD