EDIT: I've tagged this C in the hope of getting more responses. It's the theory I'm interested in more than any specific language implementation, so if you're a C coder please treat the following PHP as pseudo-code and feel free to respond with an answer written in C.

I am trying to speed up a PHP CLI script by having it execute its tasks in parallel instead of serial. The tasks are completely independent of each other so it doesn't matter which order they start/finish in.

Here's the original script (note all these examples are stripped-back for clarity):

<?php

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

foreach ($items as $item) {
    do_stuff_with($item);
}

I've managed to make it work on the $items in parallel with pcntl_fork() as shown below:

<?php

ini_set('max_execution_time', 0); 
ini_set('max_input_time', 0); 
set_time_limit(0);

$items = range(0, 100);

function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
foreach ($items as $item) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }   
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}

Now I want to extend this so there's a maximum of, say, 10 children active at once. What's the best way of handling this? I've tried a few things but haven't had much luck.

A: 

man 2 setrlimit

That's going to be per-user, which may be what you want anyway.
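If you do want to try it from the script itself, PHP can make the same syscall through the posix extension. A minimal sketch, assuming a PHP build where posix_setrlimit() and the pcntl extension are available; note that the limit counts all of the user's processes, not just this script's children:

<?php

// RLIMIT_NPROC is per-user: it counts every process the user owns
// (shell included), and once the cap is hit fork() fails with EAGAIN
// rather than blocking.
posix_setrlimit(POSIX_RLIMIT_NPROC, 50, 50); // soft limit, hard limit

$pid = pcntl_fork();
if ($pid == -1) {
    // with the limit in place, an over-the-cap fork() is refused
    // instead of creating another process
    echo "fork refused: " . pcntl_strerror(pcntl_get_last_error()) . "\n";
}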

Dustin
How about if I want to do it at the application level?
That's a syscall you can call from your application. It's very commonly used to limit core dump size, file descriptors, etc. You can also use it to limit CPU utilization (I do that a lot) and the number of processes.
Dustin
+2  A: 

The best thing I can come up with is to add all the tasks to a queue, launch the maximum number of threads you want, and then have each thread request a task from the queue, execute it, and request the next one. Don't forget to have the threads terminate when there are no more tasks to do.
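PHP has no native threads, but the same pattern works with worker processes. A rough sketch, assuming the pcntl and sysvmsg extensions, with a SysV message queue standing in for the task queue:

<?php

$items     = range(0, 100);
$max_procs = 10;

function do_stuff_with($item) { echo "$item\n"; }

$queue = msg_get_queue(ftok(__FILE__, 'q'));

// Enqueue one type-1 message per task, then one type-2 "stop" message
// per worker so each worker knows when to terminate.
foreach ($items as $item) {
    msg_send($queue, 1, $item);
}
for ($i = 0; $i < $max_procs; $i++) {
    msg_send($queue, 2, 'stop');
}

$pids = array();
for ($i = 0; $i < $max_procs; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        $pids[] = $pid; // parent
    } else {
        // child: keep pulling tasks until a stop message arrives
        while (msg_receive($queue, 0, $type, 1024, $item)) {
            if ($type == 2) {
                exit(0);
            }
            do_stuff_with($item);
        }
        exit(1); // queue error
    }
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
msg_remove_queue($queue);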

tomjen
+2  A: 

Forking is an expensive operation. From the looks of it, what you really want is multi**threading**, not multi**processing**. The difference is that threads are much lighter weight than processes, since threads share a virtual address space but processes have separate virtual address spaces.

I'm not a PHP developer, but a quick Google search reveals that PHP does not support multithreading natively, but there are libraries to do the job.

Anyway, once you figure out how to spawn threads, you need to decide how many to spawn. To do this, you need to know what the bottleneck of your application is: CPU, memory, or I/O? You've indicated in your comments that you are network-bound, and network is a type of I/O.

If you're CPU bound, you'll only get as much parallelism as you have CPU cores; any more threads and you're just wasting time on context switches. Once you've figured out how many threads to spawn in total, divide your work into that many units and have each thread process one unit independently.

If you're memory bound, multithreading won't help.

Since you're I/O bound, figuring out how many threads to spawn is a little trickier. If all work items took approximately the same time to process, with very low variance, you could estimate how many threads to spawn by measuring how long one work item takes. However, since network packets tend to have highly variable latencies, this is unlikely to be the case.

One option is to use a thread pool: you create a whole bunch of threads, and for each item to process, you check whether there is a free thread in the pool. If there is, you have that thread perform the work and move on to the next item; otherwise, you wait for a thread to become available. Choosing the size of the pool is important: too large, and you waste time on unnecessary context switches; too small, and you wait for threads too often.
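In PHP the pool would be made of forked processes rather than threads. A minimal sketch building on the code from the question, assuming the pcntl extension: cap the number of live children and block in pcntl_wait() until a slot frees up.

<?php

$items     = range(0, 100);
$max_procs = 10;

function do_stuff_with($item) { echo "$item\n"; }

$running = 0;
foreach ($items as $item) {
    if ($running >= $max_procs) {
        pcntl_wait($status); // reap one finished child, freeing a slot
        $running--;
    }
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        $running++; // parent
    } else {
        do_stuff_with($item); // child
        exit(0);
    }
}

while ($running-- > 0) {
    pcntl_wait($status); // reap the remaining children
}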

Yet another option is to abandon multithreading/multiprocessing and do asynchronous I/O instead. Since you mentioned you're working on a single-core processor, this will probably be the fastest option. You can use functions like socket_select() to test whether a socket has data available. If it does, you read the data; otherwise, you move on to a different socket. This requires a lot more bookkeeping, but you avoid waiting for data to come in on one socket when data is available on a different socket.
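A rough sketch of the select-based approach using PHP's stream functions (stream_select() rather than socket_select()); the hosts here are placeholders for whatever the real tasks talk to:

<?php

$hosts   = array('example.com', 'example.org', 'example.net'); // placeholders
$streams = array();

// Open one connection per task and issue the request up front.
foreach ($hosts as $host) {
    $fp = stream_socket_client("tcp://$host:80", $errno, $errstr, 30);
    if ($fp) {
        fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\n\r\n");
        stream_set_blocking($fp, false);
        $streams[(int)$fp] = $fp;
    }
}

// Multiplex the reads: handle whichever socket has data instead of
// blocking on one while another is ready.
while ($streams) {
    $read   = $streams;
    $write  = null;
    $except = null;
    if (stream_select($read, $write, $except, 30) === false) {
        break;
    }
    foreach ($read as $fp) {
        $chunk = fread($fp, 8192);
        if ($chunk !== false && $chunk !== '') {
            echo $chunk; // process the data as it arrives
        }
        if (feof($fp)) {
            fclose($fp);
            unset($streams[(int)$fp]);
        }
    }
}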

If you want to eschew threads and asynchronous I/O and stick with multiprocessing, it can still be worthwhile if the per-item processing is expensive enough. You might then divide the work like so:

// $items and do_stuff_with() are as defined in the question
$max_procs = 10;  // maximum number of concurrent processes
$my_process_index = 0;
$pids = array();

// Fork off $max_procs processes
for($i = 0; $i < $max_procs - 1; $i++)
{
  $pid = pcntl_fork();
  if($pid == -1)
  {
    die("couldn't fork()");
  }
  elseif($pid > 0)
  {
    // parent
    $my_process_index++;
    $pids[] = $pid;
  }
  else
  {
    // child
    break;
  }
}

// $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
// Each process will now process 1/$max_procs of the items
for($i = $my_process_index; $i < count($items); $i += $max_procs)
{
  do_stuff_with($items[$i]);
}

if($my_process_index == $max_procs - 1)
{
  // parent: after doing our own share above, wait for the children
  foreach($pids as $pid)
  {
    pcntl_waitpid($pid, $status);
  }
}
else
{
  // child
  exit(0);
}
Adam Rosenfield
Thanks for the answer, I'll give this a try. I've read in places that in Unix fork() isn't actually that expensive, and that's how threading is implemented anyway? Is this outdated/incorrect information?
"You're only going to get as much parallelism as you have CPU cores; any more threads and you're just wasting time doing context switches." Well, in my case, I only have 1 core but I can dramatically speed things up by running 20 tasks at once. The tasks are dependent on (slow) network operations.
No, that's not how threading is implemented. It's hard to explain in 300 characters - read up on how fork() is implemented, virtual memory, virtual address spaces, copy-on-write, and many more related topics.
Adam Rosenfield
fork() is not expensive, exec() is. The kernel doesn't copy the code segment - both parent and child share it. The data segment is only duplicated when either the parent or the child changes something in it (copy-on-write). NB: I assume we are talking Unix here.
qrdl
+1  A: 

There is no syscall to get a list of child pids, but ps can do it for you.

The --ppid switch will list all children of your process, so you just need to count the number of lines ps outputs.

Alternatively you can maintain your own counter that you increment on fork() and decrement on the SIGCHLD signal, assuming the ppid stays unchanged for forked processes.
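A rough sketch of the counter approach in the question's PHP, assuming the pcntl extension; declare(ticks=1) lets the signal handler run between statements:

<?php

declare(ticks=1);

$items     = range(0, 100);
$max_procs = 10;
$running   = 0;

function do_stuff_with($item) { echo "$item\n"; }

// Decrement the counter once per child reaped on SIGCHLD.
pcntl_signal(SIGCHLD, function ($signo) use (&$running) {
    while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
        $running--;
    }
});

foreach ($items as $item) {
    while ($running >= $max_procs) {
        usleep(100000); // nap until SIGCHLD frees a slot
    }
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        $running++; // parent
    } else {
        do_stuff_with($item); // child
        exit(0);
    }
}

while ($running > 0) {
    usleep(100000); // wait for the stragglers
}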

qrdl
I chose to go with SIGCHLD and wait() to track finished children. Thanks for the answer.