Hey there, I have a simple script that is supposed to load 2 separate pages at the same time and grab some text from them. However, it only runs either the parent process or the child process, depending on which finishes first. What am I doing wrong? I want the 2 processes to work simultaneously. Here is the example code:

<?php

$pid = pcntl_fork();

if ($pid == -1) {
    die("could not fork");
}
else if ($pid) {
    $url = "http://www.englishpage.com/verbpage/simplepresent.html";
    $readurl = file_get_contents($url);
    $pattern = '#Examples(.*?)Forms#s';
    preg_match($pattern, $readurl, $match);
    echo "Test1:".$match[1];
}
else {
    $url = "http://www.englishpage.com/verbpage/simplepresent.html";
    $readurl = file_get_contents($url);
    $pattern = '#Examples(.*?)Forms#s';
    preg_match($pattern, $readurl, $match);
    echo "Test2:".$match[1];
}

echo "<br>Finished<br>";

?>

Any help would be appreciated!

A: 

I am not quite sure I understand what you are trying to achieve, but if you want your "Finished" message to be displayed:

  • only once
  • only when the two processes have done their work

You should:

  • Use pcntl_wait in the parent process, so it waits for its child to exit
  • Echo "finished" from the parent process, after it has finished waiting.

For instance, something like this should do:

$pid = pcntl_fork();
if ($pid == -1) {
    die("could not fork");
}
else if($pid) { // Father
    sleep(mt_rand(0, 5));
    echo "Father done\n";

    pcntl_wait($status); // Wait for the children to finish / die

    echo "All Finished\n\n";
}
else { // Child
    sleep(mt_rand(0, 5));
    echo "Child done\n";
}

With this, each process does its work, and only when both have finished does the parent report that everything is done:

  • if the parent is done first, it waits for the child
  • if the child ends first, pcntl_wait returns immediately, so the parent does not block but still finishes after the child.


As a side note: you are using two separate processes; once forked, you cannot "easily" share data between them, so it is not easy to pass data from the child to the father, or the other way around.

If you need to do that, you can take a look at Shared Memory Functions -- or just use plain files ^^
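To illustrate the "plain files" option, here is a minimal sketch, assuming the script runs from the command line with the pcntl extension available. The helper name forkAndCollect and the temporary-file naming are made up for this example; the child serializes its result into a temporary file and the parent reads it back after waiting.

```php
<?php
// Sketch for the command line only; requires the pcntl extension.
// forkAndCollect() is a hypothetical helper, not part of PHP.

function forkAndCollect(callable $childWork, callable $parentWork): array
{
    // Temporary file the child uses to hand its result to the parent.
    $tmpFile = tempnam(sys_get_temp_dir(), 'fork_');

    $pid = pcntl_fork();
    if ($pid === -1) {
        unlink($tmpFile);
        throw new RuntimeException('could not fork');
    }

    if ($pid === 0) {
        // Child: do the work, store the result, and stop here so the
        // child never continues into the parent's code path.
        file_put_contents($tmpFile, serialize($childWork()));
        exit(0);
    }

    // Parent: runs its own work while the child is busy.
    $parentResult = $parentWork();

    // Wait for this specific child, then read what it wrote.
    pcntl_waitpid($pid, $status);
    $childResult = unserialize(file_get_contents($tmpFile));
    unlink($tmpFile);

    return ['parent' => $parentResult, 'child' => $childResult];
}
```

Each callable could, for instance, fetch one page with file_get_contents and return the matched text. The explicit exit(0) in the child matters: without it, the child would go on to run whatever code follows the call.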


Hope this helps -- and that I understood the question correctly ^^

Pascal MARTIN
Hey there, thank you for the input. I tried running the example code you provided, but it only outputs "Child done"; I didn't get "Father done" or "All Finished". What I'm trying to do is have both parent and child processes running at the same time. The purpose of the project is to run a crawler that will grab 2 pages at a time and process them simultaneously rather than one after the other.
Oh :-( Strange... It was working for me... Ah, wait: are you trying to fork from a PHP page that's running under Apache? That could explain it: I tried my code from the command line... And, actually, I've always heard that forking from Apache is not as easy/simple/safe as it is from the command line...
Pascal MARTIN
A: 

From the Process Control Extension Introduction:

Process Control support in PHP implements the Unix style of process creation, program execution, signal handling and process termination. Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment.

So basically, you shouldn't use any of the pcntl functions when you are running a PHP script through the Apache module.

If you just want to fetch the data from those 2 pages simultaneously then you should be able to use stream_select to achieve this. You can find an example at http://www.ibm.com/developerworks/web/library/os-php-multitask/.

BTW, apparently curl supports this too, using curl_multi_select; an example of how to use that can be found at http://www.somacon.com/p537.php.
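As a rough sketch of the curl approach (not taken from the linked article), something like the following should fetch several pages concurrently inside a single process, without forking. The helper names fetchAll and extractSection are invented for this example, the timeout is an arbitrary assumption, and the pattern is the one from the question.

```php
<?php
// Sketch: fetch several URLs concurrently in one process with curl_multi.
// fetchAll() and extractSection() are hypothetical helpers, not part of PHP.

function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(10000); // select failed; pause briefly to avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies; a failed transfer yields an empty string here.
    $bodies = [];
    foreach ($handles as $key => $ch) {
        $content = curl_multi_getcontent($ch);
        $bodies[$key] = is_string($content) ? $content : '';
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $bodies;
}

// Returns the text between "Examples" and "Forms", or null if not found.
function extractSection(string $html): ?string
{
    return preg_match('#Examples(.*?)Forms#s', $html, $match) === 1 ? $match[1] : null;
}
```

Calling fetchAll with an array of two URLs returns the two bodies under the same keys, and each body can then be passed to extractSection. Since everything happens in one process, this also works under the Apache module.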

wimvds