Possible Duplicate:
Finding and Printing all Links within a DIV

I'm trying to make a mini crawler. When I specify a site, it calls file_get_contents() and then extracts the data I want, which I've already done. Now I want to add code that lets it find any external links on the site it is on and get the data from those pages too.

Basically, instead of me specifying a site, it should just follow external links and get the data if available.

Here is what I have.

Thanks in advance.

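    <?php

    // URL to crawl, passed as ?s=...
    $link = strip_tags($_GET['s'] ?? '');

    // Host name without a leading "www."
    $path_info = parse_url($link);
    $name = $path_info['host'] ?? '';
    $name = preg_replace('/^www\./', '', $name);

    $original_file = @file_get_contents($link);

    if ($original_file === false) {
        die(htmlspecialchars($link) . " does not exist");
    }

    // "stuff" stands in for the real pattern. preg_match() needs delimiters
    // and returns 1 or 0, so the matched text is read from $m.
    if (preg_match('/stuff/', $original_file, $m)) {
        echo $m[0];
    }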
A: 

Use the PHP Simple HTML DOM Parser:

// Load the Simple HTML DOM library (provides file_get_html())
include 'simple_html_dom.php';

// Create DOM from URL
$html = file_get_html('http://www.example.com/');

// Find all links
$allURLs = array();
foreach ($html->find('a') as $element) {
    $allURLs[] = $element->href;
}

Now $allURLs contains all URLs of the webpage, and you can call file_get_contents() on each link in a loop.
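Since the question asks for external links only, the collected URLs still need to be filtered by host. Here is a minimal sketch of one way to do that, using PHP's built-in DOMDocument rather than the parser above. The function name extract_external_links() and the rule that relative, protocol-relative, and non-HTTP links are not external are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of any library): returns the absolute
// http(s) links found in $html whose host differs from $baseHost.
function extract_external_links(string $html, string $baseHost): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect libxml errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare hosts case-insensitively and without a leading "www."
    $normalize = function (string $host): string {
        return preg_replace('/^www\./', '', strtolower($host));
    };
    $baseHost = $normalize($baseHost);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href  = trim($anchor->getAttribute('href'));
        $parts = parse_url($href);

        // Assumption: links without both scheme and host are skipped.
        if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
            continue;
        }
        if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
            continue;
        }
        if ($normalize($parts['host']) !== $baseHost) {
            $links[$href] = true; // array keys remove duplicates
        }
    }

    return array_keys($links);
}
```

With the question's variables, this could be called as extract_external_links($original_file, $name), and each returned URL could then be passed to file_get_contents().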

NAVEED
A: 

If I were you, I would break this code into two parts.


First part:

  Fetch the content and display the links.

Second part:

        This part is called when I specify which link I want to display.
        The chosen external link is passed back to the same file, recursively.

So basically your code will look like this:


     first part --> 1) get the data
                    2) parse the links
                    if (link is chosen)
                    {
                        run current file again with selected link passed
                    }
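The "run current file again" step can be sketched as a list of links that point back at the same script, with the chosen URL in the s query parameter that the question's code already reads. This is only an illustration: the function name render_follow_links() and the script name crawler.php are assumptions.

```php
<?php
// Hypothetical helper: turns discovered URLs into an HTML list whose
// links call the same script again with the chosen URL in "s".
function render_follow_links(array $urls, string $script): string
{
    $items = [];
    foreach ($urls as $url) {
        // Encode the URL so it survives as a single query-string value.
        $target = $script . '?s=' . rawurlencode($url);
        $items[] = '<li><a href="' . htmlspecialchars($target, ENT_QUOTES, 'UTF-8') . '">'
                 . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '</a></li>';
    }
    return "<ul>\n" . implode("\n", $items) . "\n</ul>";
}
```

For example, echo render_follow_links($allURLs, 'crawler.php'); would print one clickable entry per discovered link. Clicking an entry reloads the script with that link as its new starting point.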
Extjs Commander