I am trying to build a crawler that gets the movie URLs from an IMDb list. I can get all the links on the page into an array and want to select only the ones with "title" in them.

preg_match_all($pattern, "[125] => href=\"/chart/2000s?mode=popular\" [126] => href=\"/title/tt0111161/\" ", $matches);

where $pattern='/title/'.

I am getting the following error:

Warning: preg_match_all() [function.preg-match-all]: Delimiter must not be alphanumeric or backslash in C:\xampp\htdocs\phpProject1\index.php on line 53

Any idea on how to go about this? Thanks a lot.

+1  A: 

Use a DOM parser, for example Simple HTML DOM:

// Load the Simple HTML DOM library (adjust the path to where you saved it)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all links containing "title" as part of their href
$links = $html->find('a[href*=title]');

// Loop through the matching links and print each href
foreach ($links as $link) {
    echo $link->href . '<br>';
}

http://simplehtmldom.sourceforge.net/manual.htm
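If installing a third-party library is not an option, roughly the same idea can be sketched with PHP's built-in DOM extension. This is only an illustration, not part of the answer above: the sample markup is made up to stand in for the downloaded page, and the XPath expression assumes the movie links contain "/title/" in their href.

```php
<?php
// Hypothetical sample markup standing in for the downloaded IMDb page.
$html = '<html><body>'
      . '<a href="/chart/2000s?mode=popular">Chart</a>'
      . '<a href="/title/tt0111161/">Movie</a>'
      . '</body></html>';

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Select every <a> element whose href attribute contains "/title/".
$xpath = new DOMXPath($doc);
$hrefs = array();
foreach ($xpath->query('//a[contains(@href, "/title/")]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

With the sample markup, only the /title/ link ends up in $hrefs; the chart link is skipped.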

karim79
+1  A: 

Are you sure $pattern is '/title/' at the time when preg_match_all is called?

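The error you are getting occurs when the pattern passed to preg_match_all (the first argument) is not properly delimited. For example, if $pattern were just 'title', PHP would treat the leading 't' as the delimiter and raise exactly this warning.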
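As a minimal sketch of a properly delimited pattern, the snippet below runs preg_match_all on the sample string from the question. The choice of '#' as the delimiter and the capture group are assumptions made for illustration; '#' simply avoids having to escape the slashes in "/title/".

```php
<?php
// Sample input copied from the question.
$subject = '[125] => href="/chart/2000s?mode=popular" [126] => href="/title/tt0111161/" ';

// '#' is the delimiter, so the slashes inside the pattern need no escaping.
// The parentheses capture the URL path of each href that starts with /title/.
$pattern = '#href="(/title/[^"]+)"#';

// preg_match_all() returns the number of full matches found.
$count = preg_match_all($pattern, $subject, $matches);

// $matches[1] holds the captured paths, one per match.
print_r($matches[1]);
```

Any non-alphanumeric, non-backslash character can serve as the delimiter, so '/\/title\//' would also work, at the cost of escaping each slash.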

codaddict