Hello all,

I'm looking to create a PHP script where a user provides a link to a webpage, and the script fetches that page and parses it based on its contents.

For example, if a user provides a YouTube link:

http://www.youtube.com/watch?v=xxxxxxxxxxx

Then it would grab the basic information about that video (thumbnail, embed code?).

Or they might provide a Vimeo link:

 http://www.vimeo.com/xxxxxx

Or even if they were to provide any link, without a video attached, such as:

 http://www.google.com/

In that case it could grab just the page title or some meta content.

I'm thinking I'd have to use file_get_contents, but I'm not exactly sure how to use it in this context.

I'm not looking for someone to write the entire code, but perhaps provide me with some tools so that I can accomplish this.

Many thanks!

+3  A: 

You can use either the curl or the http library. You send an HTTP request and then use the library to read the information you need from the HTTP response.

txwikinger
In addition, you can use regex to parse the information you want from those websites.
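
As a rough illustration of the regex approach for the "page title" case from the question, here is a minimal sketch. The helper name extractTitle() is made up for this example, it assumes PHP 7.1 or later (for the nullable return type), and a regex like this will miss unusual markup, so treat it as a starting point only.

```php
<?php
// Hypothetical helper: returns the text of the first <title> element, or null if none is found.
function extractTitle(string $html): ?string
{
    // 'i' = case-insensitive, 's' = let '.' match newlines; '.*?' is non-greedy.
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
        // Decode entities such as &amp; and strip surrounding whitespace.
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

echo extractTitle('<html><head><title> Example &amp; Co </title></head></html>');
// prints "Example & Co"
```

The same pattern style can be pointed at meta tags, but each site lays out its HTML differently, so the expressions usually need adjusting per site.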
yoda
A: 

Maybe Thumbshots or Snap already have some of the functionality you want?

I know that's not exactly what you are looking for, but it might be handy at least for the embedded stuff. txwikinger has already answered your other question, but maybe this helps you anyway.

André Hoffmann
+1  A: 

file_get_contents() would work in this case, assuming that you have allow_url_fopen enabled in your php.ini. What you would do is something like:

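$embedStrings = array();
$pageContent = @file_get_contents($url);
if ($pageContent !== false) {
    // Non-greedy, case-insensitive, '.' spans newlines: one match per <embed> element.
    preg_match_all('#<embed\b.*?</embed>#is', $pageContent, $matches);
    $embedStrings = $matches[0];
}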

That said, file_get_contents() won't give you much in the way of error handling other than returning the content on success or false on failure. If you would like finer control over the request and access to the HTTP response codes, use the curl functions, and in particular curl_getinfo(), to look at the response code, MIME type, encoding, etc. Once you have the content via either curl or file_get_contents(), your code for parsing it to look for the HTML of interest will be the same.
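
A minimal sketch of the curl route might look like the following. It assumes the curl extension is installed; fetchPage() and the keys of the array it returns are names invented for this example, and the timeout and redirect limits are arbitrary choices.

```php
<?php
// Hypothetical helper: fetches $url and returns the body plus some response metadata.
function fetchPage(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 5,    // arbitrary limit
        CURLOPT_TIMEOUT        => 10,   // seconds; arbitrary limit
    ]);
    $body = curl_exec($ch); // false on failure

    // The handle is released automatically when $ch goes out of scope.
    return [
        'ok'          => $body !== false,
        'error'       => $body === false ? curl_error($ch) : null,
        'status'      => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // 0 if no response was received
        'contentType' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE), // e.g. "text/html; charset=UTF-8"
        'body'        => $body === false ? '' : $body,
    ];
}
```

You could then check $result['ok'] and $result['status'] before handing $result['body'] to whatever parsing code you use.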

Adam Franco
After a call to file_get_contents() using the HTTP wrapper (that is, opening a URL), the variable $http_response_header will be populated with the response headers.
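
For instance, a small sketch for reading the status code out of those headers. statusFromHeaders() is a hypothetical helper name, and the usage lines are commented out because they need network access and allow_url_fopen.

```php
<?php
// Hypothetical helper: pulls the numeric status code out of an array of header lines.
function statusFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Each redirect hop adds its own status line; keep the last one seen.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Usage sketch:
// $body = @file_get_contents('http://www.google.com/');
// $status = statusFromHeaders($http_response_header ?? []);
```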
Greg