I'm creating a website using the CakePHP framework, and I am a newbie to PHP and web programming. I want to do something similar to Digg's submit button, where you type a URL and it fetches an image, a title, and sometimes a short description of the article on the webpage. I'm assuming this would be done using PHP, but I'm open to any method.
You grab the source of the page in question (cURL library or file_get_contents() if fopen() URL wrappers are enabled) and parse it for those details.
Title can be the <title> element.
Description can be the meta description.
Image can be the largest image (a lot of different ways to look for it).
You can also look for Open Graph protocol tags, which carry their key in the property attribute:
<meta property="og:site_name" content="Stack Overflow" />
<meta property="og:url" content="http://www.stackoverflow.com/" />
<meta property="og:title" content="Hello" />
<meta property="og:image" content="http://www.gravatar.com/avatar/5a9f58455ea36c880bc46820255fb084?s=32&d=identicon&r=PG" />
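As a rough illustration of the parsing step, here is a minimal sketch that reads the title, the meta description, and the Open Graph tags with PHP's DOMDocument class. The function name extract_preview() and the shape of the returned array are made up for this example, and it assumes the dom extension is available. Open Graph values are preferred when present, with the <title> element and the meta description as fallbacks.

```php
<?php
// Hypothetical helper (not part of any library): pull preview fields out of raw HTML.
function extract_preview(string $html): array
{
    $empty = ['title' => null, 'description' => null, 'image' => null];
    if (trim($html) === '') {
        return $empty; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Fallback title: the text of the first <title> element, if any.
    $title = null;
    $titleNodes = $doc->getElementsByTagName('title');
    if ($titleNodes->length > 0) {
        $title = trim($titleNodes->item(0)->textContent);
    }

    // Index <meta> tags by their "property" (Open Graph) or "name" attribute.
    // The first occurrence of each key wins.
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $key = strtolower($tag->getAttribute('property') ?: $tag->getAttribute('name'));
        if ($key !== '' && !isset($meta[$key])) {
            $meta[$key] = trim($tag->getAttribute('content'));
        }
    }

    return [
        'title'       => $meta['og:title'] ?? $title,
        'description' => $meta['og:description'] ?? $meta['description'] ?? null,
        'image'       => $meta['og:image'] ?? null,
    ];
}
```

Calling extract_preview($source) on the fetched page gives you an array you can inspect; when 'image' comes back null you would fall back to scanning the <img> tags.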
I'm not too familiar with CakePHP, but I can give you a general idea of what you'll need to do.
First step would be to use AJAX to submit the URL to your server.
Then the server will need to grab the HTML source. In PHP you can do:
$source = file_get_contents('http://www.example.com/');
This requires allow_url_fopen to be enabled; if it is not, the cURL functions are an alternative.
Once you have the source, you'll have to parse out the data you want. You can use regex or an HTML parser such as PHP's DOMDocument to do this part.
Then you'll probably want to put the data you need into a PHP array, call json_encode($my_array), and return the JSON to the browser. Then do what you wish with it.
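To make the server side of those steps a bit more concrete, here is a minimal sketch. The function name preview_as_json(), the keys in the JSON, and the optional $fetch parameter are all invented for this example; $fetch only exists so the function can be tried without network access. It accepts http and https URLs only and, to stay short, parses nothing but the title.

```php
<?php
// Hypothetical endpoint logic: validate the submitted URL, fetch it, return JSON.
// $fetch is an optional callable (URL in, HTML string or false out).
function preview_as_json(string $url, ?callable $fetch = null): string
{
    // Reject anything that is not a well-formed http(s) URL.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (filter_var($url, FILTER_VALIDATE_URL) === false
        || !in_array($scheme, ['http', 'https'], true)) {
        return json_encode(['ok' => false, 'error' => 'Invalid URL']);
    }

    // Default fetcher: file_get_contents(), which needs allow_url_fopen.
    $fetch = $fetch ?? function (string $u) {
        return @file_get_contents($u);
    };
    $source = $fetch($url);
    if (!is_string($source) || $source === '') {
        return json_encode(['ok' => false, 'error' => 'Could not fetch page']);
    }

    // Minimal parsing for this sketch: just the <title> element.
    $title = null;
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $source, $m)) {
        $title = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }

    return json_encode(['ok' => true, 'url' => $url, 'title' => $title]);
}
```

In the script that handles the AJAX request you would send a Content-Type: application/json header and echo the result of preview_as_json() for the submitted URL; the JavaScript side can then check the ok flag before filling in the form.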
Hope this helps
You'll need to do a few simple things:
You'll need to use the curl functions of PHP to get the source for the webpage. The php.net site provides a great example of this.
From that source, you'll need to find the title of the page, and any images. The easiest way would probably be through a simple regular expression.
Here's a simple script example which does both:
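<?php
// Fetch the page source with cURL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://stackoverflow.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
$output = curl_exec($ch);
curl_close($ch);
if ($output === false) {
    die("Could not fetch the page\n");
}
// Capture the text between <title> and </title>; the "s" flag lets "." span line breaks.
$titles = array();
preg_match_all("/<title[^>]*>(.*?)<\/title>/is", $output, $titles, PREG_PATTERN_ORDER);
// Capture the src attribute of every <img> tag, wherever it appears in the tag.
$images = array();
preg_match_all("/<img[^>]+src\s*=\s*['\"]([^'\"]+)['\"]/i", $output, $images, PREG_PATTERN_ORDER);
$page_title = isset($titles[1][0]) ? trim($titles[1][0]) : "(no title found)";
$images_found = $images[1];
echo "Page title was: {$page_title}\n";
foreach ($images_found as $image_src) {
    echo "Image: {$image_src}\n";
}
?>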
The regular expressions I included are imperfect, and won't catch all titles or all images in every case, but they're both good starts.
You'll also need to pick which image you want to use from the array $images_found. You can do this randomly, or based on the largest image on the page, or the first one you find, etc.
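If you go with the largest image, one possible approach is sketched below. The function pick_largest_image() and its optional $sizeOf parameter are invented for this example. By default it calls getimagesize() on each URL, which needs allow_url_fopen, makes a request per image, and only works on absolute URLs, so relative src values would have to be resolved against the page URL first.

```php
<?php
// Hypothetical helper: return the URL of the image with the largest pixel area,
// or null if no image could be measured.
// $sizeOf maps a URL to [width, height] (or false); it defaults to getimagesize().
function pick_largest_image(array $urls, ?callable $sizeOf = null): ?string
{
    $sizeOf = $sizeOf ?? function (string $url) {
        return @getimagesize($url);
    };

    $best = null;
    $bestArea = 0;
    foreach ($urls as $url) {
        $size = $sizeOf($url);
        if (!is_array($size) || !isset($size[0], $size[1])) {
            continue; // unreadable, or not an image
        }
        $area = $size[0] * $size[1];
        if ($area > $bestArea) {
            $bestArea = $area;
            $best = $url;
        }
    }
    return $best;
}
```

With the script above you would call pick_largest_image($images_found). Since every candidate costs a request, it may be worth limiting the check to the first handful of images on the page.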