How to find the image file type in Perl from a website URL?

For example,

$image_name = "logo";
$image_path = "http://stackoverflow.com/content/img/so/" . $image_name;

From this information, how can I find the file type? In this example it should display

"png"

because the image is at http://stackoverflow.com/content/img/so/logo.png.

Suppose there are more files with that base name, as on the SO web site; then it should show all of the file types.

+5  A: 

You can't easily tell. The URL doesn't necessarily reflect the type of the image.

To get the image type you have to make an HTTP request (GET or, more efficiently, HEAD) and inspect the Content-Type header in the HTTP response.
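
Once you have the header value, reducing it to the short name the question asks for ("png") is a small parsing step. The following is only a sketch: image_type_from_header is a hypothetical helper name, and the decision to strip an "x-" prefix and a "+xml" suffix is an assumption about what output is wanted, not anything the server guarantees.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a Content-Type header value to a short
# image type name such as "png". Returns nothing for non-image types.
sub image_type_from_header {
    my ($header) = @_;
    return unless defined $header;

    # Drop parameters such as "; charset=binary", then trim whitespace.
    my ($media_type) = split /;/, $header;
    return unless defined $media_type;
    $media_type =~ s/^\s+|\s+$//g;

    # Media types are case-insensitive, so compare in lower case.
    my $lower = lc $media_type;
    my ($subtype) = $lower =~ m{^image/([a-z0-9.+-]+)$};
    return unless defined $subtype;

    # Assumed tidy-ups: "x-icon" -> "icon", "svg+xml" -> "svg".
    $subtype =~ s/^x-//;
    $subtype =~ s/\+xml$//;
    return $subtype;
}

print image_type_from_header('image/png'), "\n";    # prints "png"
```

The header value itself would come from a HEAD or GET response; the helper only deals with the string.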

Brian Agnew
It's just an application, not like the web.
joe
The above is exactly what's going on with the two questions you've marked as achieving your request :-)
Brian Agnew
+4  A: 

Well, http://stackoverflow.com/content/img/so/logo is a 404. If it were not, then you could use

#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;

my ($content_type) = head "http://stackoverflow.com/content/img/so/logo.png";

print "$content_type\n" if defined $content_type;

__END__

As Kent Fredric points out, what the web server tells you about content type need not match the actual content sent by the web server. Keep in mind that File::MMagic can also be fooled.

#!/usr/bin/perl
use strict;
use warnings;

use File::MMagic;
use LWP::UserAgent;

my $mm = File::MMagic->new;

my $ua = LWP::UserAgent->new(
    max_size => 1_000 * 1_024,
);

my $res = $ua->get('http://stackoverflow.com/content/img/so/logo.png');

if ( $res->is_success ) {
    # Identify the type from the downloaded bytes, not from the headers.
    print $mm->checktype_contents( $res->content ), "\n";
}
else {
    print $res->status_line, "\n";
}
__END__
Sinan Ünür
I can do it this way too, but I have to find all the different content types the web service has.
joe
Using this method I can achieve what I asked for.
joe
+5  A: 

If you're using LWP to fetch the image, you can look at the content-type header returned by the HTTP server.

Both WWW::Mechanize and LWP::UserAgent will give you an HTTP::Response object for any GET request. So you can do something like:

use strict;
use warnings;

use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->get( "http://stackoverflow.com/content/img/so/logo.png" );
my $type = $mech->response->headers->header( 'Content-Type' );
friedo
Using this method I can achieve what I asked for.
joe
+2  A: 

You really can't make assumptions about content based on URL, or even content type headers.

They're only guides to what is being sent.

A handy trick to confuse things that use suffix matching to identify file-types is doing this:

  http://example.com/someurl?q=foo#fakeheheh.png

And if you were to arbitrarily permit that image to be added to the page, it might in some cases be a doorway to an attack of some sort if the browser followed it (for example, http://really_awful_bank.example.com/transfer?amt=1000000;from=123;to=123).
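
To make the suffix-matching trick concrete, here is a small sketch contrasting a naive "last dot in the URL" check with one that looks only at the path. Both function names are hypothetical, and the regex-based splitting is a simplification that assumes an absolute URL with a scheme; a URL parsing module such as URI from CPAN would be the more robust choice.

```perl
use strict;
use warnings;

# Naive approach: whatever follows the last dot in the whole URL.
sub naive_extension {
    my ($url) = @_;
    my ($ext) = $url =~ /\.([A-Za-z0-9]+)\z/;
    return $ext;
}

# More careful: discard the query string and fragment, isolate the
# path, and only then look for an extension on the last segment.
sub path_extension {
    my ($url) = @_;
    ( my $no_extras = $url ) =~ s/[?#].*\z//s;
    my ($path) = $no_extras =~ m{\A[A-Za-z][A-Za-z0-9+.-]*://[^/]*(/.*)?\z}s;
    return unless defined $path;
    my ($ext) = $path =~ m{\.([A-Za-z0-9]+)\z};
    return $ext;
}

my $url = 'http://example.com/someurl?q=foo#fakeheheh.png';
printf "naive: %s\n", naive_extension($url) // 'none';    # prints "naive: png"
printf "path:  %s\n", path_extension($url)  // 'none';    # prints "path:  none"
```

Even the careful version only tells you what the URL claims; it says nothing about the bytes the server actually returns.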

Content-type based forgery is not so detrimental, but you can do nasty things if the person who controls the name works out how you identify things and sends different content types for HEAD requests than for GET requests.

It could tell the HEAD request that it's an image, but then tell the GET request that it's application/javascript, and goodness knows where that will lead.

The only way to know for certain what it is, is to download the file and then do magic-number-based identification, or more (i.e., try to decode the image). Then all you have to worry about is images that are too large, and specially crafted images that could trip vulnerabilities on computers that are not yet patched.
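
As a rough illustration of magic-based identification, the sketch below checks the leading bytes of already-downloaded content against the well-known PNG, GIF and JPEG signatures. sniff_image_type is a hypothetical name, the signature list is deliberately tiny, and matching a signature does not prove the rest of the file decodes as a valid image.

```perl
use strict;
use warnings;

# Leading byte signatures ("magic numbers") of a few image formats.
my @SIGNATURES = (
    [ png  => "\x89PNG\r\n\x1a\n" ],
    [ gif  => "GIF87a" ],
    [ gif  => "GIF89a" ],
    [ jpeg => "\xFF\xD8\xFF" ],
);

# Hypothetical helper: return a short type name if the content starts
# with a known signature, or nothing if no signature matches.
sub sniff_image_type {
    my ($bytes) = @_;
    return unless defined $bytes;
    for my $sig (@SIGNATURES) {
        my ( $type, $magic ) = @$sig;
        return $type if substr( $bytes, 0, length $magic ) eq $magic;
    }
    return;
}

print sniff_image_type("GIF89a\x01\x00\x01\x00"), "\n";    # prints "gif"
```

A module such as File::MMagic covers far more formats; this only shows the idea of trusting the bytes rather than the URL or the headers.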

Granted all of the above is extreme paranoia, but if you know the rare possibilities you can make sure they can't happen :)

Kent Fredric
+1 for pointing out the pitfalls of not actually downloading the file and checking it.
Sinan Ünür
Added code to my answer to deal with some of the issues you raised.
Sinan Ünür
+1  A: 

From what I understand, you're not worried about the content type of an image whose name and extension you already know; you want to find the extension for an image whose base name you know.

In order to do that, you'd have to test each of the image extensions you care about individually and record which ones resolved and which ones didn't. For example, both http://stackoverflow.com/content/img/so/logo.png and http://stackoverflow.com/content/img/so/logo.gif could exist. They don't in this exact situation, but on some arbitrary server you could have multiple images with the same base name but different extensions. Unfortunately, there's no way to get a list of the available extensions of a file in a remote web directory from its base name without looping through the possibilities.
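
The loop described above might look something like the sketch below. existing_extensions is a hypothetical name, and the third argument (a code reference that decides whether a URL exists) is an assumption added so the example can run without network access; when it is omitted, the sketch falls back to a HEAD request through LWP::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: return the extensions for which
# "$base_url.$ext" appears to exist.
sub existing_extensions {
    my ( $base_url, $extensions, $exists ) = @_;

    # Default check: a HEAD request, with LWP::Simple loaded lazily
    # so the module is only needed when this path is used.
    $exists ||= sub {
        my ($url) = @_;
        require LWP::Simple;
        return LWP::Simple::head($url) ? 1 : 0;
    };

    return grep { $exists->("$base_url.$_") } @$extensions;
}

# Offline demonstration with a stand-in for the remote server.
my %served = ( 'http://example.com/img/logo.png' => 1 );
my @found  = existing_extensions(
    'http://example.com/img/logo',
    [qw(png gif jpg jpeg)],
    sub { $served{ $_[0] } },
);
print "$_\n" for @found;    # prints "png"
```

Calling existing_extensions with only the first two arguments would issue one real HEAD request per candidate extension. A successful HEAD only shows that something is served at that URL, not that it is really an image of that type.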

shit a birck
That was the literal question, but from later comments apparently not what was actually wanted :)
ysth