views:

30

answers:

2

I've always believed that the HTTP Content-Type should correctly identify the contents of a returned resources. I've recently noticed a resource from google.com with a filename similar to /extern_chrome/799678fbd1a8a52d.js that contained HTTP headers of:

HTTP/1.1 200 OK
Expires: Mon, 05 Sep 2011 00:00:00 GMT
Last-Modified: Mon, 07 Sep 2009 00:00:00 GMT
Content-Type: text/html; charset=UTF-8
Date: Tue, 07 Sep 2010 04:30:09 GMT
Server: gws
Cache-Control: private, x-gzip-ok=""
X-XSS-Protection: 1; mode=block
Content-Length: 19933

The content is not HTML, but is pure JavaScript. When I load the resource using a local proxy (Burp Suite), the proxy states that the MIME type is "script".

Is there an accepted method for determining what is returned from a web server? The Content-type header seems to usually be correct. Extensions are also an indicator, but not always accurate. Is the only accurate method to examine the contents of the file? Is this what web browsers do to determine how to handle the content?

+1  A: 

Is the only accurate method to examine the contents of the file?

Its the method browsers use to determine the file type, but is by no means accurate. The fact that it isn't accurate is a security concern.

The only method available to the server to indicate the file type is via the Content-Type HTTP header. Unfortunately, in the past, not many servers set the correct value for this header. So browsers decided to play smart and tried to figure out the file type using their own proprietary algorithms.

The "guess work" done by browsers is called content-sniffing. The best resource to understand content-sniffing is the browser security handbook. Another great resource is this paper, whose suggestions have now been incorporated into Google Chrome and IE8.

How do I determine the correct file type?

If you are just dealing with a known/small list of servers, simply ask them to set the right content-type header and use it. But if you are dealing with websites in the wild that you have no control of, you will likely have to develop some kind of content-sniffing algorithm.

sri
+1  A: 

The browser knows it's JavaScript because it reached it via a <script src="..."> tag.

If you typed the URL to a .js file into your URL's address bar, then even if the server did return the correct Content-Type, your browser wouldn't treat the file as JavaScript to be executed. (Instead, you would probably either see the .js source code in your browser window, or be prompted to save it as a file, depending on your browser.)

Browsers don't do anything with JavaScript unless it's referenced by a <script> tag, plain and simple. No content-sniffing is required.

Joe White