There doesn't seem to be an accepted way of sending a header parameter that contains non-ASCII characters.

The header for file download usually looks like

Content-disposition: attachment; filename="theasciifilename.doc"

However, if you put a UTF-8 encoded string in the filename parameter, Firefox handles it fine, whereas IE chokes on it.

There is a document on CodeProject that explains a method for encoding the filename.

This document encodes Bản Kiểm Kê.doc to B%e1%ba%a3n%20Ki%e1%bb%83m%20K%c3%aa.doc by hex encoding the bytes.

Problem #1: the first non-ASCII character in that string, ả, has a value of 7843. Encode that number in hex and you get %a3%1e. How did this guy get %e1%ba%a3? (I'm obviously missing something simple here)

Problem #2: While IE acknowledges this encoding, Firefox doesn't! What to do?

+1  A: 

In the link you've got above, e1 ba a3 is the UTF-8 encoding of the character mentioned, not the character code.

Douglas Mayle
+1  A: 

Answer to question #1: You are confusing Unicode code points with UTF-8. The code point of 'ả' is U+1EA3 (the bytes A3 1E you got are its little-endian UTF-16 form), but that is not its UTF-8 encoding. In UTF-8 that character requires three bytes: 0xE1 0xBA 0xA3. URL encoding is poorly defined for non-ASCII characters, but %e1%ba%a3 is the valid percent-encoding of the UTF-8 bytes for that character.
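To make the difference concrete, here is a small Python sketch (Python is my choice for illustration, not something from the question). It shows the code point, the UTF-8 bytes, and the percent-encoded filename; the variable names are just placeholders.

```python
from urllib.parse import quote

# The character from the question, written as an escape to avoid
# any ambiguity about how this source file itself is encoded.
ch = "\u1ea3"  # LATIN SMALL LETTER A WITH HOOK ABOVE

code_point = ord(ch)                 # the Unicode code point (7843, i.e. 0x1EA3)
utf8_bytes = ch.encode("utf-8")      # three bytes: E1 BA A3
utf16le_bytes = ch.encode("utf-16-le")  # two bytes: A3 1E (where %a3%1e comes from)

# Percent-encoding works on the UTF-8 bytes, not on the code point.
filename = "B\u1ea3n Ki\u1ec3m K\u00ea.doc"
encoded = quote(filename)  # quote() uses UTF-8 by default and emits uppercase hex

print(hex(code_point), utf8_bytes.hex(), utf16le_bytes.hex())
print(encoded)
```

Note that `quote` emits uppercase hex digits, whereas the CodeProject example uses lowercase; percent-encoding is case-insensitive in the hex digits, so both spell the same bytes.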

Mr. Shiny and New
A: 

Answer (sort of) to problem #2:

Since you've discovered that the naming scheme in one browser does not work in the other, your only solution is to do it differently for each browser, similar to the example here.

In case the link goes away, the solution is basically:

1. If the browser is IE, URL-encode the filename
2. Generate the Content-disposition header

Of course, determining whether the browser is IE by User-agent (which is about the only way you can do it) is fraught with all sorts of the usual peril.

As North American centric as this sounds, if it is important that this work in a large number of browsers you do not control, some of which may have the User-agent blocked or modified, then simply avoid non-ASCII characters in the filename and always use "Download" or something.

Grant Wagner
+2  A: 

The specs basically don't permit anything other than US-ASCII. HTTP headers are US-ASCII. HTTP's payload defaults to ISO 8859-1 but that refers to the content body, not the headers.

Arguably the Right Thing to do would be to use MIME's technique for encoding non-ASCII data in headers, as described in RFC 2047, but I have no idea whether browsers actually support that.

EDIT: Whoops, no, RFC 2047 section 5 explicitly says that the encoded form is not permitted in Content-Disposition. Looks like you're out of luck - there is no standard.

EDIT 2: There is a standard - RFC 2231 defines how this is now supposed to work. It has support from some browsers, but is not supported in IE. I found some test cases which demonstrate how it works and what browser support is available.

Mike Dimmick
+1  A: 

For Problem #2 you need to URL encode the file name for both Internet Explorer and Firefox. The only difference is that you need to use the format of RFC 2231 in Firefox. This applies to Firefox 3 and Internet Explorer 7.
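A rough sketch of what that could look like in Python. Everything here is an assumption for illustration: the function name `content_disposition` is made up, the IE check is a naive substring test on the User-Agent (with all the usual sniffing caveats), and the non-IE branch uses the RFC 2231 `filename*=charset'language'value` form with an empty language tag.

```python
from urllib.parse import quote


def content_disposition(filename: str, user_agent: str) -> str:
    """Build a Content-Disposition header value for a download.

    Hypothetical helper: percent-encodes the UTF-8 bytes of the filename
    for every browser, and only varies the parameter syntax.
    """
    # safe="" so that reserved characters such as "/" are escaped too.
    encoded = quote(filename, safe="")

    if "MSIE" in user_agent:
        # IE (as reported for IE 7): plain filename parameter
        # carrying the percent-encoded name.
        return 'attachment; filename="%s"' % encoded

    # Other browsers (as reported for Firefox 3): RFC 2231 extended
    # parameter, charset'language'percent-encoded-value.
    return "attachment; filename*=UTF-8''%s" % encoded


ie_ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
ff_ua = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko Firefox/3.0"

print(content_disposition("B\u1ea3n Ki\u1ec3m K\u00ea.doc", ie_ua))
print(content_disposition("B\u1ea3n Ki\u1ec3m K\u00ea.doc", ff_ua))
```

Browsers other than those two are not covered by this answer, so treat the fallback branch as a guess and check the behaviour you actually need.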

A: 

Unfortunately, there currently is no single way that would work in all User Agents.

See http://greenbytes.de/tech/tc2231/ for test cases, then complain to Microsoft, Google and Apple.

Julian Reschke