I have implemented a simple file upload-download mechanism. When a user clicks a file name, the file is downloaded with these HTTP headers:

HTTP/1.1 200 OK
Date: Tue, 30 Sep 2008 14:00:39 GMT
Server: Microsoft-IIS/6.0
Content-Disposition: attachment; filename=filename.doc;
Content-Type: application/octet-stream
Content-Length: 10754

I also support Japanese file names. To do that, I encode the file name with this Java method:

private String encodeFileName(String name) throws Exception {
    String agent = request.getHeader("USER-AGENT");
    if (agent != null && agent.indexOf("MSIE") != -1) { // is IE
        StringBuffer res = new StringBuffer();
        char[] chArr = name.toCharArray();
        for (int j = 0; j < chArr.length; j++) {
            if (chArr[j] < 128) { // plain ASCII char
                // every dot except the last one (the extension dot) is escaped
                if (chArr[j] == '.' && j != name.lastIndexOf("."))
                    res.append("%2E");
                else
                    res.append(chArr[j]);
            }
            else { // non-ASCII char
                byte[] byteArr = name.substring(j, j + 1).getBytes("UTF8");
                for (int i = 0; i < byteArr.length; i++) {
                    // byte must be converted to unsigned int
                    res.append("%").append(Integer.toHexString((byteArr[i]) & 0xFF));
                }
            }
        }
        return res.toString();
    }
    // Firefox/Mozilla
    return MimeUtility.encodeText(name, "UTF8", "B");
}

It worked well until someone found that it fails with long file names, for example あああああああああああああああ2008.10.1あ.doc. If I change one of the single-byte dots to a single-byte underscore, or if I remove the first character, it works. In other words, it depends on the length of the name and on the URL-encoding of the dot character. A few examples follow.

This is broken (あああああああああああああああ2008.10.1あ.doc):

Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008%2E10%2E1%e3%81%82.doc;

This is OK (あああああああああああああああ2008_10.1あ.doc):

Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008_10%2E1%e3%81%82.doc;

This is also fine (the broken name with its first あ removed):

Content-Disposition: attachment; filename=%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%82%e3%81%822008%2E10%2E1%e3%81%82.doc;

Does anybody have a clue?

+3  A: 

Gmail handles file name escaping somewhat differently: the file name is wrapped in double quotes, and single-byte periods are not URL-escaped. Encoded this way, the long file name from the question works.

Content-Disposition: attachment; filename="%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%82%E3%81%822008.10.1%E3%81%82.doc"

However, there is still a limit (apparently IE-only, and a bug, I assume) on the byte length of the file name. Once the encoded name exceeds it, the beginning of the file name is truncated, even if the name consists only of single-byte characters. The limit is around 160 bytes.
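For illustration, here is a minimal sketch of that quoted style in Java. The class and method names are made up for this example, and the choice to also percent-escape the double quote, backslash, percent sign and control characters is my own assumption (so the quoted string cannot be broken), not something confirmed from Gmail's behaviour.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from Gmail or from the question's code.
final class QuotedFileNameEncoder {

    // Builds a quoted filename value: non-ASCII characters become
    // percent-escaped UTF-8 bytes, single-byte periods stay as they are.
    static String encode(String name) {
        StringBuilder res = new StringBuilder("\"");
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            int len = Character.charCount(cp);
            if (cp >= 0x20 && cp < 0x7F && cp != '"' && cp != '\\' && cp != '%') {
                res.append((char) cp); // printable ASCII is copied through
            } else {
                // Assumption: everything else (non-ASCII, control characters,
                // double quote, backslash, percent sign) is percent-escaped.
                byte[] bytes = name.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    res.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += len;
        }
        return res.append('"').toString();
    }

    static String header(String name) {
        return "Content-Disposition: attachment; filename=" + encode(name);
    }

    public static void main(String[] args) {
        // \u3042 is the character あ used in the question
        System.out.println(header("\u3042\u30422008.10.1\u3042.doc"));
    }
}
```

This does nothing about the length limit described above; a name whose encoded form is too long would still be truncated by IE.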

Ovesh
Congrats! Sometimes the best answer one can receive is no answer at all. It forces us to look at the problem again, and it's far more rewarding when you solve it yourself ;)
Joe Pineda
+1  A: 

The main issue here is that IE does not support the relevant RFC, RFC 2231. See pointers and test cases. Furthermore, the workaround that you use for IE (just using percent-escaped UTF-8) has several additional problems: it may not work in all locales (as far as I recall, the method fails in Korea unless IE is configured to always use UTF-8 in URLs, which is not the default), and, as previously mentioned, there are length limits (I hear that this is fixed in IE8, but I have not tried it yet).
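For browsers that do support it, the RFC 2231 form is an extended parameter, filename*=charset'language'percent-encoded-value, with the language part left empty here. A minimal Java sketch follows; the class and method names are hypothetical, and the set of characters left unescaped is a deliberately conservative assumption (letters, digits and "-", ".", "_", "~") rather than the full set the RFC allows.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the RFC 2231 extended-parameter syntax.
final class Rfc2231FileName {

    // Returns a Content-Disposition value using filename*=UTF-8''...
    static String contentDisposition(String name) {
        StringBuilder enc = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (isSafe(c)) {
                enc.append((char) c);
            } else {
                enc.append(String.format("%%%02X", c));
            }
        }
        // charset, then an empty language tag between the two single quotes
        return "attachment; filename*=UTF-8''" + enc;
    }

    // Assumption: a conservative subset of the characters the RFC permits unescaped.
    private static boolean isSafe(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    public static void main(String[] args) {
        // \u3042 is the character あ used in the question
        System.out.println("Content-Disposition: "
                + contentDisposition("\u3042\u30422008.10.1\u3042.doc"));
    }
}
```

Since IE ignores this form, a server using it would still need the browser check from the question to fall back to the percent-escaped variant.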

Julian Reschke
+1  A: 

As mentioned above, it is impossible to get Content-Disposition with Unicode file names working in all major browsers without browser sniffing and returning different headers for each.

My solution was to avoid the Content-Disposition header entirely, and append the filename to the end of the URL to trick the browser into thinking it was getting a file directly. e.g.

http://www.xyz.com/cgi-bin/dynamic.php/あああああああああああああああ2008.10.1あ.doc

This naturally assumes that you know the filename when you create the link, although a quick redirect header could set it on demand.
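A rough sketch of building such a link in Java, assuming the file name is known when the page is rendered. The class and method names are invented for this example, and the script URL is just the placeholder from above. The name is percent-encoded as UTF-8 so the link is a valid URL; how each browser decodes it back into a file name is not something this code controls.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: appends the file name as the last path segment.
final class DownloadLinkBuilder {

    static String build(String scriptUrl, String fileName) {
        try {
            // URLEncoder produces form encoding, where a space becomes '+'.
            // In a path segment a space must be %20, so it is swapped here;
            // a literal '+' in the name has already been escaped to %2B.
            String segment = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
            return scriptUrl + "/" + segment;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u3042 is the character あ used in the question
        System.out.println(build("http://www.xyz.com/cgi-bin/dynamic.php",
                "\u3042\u30422008.10.1\u3042.doc"));
    }
}
```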

Gavin Brock
Thanks, that looks good. I'll give it a try next time.
Ovesh
A: 

Yes, this issue is fixed in IE 8. I have tested it, and it works fine there.

hardik