When I try to download a file whose name contains non-ASCII characters (Chinese, Japanese, etc.), the downloaded file name is garbled. How can I fix this?

I have tried adding charset=UTF-8 to the Content-type header, but without success. Please help. Code below.

header("Cache-Control: "); // leave blank to avoid IE errors
header("Pragma: "); // leave blank to avoid IE errors
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"" . $instance_name . "\"");
header("Content-length: " . filesize($fileString));
sleep(1);
fpassthru($fdl);

A: 

I think trying a different charset may fix your problem.

If the problem persists, I think you need to install the language files from the XP CD on your system, because if the system can't find the right characters it substitutes odd ones.

I had a problem like this with Arabic, and it turned out that I hadn't copied all the language files to my system.

Hope this will help you.

SzamDev
+4  A: 

Unfortunately there is currently not a single solution that works with all browsers. There are at least three "more obvious" approaches to the problem.

a) Content-type: application/octet-stream; charset=utf-8 + filename=<utf8 byte sequence>
e.g. filename=Москва.txt
This violates the standards, but Firefox shows the name correctly. IE doesn't.

b) Content-type: application/octet-stream; charset=utf-8 + filename=<urlencode(utf8 byte sequence)>
e.g. filename=%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0.txt
This works with IE but not with Firefox.

c) Providing the name as specified in RFC 2231
e.g. filename*=UTF-8''%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0.txt
Again, Firefox supports this, IE doesn't.

For a more comprehensive comparison see http://greenbytes.de/tech/tc2231/
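
To make the three variants concrete, here is a minimal sketch that builds the corresponding Content-Disposition lines for a given UTF-8 name. The function name dispositionVariants is made up for illustration, and quoting the filename= value in (a) and (b) is my assumption (it mirrors the code in the question). It only builds strings; which variant a given browser accepts is as described above.

```php
<?php
// Hypothetical helper (not part of any library): returns the
// Content-Disposition line for each of the approaches a), b) and c).
function dispositionVariants(string $utf8name): array
{
    // Percent-encode every byte outside the unreserved set
    // (rawurlencode encodes a space as %20, not as +).
    $encoded = rawurlencode($utf8name);

    return array(
        // a) raw UTF-8 bytes in filename=
        'a' => 'Content-Disposition: attachment; filename="' . $utf8name . '"',
        // b) percent-encoded UTF-8 bytes in filename=
        'b' => 'Content-Disposition: attachment; filename="' . $encoded . '"',
        // c) RFC 2231 extended parameter: charset, empty language, encoded value
        'c' => "Content-Disposition: attachment; filename*=UTF-8''" . $encoded,
    );
}

// Print the variants instead of sending them, so the sketch runs on the CLI.
foreach (dispositionVariants("Москва.txt") as $label => $line) {
    echo $label, ') ', $line, "\n";
}
```

In a real script you would pass one of these strings to header() before any output is sent.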


edit: When I said that there is no single solution, I meant via header('...'). But there is something of a work around.
When there is no usable filename=xyz header, browsers use the basename of the path part of the URL. I.e. for <a href="test.php/lala.txt"> both Firefox and IE suggest lala.txt as the filename.
You can append extra path components after the actual path to your PHP script (when using Apache's httpd see http://httpd.apache.org/docs/2.1/mod/core.html#acceptpathinfo).
E.g. if you have a file test.php in your document root and request it as http://localhost/test.php/x/y/z the variable $_SERVER['PATH_INFO'] will contain /x/y/z.
Now, if you put a link like

<a
  href="/test.php/download/moskwa/&#x41c;&#x43e;&#x441;&#x43a;&#x432;&#x430;"
>
  &#x41c;&#x43e;&#x441;&#x43a;&#x432;&#x430;
</a>

in your document, you can fetch the download/moskwa/... part and initiate the download of the file. Without sending any filename=... information, both Firefox and IE suggest the "right" name.
You can even combine it with sending the name according to RFC 2231. That's why I also put moskwa into the link: it is the id the script uses to find the file it is supposed to send. IE ignores the filename*=... information and still uses the basename part of the URL to suggest a name. That means for Firefox (and any other client that supports RFC 2231) the part after the id is meaningless*, but for IE (and other clients not supporting RFC 2231) it is used for the name suggestion.
Self-contained example:

<?php // test.php
$files = array(
  'moskwa'=>array(
    'htmlentities'=>'&#x41c;&#x43e;&#x441;&#x43a;&#x432;&#x430;',
    'content'=>'55° 45′ N, 37° 37′ O'
  ),
  'athen'=>array(
    'htmlentities'=>'&#x391;&#x3b8;&#x3ae;&#x3bd;&#x3b1;',
    'content'=>'37° 59′ N, 23° 44′ O'
  )
);


$fileid = null;
if ( isset($_SERVER['PATH_INFO']) && preg_match('!^/download/([^/]+)!', $_SERVER['PATH_INFO'], $m) ) {
  $fileid = $m[1];
}

if ( is_null($fileid) ) {
  foreach($files as $fileid=>$bar) {
    printf(
      '<a href="./test.php/download/%s/%s.txt">%s</a><br />', 
      $fileid, $bar['htmlentities'], $bar['htmlentities']
    );
  }  
}
else if ( !isset($files[$fileid]) ) {
  echo 'no such file';
}
else {
  $f = $files[$fileid];
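  // Decode the numeric entities to UTF-8 (the 'HTML-ENTITIES' pseudo-encoding of
  // mb_convert_encoding is deprecated as of PHP 8.2), then percent-encode the
  // bytes for filename*; rawurlencode encodes a space as %20 rather than +.
  $utf8name = html_entity_decode($f['htmlentities'], ENT_QUOTES, 'UTF-8');
  $utf8name = rawurlencode($utf8name);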

  header("Content-type: text/plain");
  header("Content-Disposition: attachment; filename*=UTF-8''$utf8name.txt");
  header("Content-length: " . strlen($f['content']));
  echo $f['content'];
}

*) That's a bit like here on Stack Overflow. The link for this question is shown as

http://stackoverflow.com/questions/2578349/while-downloading-filenames-from-non-english-languages-are-not-getting-displayed

but it also works with

http://stackoverflow.com/questions/2578349/mary-had-a-little-lamb

The important part is the id 2578349.

VolkerK
+1 the best answer I've ever seen on the issue. Thanks for the link. This goes into my favourites box.
Pekka
Thank you for the detailed explanation; it really helped me understand the issue more deeply. Thanks.
pks83
Hi VolkerK, any input on how to deal with Safari?
pks83
Hm, not really. What does Safari do with the work-around code?
VolkerK
A: 

Hi. I am using a servlet to send the requested file for download. I finally managed to get it working in Firefox and Chrome with the help of your post, but it's still not working in IE. I'm using filename=(utf8bytesequence) in the Content-Disposition header. Please let me know how to sort this out.

Thanks in advance