
I have a directory with a name that contains Japanese characters, and I need to use the zip utils in java.util.zip to write it to a zip file. Writing the zip file succeeds, but when I open the resulting zip file with either Windows' built-in compressed file utility or 7-Zip, the directory with Japanese characters in the name appears as a bunch of garbage characters. I do have the Japanese/East Asian language pack installed on my system -- I can create directories with Japanese names, so that isn't the issue.

Interestingly, if I write a separate script to read the resulting zip file using java.util.zip, the directory name is correct, and I can extract the contents of the zip into appropriately named directories with Japanese characters. But I can't do this with the off-the-shelf zip tools I've tried, which are undoubtedly what our customers will want to use.

Any ideas about what is causing this problem, and how I can work around it?

I know about this bug, but I still need a workaround for this case.
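For reference, a minimal repro of the setup described in the question might look like the sketch below. The class and method names are hypothetical, and the directory name (the kanji for "Japanese") is written with Unicode escapes so the source file's encoding doesn't matter. On Java 7 and later, `ZipOutputStream` and `ZipInputStream` default to UTF-8 for entry names, so writing and reading back with java.util.zip round-trips cleanly, matching what the question reports; whether another tool shows the same name depends entirely on how that tool decodes the stored bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JapaneseZipRoundTrip {

    // Writes a directory entry and one file inside it, using the
    // default charset of ZipOutputStream for entry names.
    static byte[] writeZip(String dirName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            // A trailing slash marks a directory entry.
            zos.putNextEntry(new ZipEntry(dirName + "/"));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry(dirName + "/readme.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above wrote the central directory.
        return bytes.toByteArray();
    }

    // Reads the entry names back, again with java.util.zip defaults.
    static List<String> readNames(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical directory name: "nihongo" (Japanese) in kanji.
        String dir = "\u65e5\u672c\u8a9e";
        List<String> names = readNames(writeZip(dir));
        System.out.println("round trip ok: " + names.get(0).equals(dir + "/"));
    }
}
```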

+1  A: 

If java.util.zip still behaves as this post describes, I'm not sure if it is possible (with the built-in classes). I have seen Chilkat's Java Zip library mentioned before as a way to get this to work, but have never used it.

Kaleb Brasee
Just to add: the post said "The ZIP file format is not Unicode aware. Filenames are just 8-bit strings internally, with (afaik) no defined charset, or Unicode encoding. That means that the writer and reader of the file have to agree on a mutually comprehensible format." This was never completely true. At the time the writer made that post, in 2005, the encoding was IBM437. There was no agreement on characters that required something outside that charset. In 2007, PKWARE added UTF-8 to the spec. As Kaleb says, it's not clear whether J2SE ever evolved to support that spec update. I'd guess not.
Cheeso
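To make the "writer and reader have to agree" point concrete, here is a small sketch that simulates a writer storing an entry name as UTF-8 bytes and a reader that assumes IBM437 (code page 437). The class and method names are hypothetical; no actual zip file is involved, only the byte-level mismatch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp437Mojibake {

    // Encode the name as UTF-8 (the writer's view), then decode the same
    // bytes as IBM437 (the reader's view).
    static String misread(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("IBM437"));
    }

    public static void main(String[] args) {
        // "taescht.txt" spelled with an a-umlaut (U+00E4).
        String name = "t\u00e4scht.txt";
        System.out.println(misread(name));
    }
}
```

The a-umlaut is the two bytes `C3 A4` in UTF-8, and IBM437 maps those to a box-drawing character (U+251C) and "ñ", so the reader sees `t├ñscht.txt`. Pure ASCII names survive unchanged, which is why the problem only shows up with non-ASCII names like the Japanese directory in the question.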
+2  A: 

TrueZIP claims to do this better:

The J2SE API always uses UTF-8 (eight bit Unicode character set) for entry names and comments instead of CP437 (a.k.a. IBM437, the genuine IBM-PC character set), which is used by the de-facto standard PKZIP from PKWARE. As a result, you cannot read or write ZIP files with international entry file names such as e.g. "täscht.txt" in a ZIP file created by a (southern) German.

[description of other problems omitted]

The TrueZIP Library has been developed to overcome these limitations/disadvantages.

meriton
I don't know about the TrueZIP library, but the comment is a little misleading. The makers of the "de-facto standard" PKZIP software, PKWARE, publish an actual specification of the format at http://www.pkware.com/support/zip-application-note and it allows the use of either CP437 or UTF-8. If ZIP applications comply with the spec, they can use either encoding. So the statement "you cannot read or write ZIP files with international entry file names" is incorrect in general; any app can comply with the spec. (It's also unclear; I'm not sure what being in southern Germany has to do with anything.)
Cheeso
Having said that, some apps and zip tools do not fully comply with the spec, in particular the UTF-8 portion, which was first added in September 2007. For example, Windows Vista's "compressed folders" will not properly read .ZIP files with entry names encoded in UTF-8. I'm not sure about Mac's built-in zip tools. I'm also not sure whether the J2SE API complies with the September 2007 revision of the zip spec. It may be that, despite the spec's support for UTF-8, the J2SE API predates that revision and does something outside the spec for Unicode support.
Cheeso
Very informative. IMHO, this would have warranted its own answer (and upvote). I think the "southern German" part is merely a flippant motivation for the entry name containing the non-ASCII character "ä".
meriton
Glad it's informative. I made it a comment because I don't have any good ideas for how to solve the problem in Java. So your answer is still the best!
Cheeso
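On the open question of whether J2SE follows the September 2007 revision: the spec signals UTF-8 names with bit 11 of the general purpose bit flag (the "language encoding flag"). One way to check a given JDK is to write an entry and inspect that flag in the local file header, as in the sketch below. The class and method names are hypothetical. It relies on the documented header layout (4-byte signature, 2-byte "version needed", then the 2-byte little-endian flag), and on the JDK 7+ versions I'm aware of, `ZipOutputStream` sets this bit whenever it encodes names as UTF-8, which is its default. Whether a reader honors the bit is a separate matter; per the comment above, Vista's compressed folders does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EfsFlagCheck {

    // Bit 11 of the general purpose flag: entry name and comment are UTF-8.
    static final int EFS = 1 << 11;

    // Writes a single entry and returns the general purpose bit flag
    // from its local file header (the first header in the file).
    static int localHeaderFlag(String entryName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] zip = bytes.toByteArray();
        // Offsets 6 and 7 hold the flag, least significant byte first.
        return (zip[6] & 0xff) | ((zip[7] & 0xff) << 8);
    }

    public static void main(String[] args) {
        int flag = localHeaderFlag("\u65e5\u672c\u8a9e/");
        System.out.println("UTF-8 flag set: " + ((flag & EFS) != 0));
    }
}
```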
+1  A: 

Miracles indeed happen, and Sun/Oracle really did fix the long-standing bug/RFE:

It's now possible to [specify the filename encoding when creating][1] the zip file/stream (requires Java 7).

[1]: http://download.java.net/jdk7/docs/api/java/util/zip/ZipOutputStream.html#ZipOutputStream(java.io.OutputStream, java.nio.charset.Charset)
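A sketch of using that Java 7 constructor as a workaround for the Japanese directory name. Japanese characters cannot be represented in IBM437 at all, so this assumes the customers' tools fall back to the system's legacy code page for names without the UTF-8 flag, and that their systems are Japanese Windows, whose code page Java calls `MS932` (windows-31j). That charset choice and the class and method names are assumptions for illustration. The reader has to use the same charset to get the names back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class LegacyCharsetZip {

    // Assumption: target tools decode names with the Japanese Windows
    // code page. Adjust for other locales.
    static final Charset NAME_CHARSET = Charset.forName("MS932");

    static byte[] writeZip(String dirName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Java 7+: the Charset argument controls how entry names are encoded.
        try (ZipOutputStream zos = new ZipOutputStream(bytes, NAME_CHARSET)) {
            zos.putNextEntry(new ZipEntry(dirName + "/"));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reading back with java.util.zip needs the same charset.
    static String firstEntryName(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), NAME_CHARSET)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String dir = "\u65e5\u672c\u8a9e";
        System.out.println("round trip ok: " + (dir + "/").equals(firstEntryName(writeZip(dir))));
    }
}
```

The trade-off: a zip written this way only looks right on systems that use the same code page, and it is not flagged as UTF-8, so tools on other locales will show garbage again. Keeping the Java 7+ default (UTF-8 with the spec's flag set) is the spec-compliant option, at the cost of older readers that ignore the flag.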

Anton S. Kraievoy
Wow, that's great news. It will be a while before our customers can benefit from this, but I'm glad this was not just left as a festering sore.
Jeff