I am using an org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream to add files from a Subversion repository to a ZIP archive. This works fine as long as the filenames contain no German umlauts (ä, ö, ü) or other special characters. What is the quickest way to make it accept non-ASCII characters?

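import java.io.{BufferedOutputStream, OutputStream}
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream
import org.tmatesoft.svn.core.io.SVNRepository

def zip(repo: SVNRepository, out: OutputStream, url: String, resourceList: Seq[SVNResource]): Unit = {
  val zout = new ZipArchiveOutputStream(new BufferedOutputStream(out))
  // Store entry names in Cp437 by default
  zout.setEncoding("Cp437")
  // Use UTF-8 (plus the language encoding flag) for names Cp437 cannot represent
  zout.setFallbackToUTF8(true)
  // Set the language encoding flag (bit 11) whenever a name is written as UTF-8
  zout.setUseLanguageEncodingFlag(true)
  // Add a Unicode Path extra field only for names Cp437 cannot encode
  zout.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE)
  try {
    for (resource <- resourceList) {
      addFileToStream(repo, zout, resource)
    }
  } finally {
    zout.finish()
    zout.close()
  }
}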

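import java.io.{ByteArrayOutputStream, File}
import scala.collection.JavaConverters._
import org.apache.commons.compress.archivers.zip.{ZipArchiveEntry, ZipArchiveOutputStream}
import org.tmatesoft.svn.core.{SVNDirEntry, SVNProperties}
import org.tmatesoft.svn.core.io.SVNRepository

// SVNResource, YSTRepo, FILE and DIR are defined elsewhere in the application
private def addFileToStream(repo: SVNRepository, zout: ZipArchiveOutputStream, resource: SVNResource): ZipArchiveOutputStream = {
  val entry = resource.entry
  val url = YSTRepo.getAbsolutePath(entry)
  if (FILE == entry.getKind.toString) {
    // The entry name (url) is the string whose characters must survive encoding
    val zipEntry = new ZipArchiveEntry(new File(url), url)
    zout.putArchiveEntry(zipEntry)
    // Fetch the HEAD revision (-1) into memory, then copy it into the archive
    val baos = new ByteArrayOutputStream()
    repo.getFile(url, -1, new SVNProperties(), baos)
    baos.writeTo(zout)
    zout.closeArchiveEntry()
  } else if (DIR == entry.getKind.toString) {
    if (resource.hasChildren) {
      val entries = repo.getDir(url, -1, new SVNProperties(), new java.util.ArrayList[SVNDirEntry])
      // Depending on the SVNKit version, getDir returns a raw java.util.Collection, hence the cast
      val children = entries.asScala.toSeq.asInstanceOf[Seq[SVNDirEntry]]
      for (child <- SVNResource.listDir(repo, children)) {
        addFileToStream(repo, zout, child)
      }
    }
  }
  zout
}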
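One likely reason the NOT_ENCODEABLE policy above never helps with German umlauts (an inference, not something stated in the question): Cp437 does contain ä, ö, ü and ß, so these names count as encodable, no Unicode extra field is written, and because the names are not UTF-8 the language encoding flag stays unset too. The extractor is then left to guess the code page. The sketch below checks encodability with the JDK's CharsetEncoder; the object name Cp437Check is made up for illustration.

```scala
import java.nio.charset.Charset

object Cp437Check {
  // True if every character in `name` has a mapping in the given charset.
  // "Cp437" is the same charset name the question passes to setEncoding.
  def encodable(name: String, charsetName: String = "Cp437"): Boolean =
    Charset.forName(charsetName).newEncoder().canEncode(name)

  def main(args: Array[String]): Unit =
    for (name <- Seq("Übersicht.txt", "Grüße.txt", "日本語.txt"))
      println(s"$name -> encodable in Cp437: ${encodable(name)}")
}
```

Running it reports true for the two German names and false for the Japanese one, so only the last would trigger NOT_ENCODEABLE.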
A: 

You can try passing the filename through URLEncoder first: http://download.oracle.com/javase/6/docs/api/java/net/URLEncoder.html

That will ensure that the filename stored in the ZIP is pure ASCII.

When reading it back out, use URLDecoder to recover the original UTF-8 filename: http://download.oracle.com/javase/6/docs/api/java/net/URLDecoder.html

Kevin Wright
Wouldn't that mean I need control over the extraction process? The zip gets streamed to the user's browser.
getagrip
Yes, it would :)
Kevin Wright
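The URLEncoder approach from the first answer can be sketched as below. It is a minimal illustration with hypothetical helper names (encodeEntryName, decodeEntryName); it encodes each path segment separately, because URLEncoder would otherwise turn every "/" into %2F and flatten the directory tree. As the comments point out, whoever extracts the archive sees the %-escaped names unless they run the decoding step themselves.

```scala
import java.net.{URLDecoder, URLEncoder}

object UrlEncodedNames {
  // Encode each path segment on its own so that '/' still separates directories.
  def encodeEntryName(path: String): String =
    path.split("/", -1).map(seg => URLEncoder.encode(seg, "UTF-8")).mkString("/")

  // Reverse of encodeEntryName, for whoever extracts the archive.
  def decodeEntryName(encoded: String): String =
    encoded.split("/", -1).map(seg => URLDecoder.decode(seg, "UTF-8")).mkString("/")

  def main(args: Array[String]): Unit = {
    val original = "Berichte/Übersicht März.txt"
    val stored = encodeEntryName(original)
    println(stored)                             // the pure-ASCII name that goes into the ZIP
    println(decodeEntryName(stored) == original)
  }
}
```

For the example name, the stored entry becomes Berichte/%C3%9Cbersicht+M%C3%A4rz.txt (space becomes "+"), and decoding restores the original exactly.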
A:

Based on your comments, it sounds like the real problem is with the Linux unzip program and/or the encoding supported by your Linux filesystem. One solution is to pass the -U option to unzip, which will escape any Unicode characters in filenames.

That said, I also recommend removing the following lines when you write your ZIP file:

zout.setEncoding("Cp437");
zout.setFallbackToUTF8(true);
zout.setUseLanguageEncodingFlag(true);

And replace them with the following:

zout.setEncoding("UTF-8");

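This should result in the highest portability: UTF-8 can represent any filename, and because setUseLanguageEncodingFlag defaults to true, Commons Compress then also sets the language encoding flag that tells extractors the names are UTF-8.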

Anon
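To check which of these settings actually ends up in an archive, a small reader of the first local file header can help. It reports bit 11 of the general purpose flag (the language encoding flag) and the IDs of any extra fields, for example 0x7075 for a Unicode Path field. This is a sketch based on the header layout in PKWARE's APPNOTE.TXT; it assumes the whole archive has been buffered in memory and starts with a local file header, and the names ZipNameInspector and LocalHeaderInfo are made up for illustration.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import scala.collection.mutable.ListBuffer

object ZipNameInspector {
  final case class LocalHeaderInfo(utf8Flag: Boolean, rawName: Array[Byte], extraFieldIds: List[Int])

  // Reads the first local file header of an in-memory ZIP archive (all fields little-endian).
  def firstLocalHeader(zip: Array[Byte]): LocalHeaderInfo = {
    val buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN)
    require(buf.getInt(0) == 0x04034b50, "not a ZIP local file header")
    val flags = buf.getShort(6) & 0xffff
    val nameLen = buf.getShort(26) & 0xffff
    val extraLen = buf.getShort(28) & 0xffff
    val rawName = java.util.Arrays.copyOfRange(zip, 30, 30 + nameLen)

    // The extra field is a sequence of blocks: 2-byte header ID, 2-byte data size, data
    val ids = ListBuffer.empty[Int]
    var off = 30 + nameLen
    val end = off + extraLen
    while (off + 4 <= end) {
      ids += buf.getShort(off) & 0xffff
      off += 4 + (buf.getShort(off + 2) & 0xffff)
    }
    // Bit 11 (0x0800) is the language encoding flag: the name bytes are UTF-8
    LocalHeaderInfo((flags & 0x0800) != 0, rawName, ids.toList)
  }
}
```

To use it, write the archive with ZipArchiveOutputStream into a ByteArrayOutputStream and pass toByteArray to firstLocalHeader. With the UTF-8 setting from this answer, utf8Flag should come back true.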
A:

I solved the issue by changing the policy passed to setCreateUnicodeExtraFields from

UnicodeExtraFieldPolicy.NOT_ENCODEABLE

to

UnicodeExtraFieldPolicy.ALWAYS

Filenames are now displayed correctly by Linux unzip, Windows Compressed Folders, IZArc and WinZip.

getagrip
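With ALWAYS, every entry gets an Info-ZIP Unicode Path extra field (ID 0x7075), not just the entries whose names Cp437 cannot encode. A plausible explanation of the fix, not confirmed in the thread, is that Unicode-aware extractors prefer that field over the Cp437 header name. Per APPNOTE.TXT, the field holds a version byte (1), the CRC-32 of the name bytes stored in the header, and the UTF-8 name. A reader should ignore the field if the CRC no longer matches. The sketch below hand-rolls that layout purely for illustration; it is not Commons Compress's own implementation (the library ships its own UnicodePathExtraField class), and UnicodePathField is a made-up name.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.{Charset, StandardCharsets}
import java.util.zip.CRC32

object UnicodePathField {
  val HeaderId: Int = 0x7075 // Info-ZIP Unicode Path extra field

  private def crc32(bytes: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(bytes, 0, bytes.length)
    crc.getValue
  }

  // Builds the complete extra block: header ID, data size, version (1),
  // CRC-32 of the bytes stored in the header's name field, then the UTF-8 name.
  def build(name: String, headerNameBytes: Array[Byte]): Array[Byte] = {
    val utf8 = name.getBytes(StandardCharsets.UTF_8)
    val dataSize = 1 + 4 + utf8.length
    val buf = ByteBuffer.allocate(4 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
    buf.putShort(HeaderId.toShort)
    buf.putShort(dataSize.toShort)
    buf.put(1.toByte)
    buf.putInt(crc32(headerNameBytes).toInt)
    buf.put(utf8)
    buf.array
  }

  // Returns the UTF-8 name only if the CRC still matches the header name;
  // None means the reader should fall back to the header name.
  def read(block: Array[Byte], headerNameBytes: Array[Byte]): Option[String] = {
    val buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN)
    val id = buf.getShort(0) & 0xffff
    val dataSize = buf.getShort(2) & 0xffff
    val version = buf.get(4)
    val storedCrc = buf.getInt(5) & 0xffffffffL
    if (id != HeaderId || version != 1 || storedCrc != crc32(headerNameBytes)) None
    else Some(new String(block, 9, dataSize - 5, StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val name = "Grüße.txt"
    val headerName = name.getBytes(Charset.forName("Cp437")) // what a Cp437 header stores
    println(read(build(name, headerName), headerName))
  }
}
```

The CRC check is what allows a tool that renames an entry without updating the extra field to avoid resurrecting the stale Unicode name.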