I'm struggling to deliver a project to a client. The job is to package files into an archive; simple, right? Well, the files have (and must have) French characters in their names. I'm archiving from the Linux command line; she's opening the archive from the desktop on Windows.

At first I tried 'zip', and it didn't work out. From what I've read here on StackOverflow, character support appears to vary by implementation. After unpacking, the resulting file names didn't look right to me (Ubuntu Archive Manager) or to her (WinZip on Windows).

We next tried tar. This time things appear normal for me, but they still don't look right to the client (who tried PeaZip and 7-Zip for Windows).

Going into this, I really didn't expect it to be a problem. French-speaking computer users must archive things; what are they using?

Any insight or assistance with this would be greatly appreciated. Thanks!

+1  A: 

Try using an archive program that allows you to specify the character encoding (say, UTF-8), or figure out how to do it with the one you have. This forum thread might help you, because it's similar to what you're asking, albeit in reverse and for German rather than French: http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3710172

JAB
That thread mentioned p7zip, which I didn't know existed. Using 7z appears to be the winner. It handled those characters perfectly.
bibby
I often work with archives from non-English systems as well (not for work but for my own interests), so I know how annoying mangled filenames can be. Glad to help.
JAB
+3  A: 

ZIP traditionally encodes filenames using the IBM437 encoding. However, to my knowledge, many tools (incorrectly) use the system's default encoding instead, which will likely cause problems in a situation like yours, because the two ends may use different encodings.
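To make the mismatch concrete, here is a minimal Python sketch of what happens when a name is written in one encoding and read back in another. The helper name `misread` and the sample file name are made up for illustration; the choice of UTF-8 on the Linux side and IBM437 or Windows-1252 on the other side is an assumption about a typical setup, not something I know about your machines.

```python
def misread(name, written_as, read_as):
    """Encode a file name with one encoding and decode the bytes with another."""
    return name.encode(written_as).decode(read_as)


if __name__ == "__main__":
    original = "résumé.txt"
    # Same encoding on both ends: the name survives.
    print(misread(original, "utf-8", "utf-8"))
    # Written as UTF-8, read as IBM437 or Windows-1252: each accented
    # letter turns into two unrelated characters.
    print(misread(original, "utf-8", "cp437"))
    print(misread(original, "utf-8", "cp1252"))
```

The bytes in the archive are identical in all three cases; only the reader's guess about the encoding changes.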

In theory ZIP also supports UTF-8 by now, which should resolve these problems, but again tool support is the issue. For example, as far as I know the ZIP support built into Windows Explorer can't handle UTF-8 encoded filenames.
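If you want to check whether an archive actually declares its names as UTF-8, the ZIP format has a general-purpose flag for that (bit 11, value 0x800). The following is only a sketch using Python's standard `zipfile` module, which sets that flag for non-ASCII names; the function name `utf8_flag_set` and the in-memory archive are just for the demonstration, and other zip tools may behave differently.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: file name is UTF-8


def utf8_flag_set(filename):
    """Write one entry to an in-memory ZIP and report its UTF-8 flag and name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"demo contents")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return bool(info.flag_bits & UTF8_NAME_FLAG), info.filename


if __name__ == "__main__":
    print(utf8_flag_set("été.txt"))    # non-ASCII name
    print(utf8_flag_set("plain.txt"))  # pure ASCII name
```

Reading `flag_bits` the same way on an archive produced by another tool shows whether that tool marked its names as UTF-8 at all.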

So we end up with this: both ends have to agree on the encoding used for filenames, and you need an encoding that supports all the characters you have (any Unicode encoding will be fine; I'm not sure about IBM437 though). ZIP has been around for a long time, so there are many tools that disagree about the encoding. If possible, explicitly specify the encoding to use and prefer Unicode. In terms of compatibility with arbitrary tools, you might be better off using a newer format that was designed with Unicode in mind.
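On the IBM437 question, a quick way to find out is to ask a codec which characters it cannot represent. This is a small sketch using Python's built-in `cp437` codec; the helper name `unencodable` and the sample strings are mine, chosen only to illustrate the check.

```python
def unencodable(text, encoding="cp437"):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Common lowercase French letters.
    print(unencodable("éèêàâçîôûù"))
    # Some capitals and the oe ligature.
    print(unencodable("ÈÀœ"))
    # The same characters checked against UTF-8.
    print(unencodable("ÈÀœ", "utf-8"))
```

With Python's codec tables, the common lowercase accented letters are covered by IBM437 but some capitals and the ligature are not, so IBM437 alone is not a safe choice for arbitrary French file names.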

7-Zip has supported UTF-8 file names since 4.58 beta, according to the change log, but it only uses UTF-8 when the local code page doesn't support the required characters. The -mcu command line switch makes it use UTF-8 for every name that isn't pure ASCII. Local encodings usually differ only in the non-ASCII range, so this will most likely do the trick, provided the tool used for unpacking also supports UTF-8 (which is more likely for 7-Zip than for ZIP, because the format isn't as old as ZIP and there are fewer unpacking tools).

WinRAR might also be worth a try.

Gnafoo
A: 

Alternatively... You could nuke the accented characters. If francophones are on the receiving end of the file transfer, they may or may not be sympathetic (ask your users!).

French doesn't have all that many accents to worry about, really. You have [aeu]-grave, e-acute (aigu), [aeiou]-circumflex and c-cedilla to worry about, capital and lower (though that's more likely for the grave and acute ones, unless someone hit the Caps Lock key).

Tar has a --transform option. If you create a sed pattern that turns every ISO Latin-1 accented a, e, i, o, u and c into its unaccented version, you'll probably be okay.
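If writing that sed pattern by hand is a pain, the same renaming can be sketched in Python instead. This is not the tar --transform approach itself but a swapped-in alternative: it uses Unicode decomposition (NFD) from the standard `unicodedata` module to drop the accent marks. The function name `strip_accents` is made up for the example.

```python
import unicodedata


def strip_accents(name):
    """Return `name` with combining accent marks removed (é -> e, ç -> c)."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("élève_français_à_côté.txt"))
    # Ligatures have no decomposition, so they pass through unchanged.
    print(strip_accents("œuvre.txt"))
```

The result could then be passed as the `arcname` argument of `tarfile.TarFile.add` so the files on disk keep their real names and only the names inside the archive change. Note that ligatures such as œ would still need a manual replacement.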

Jason
Yeah, the client had rejected that. Thanks, though.
bibby
Make sure you have the latest version of Zip available on the destination system. Generally newer archival applications have added features to handle things like I18n and other more 'esoteric' concerns. It's entirely possible that a later version has UTF-8 support in it.
Jason
A: 

I think you should go with compression in the 7z format. Under Linux it can be done using PeaZip, or by installing p7zip and using it through a UI like Ark or File Roller, depending on your desktop (I prefer PeaZip because it can be used on any desktop). The 7z format was designed from the ground up with Unicode in mind (the author is Russian), and in my experience it has never failed.