I'm not asking about reading file content in UTF-8 or some other encoding. This is about file names. I usually save my Perl scripts in the system default encoding, "GB2312" in my case, and have no problems opening files. For processing purposes, though, I now have some Perl scripts saved in UTF-8. The problem is that these scripts cannot open files whose names contain characters encoded in "GB2312", and I don't like the idea of having to rename my files.

Does anyone have experience dealing with this kind of situation? Thanks as always for any guidance.

Edit

Here's the minimized code to demonstrate my problem:

# I'm running ActivePerl 5.10.1 on Windows XP (Simplified Chinese version)
# The file system is NTFS

#!perl -w
use autodie;

my $file = "./测试.txt"; #the file name consists of two Chinese characters
open my $in,'<',"$file";

while (<$in>){
print;
}

This test script runs fine when saved in "ANSI" encoding (I assume ANSI encoding is the same as GB2312 here, which is used to display Chinese characters). But it doesn't work when saved as "UTF-8", and the error message is as follows:

Can't open './娴嬭瘯.txt' for reading: 'No such file or directory'.

In this error message, "娴嬭瘯" is meaningless junk.
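The junk characters are not random. As a minimal sketch, assuming the console uses code page 936 (GBK, the superset of GB2312 that Simplified Chinese Windows calls "ANSI"), the following takes the UTF-8 bytes of the file name and reads them back as cp936. The variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                                   # this snippet itself is saved as UTF-8
use Encode qw(encode decode);

my $name       = "测试";
my $utf8_bytes = encode("UTF-8", $name);    # six bytes: e6 b5 8b e8 af 95

# Assumption: the system pairs those six bytes up as three cp936 characters.
my $misread    = decode("cp936", $utf8_bytes);

binmode STDOUT, ":encoding(UTF-8)";
print "$misread\n";   # should show the same junk as in the error message above
```

So the script is passing the right characters in the wrong byte encoding, and the system looks for a file that does not exist.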

Update

I tried first encoding the file name as GB2312 but it does not seem to work :( Here's what I tried:

#!perl -w
use autodie;
use Encode;

my $file = "./测试.txt";
encode("gb2312", decode("utf-8", $file));
open my $in,'<',"$file";

while (<$in>){
print;
}

My current thinking is: the file name in my OS is 测试.txt but it is encoded as GB2312. In the Perl script the file name looks the same to human eyes, still 测试.txt. But to Perl, they are different because they have different internal representations. But I don't understand why the problem persists when I already converted my file name in Perl to GB2312 as shown in the above code.

Update

I made it, finally made it :)

@brian's suggestion is right. I made a mistake in the above code: I didn't assign the encoded file name back to $file, so the result of encode() was thrown away.

Here's the solution:

#!perl -w
use autodie;
use Encode;

my $file = "./测试.txt";
$file = encode("gb2312", decode("utf-8", $file));
open my $in,'<',"$file";

while (<$in>){
print;
}
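To make the difference between the two attempts visible, here is a small sketch that dumps the bytes of the file name before and after the assignment. The UTF-8 bytes are written as escapes so the result does not depend on how the snippet is saved; `hex_bytes` is a hypothetical helper, not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# UTF-8 bytes of "./测试.txt", as a UTF-8-saved script without "use utf8" sees them.
my $file = "./\xE6\xB5\x8B\xE8\xAF\x95.txt";

# First attempt: the return value is discarded, so $file keeps its UTF-8 bytes.
encode("gb2312", decode("utf-8", $file));
my $unchanged = $file;

# Fixed version: assign the re-encoded bytes back to $file.
$file = encode("gb2312", decode("utf-8", $file));

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, shift }

print hex_bytes($unchanged), "\n";   # still contains e6 b5 8b e8 af 95
print hex_bytes($file), "\n";        # now contains b2 e2 ca d4
```

Only the second byte sequence matches what a GB2312 file name looks like on disk.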
Answer (score +5):

If you

 use utf8;

in your Perl script, that merely tells perl that the source is in UTF-8. It doesn't affect how perl deals with the outside world. Are you turning on any other Perl Unicode features?
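As a rough illustration of what the pragma changes, assuming the snippet is saved as UTF-8: the same literal is six bytes without it and two characters with it. Nothing about file handles or file names is affected either way.

```perl
use strict;
use warnings;

# Without the pragma the literal is the raw UTF-8 bytes from the source file.
my $as_bytes = do { no utf8;  "测试" };

# With the pragma (lexically scoped) the literal is decoded into characters.
my $as_chars = do { use utf8; "测试" };

print length($as_bytes), "\n";   # 6
print length($as_chars), "\n";   # 2
```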

Are you having problems with every filename, or just some of them? Can you give us some examples, or a small demonstration script? I don't have a filesystem that encodes names as GB2312, but have you tried encoding your filenames as GB2312 before you call open?

If you want specific strings encoded with a specific encoding, you can use the Encode module. Try that with your filenames that you give to open.
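A minimal sketch of that idea, assuming the script is saved as UTF-8 with `use utf8` and that the file system expects GB2312 names (I can't test against such a file system). `filename_bytes` is a made-up helper name, not something Encode provides.

```perl
use strict;
use warnings;
use utf8;                       # the literal below is read as characters
use Encode qw(encode);

# Hypothetical helper: turn a character string into the bytes to hand to open().
sub filename_bytes {
    my ($name, $fs_encoding) = @_;
    # FB_CROAK makes encode() die on characters the target encoding lacks
    # instead of quietly substituting them.
    return encode($fs_encoding, $name, Encode::FB_CROAK());
}

my $file = filename_bytes("./测试.txt", "gb2312");
printf "%d bytes: %s\n", length($file),
    join ' ', map { sprintf '%02x', ord } split //, $file;

# open my $in, '<', $file or die "Can't open: $!";   # would use the encoded name
```

Because `use utf8` already decodes the literal, only the encode step is needed here; without the pragma you would decode from UTF-8 first.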

brian d foy
@brian, thanks for the answer. Can I have Perl first convert the GB2312-encoded file name to UTF-8 so that it can recognize the file name? I know how to convert non-UTF-8 file content to UTF-8, but I didn't think of doing it with the file name.
Mike
@brian, thanks! I finally solved the problem. You're completely right! The solution is exactly as you foresaw: encode the filenames as GB2312 before calling open.
Mike