I have some text files which I am trying to transform with a Perl script on Windows. The text files look normal in Notepad+, but all the regexes in my script were failing to match. Then I noticed that when I open the text files in NotePad+, the status bar says "UCS-2 Little Endia" (sic). I am assuming this corresponds to the encoding UCS-2LE. So I created "readFile" and "writeFile" subs in Perl, like so:
use PerlIO::encoding;
my $enc = ':encoding(UCS-2LE)';
sub readFile {
my ($fName) = @_;
open my $f, "<$enc", $fName or die "can't read $fName\n";
local $/;
my $txt = <$f>;
close $f;
return $txt;
}
sub writeFile {
my ($fName, $txt) = @_;
open my $f, ">$enc", $fName or die "can't write $fName\n";
print $f $txt;
close $f;
}
my $fName = 'someFile.txt';
my $txt = readFile $fName;
# ... transform $txt using s/// ...
writeFile $fName, $txt;
Now the regexes match (although less often than I expect), but the output contains long strings of Asian-looking characters interspersed with longs strings of the correct text. Is my code wrong? Or perhaps Notepad+ is wrong about the encoding? How should I proceed?