I'm using osql to run several SQL scripts against a database, and then I need to look at the results file to check whether any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode.

I wrote a little test script to check this, and the output comes out all garbled:

$file = shift;

open OUTPUT, $file or die "Can't open $file: $!\n";
while (<OUTPUT>) {
    print $_;
    if (/Invalid|invalid|Cannot|cannot/) {
        push(@invalids, $file);
        print "invalid file - $file - schedule for retry\n";
        last;
    }            
}

Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file.

I think the problem might be that osql writes the results file in UTF-16 format, but I'm not sure. When I open the file in TextPad it just tells me 'Unicode'.

Edit: Using perl v5.8.8.

Edit: Hex dump:

file name: Admin_CI.User.sql.results
mime type: 

0000-0010:  ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00  ..1.>... 2.>...M.
0000-0020:  73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00  s.g...1. 5.0.0.7.
0000-0030:  2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00  ,...L.e. v.e.l...
0000-0032:  31 00                                            1.
+1  A: 

Try opening the file with an IO layer that names the file's encoding, e.g.:

open OUTPUT,  "<:encoding(UTF-8)", $file or die "Can't open $file: $!\n";

See perldoc open for more on this.
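
For the results files in the question, a minimal sketch might look like the following. It assumes the files are UTF-16 with a BOM, as the ff fe at the start of the hex dump suggests. The has_error helper is a hypothetical name, not something from the question, and the :raw layer is there so that a default text-mode layer on Windows does not interfere with the two-byte characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if a results file mentions an error keyword.
# Assumes the file is UTF-16 with a BOM; ':encoding(UTF-16)' uses the BOM
# to pick the byte order and does not pass it through as data.
sub has_error {
    my ($file) = @_;
    open my $in, '<:raw:encoding(UTF-16)', $file
        or die "Can't open $file: $!\n";
    my $found = 0;
    while (my $line = <$in>) {
        # Same keywords as the test script in the question.
        if ($line =~ /Invalid|invalid|Cannot|cannot/) {
            $found = 1;
            last;
        }
    }
    close $in;
    return $found;
}

# Check every results file named on the command line.
my @invalids = grep { has_error($_) } @ARGV;
print "invalid file - $_ - schedule for retry\n" for @invalids;
```

Lines read this way still end in "\r\n", which does not matter for the keyword match but would matter if you chomp and compare whole lines.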

eugene y
+7  A: 

The file is presumably in UCS-2LE (or UTF-16) format.

C:\Temp> notepad test.txt

C:\Temp> xxd test.txt
0000000: fffe 5400 6800 6900 7300 2000 6900 7300  ..T.h.i.s. .i.s.
0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00   .a. .f.i.l.e...

When opening such a file for reading, you need to specify the encoding:

#!/usr/bin/perl

use strict; use warnings;

my ($infile) = @ARGV;

open my $in, '<:encoding(UCS-2le)', $infile
    or die "Cannot open '$infile': $!";

Note that the fffe at the beginning is the BOM.
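
If you are not sure which encoding a file uses, the BOM can be inspected directly. The following is only a rough sketch: encoding_from_bom is a hypothetical helper, it recognizes just the UTF-8 and UTF-16 BOMs (UTF-32 is ignored), and a file without a BOM simply yields undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding name from a file's leading bytes.
# Returns undef when no known BOM is present.
sub encoding_from_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    my $count = read($fh, my $head, 3);
    close $fh;
    die "Cannot read '$path': $!" unless defined $count;

    return 'UTF-8'    if substr($head, 0, 3) eq "\xef\xbb\xbf";
    return 'UTF-16LE' if substr($head, 0, 2) eq "\xff\xfe";
    return 'UTF-16BE' if substr($head, 0, 2) eq "\xfe\xff";
    return undef;
}

# Report the guess for each file named on the command line.
for my $file (@ARGV) {
    my $guess = encoding_from_bom($file);
    print "$file: ", (defined $guess ? $guess : 'no BOM found'), "\n";
}
```

One thing to keep in mind: a layer with an explicit byte order, such as :encoding(UTF-16LE), leaves the BOM in the data as a U+FEFF character at the start of the first line, whereas :encoding(UTF-16) reads the BOM to decide the byte order and removes it.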

Sinan Ünür
That was exactly what I was looking for when asking for the dump. :)
JUST MY correct OPINION
Thanks - it was actually UTF-16.
Jaco Pretorius
UCS-2le is very, very close to UTF-16: http://en.wikipedia.org/wiki/UTF-16/UCS-2
Robert P
+4  A: 

The answer is in the documentation for open, which also points you to perluniintro. :)

brian d foy