Hi! I need to capture the output of a native application in PowerShell. The problem is that the output is encoded as UTF-8 (no BOM), which PowerShell does not recognize: it converts each byte of those funky multi-byte UTF-8 sequences directly into a separate Unicode character.

I've found that PowerShell has an $OutputEncoding variable, but it does not seem to affect input data.

Good ol' iconv is of no help either, since this unnecessary UTF8-as-if-ASCII => Unicode conversion takes place before the next pipeline member acquires data.

A: 

If your goal is to process data from your native command in PowerShell, you can try:

./program-that-outputs-utf8 > temp.txt
get-content temp.txt -Encoding utf8 | (do_whatever)
Cédric Rup
This does not work. Look, initially PowerShell decodes all data from program-that-outputs-utf8 as if it were ASCII, effectively giving UTF gibberish (and not the real characters that this gibberish represents) in _UNICODE_ strings. Then, if I use the ">" operator, it will encode _THAT_ gibberish in UTF-16.
Andy
A: 

Probably you need to execute "chcp 65001" (after changing the font of the powershell.exe console).
This command is also available in the PowerShell ISE.

hoge
+3  A: 

I can reproduce the issue with the program below (stdout.cpp, compiled with cl stdout.cpp):

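#include <stdio.h>

int main()
{
    // "ASCII output" followed by the three UTF-8 bytes E1 BE B9 (U+1FB9).
    // unsigned char avoids narrowing errors for values above 0x7F.
    unsigned char bytes[] = { 0x41, 0x53, 0x43, 0x49,
                              0x49, 0x20, 0x6F, 0x75,
                              0x74, 0x70, 0x75, 0x74,
                              0xE1, 0xBE, 0xB9 };

    // Write the raw bytes to stdout one at a time.
    for (int i = 0; i < 15; i++)
    {
        printf("%c", bytes[i]);
    }
    return 0;
}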

Piping its output through | Out-File -enc UTF8 foo.txt produces gibberish:

PS> fhex foo.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 EF BB BF 41 53 43 49 49 20 6F 75 74 70 75 74 C3 ...ASCII output.
00000010 9F E2 95 9B E2 95 A3 0D 0A                      .........

Note that fhex is a PSCX utility.
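
To see where bytes like E2 95 9B E2 95 A3 in that dump come from, here is a minimal C++ sketch of the double conversion. It assumes the console code page is CP437 (an assumption; the thread does not state it). The helper names `cp437_to_unicode`, `append_utf8`, `mojibake_utf8` and `to_hex` are hypothetical, and the lookup table covers only the three bytes used in stdout.cpp.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, deliberately partial CP437 table: only the three non-ASCII
// bytes from stdout.cpp are mapped. Unknown high bytes become U+FFFD.
static char32_t cp437_to_unicode(unsigned char b) {
    if (b < 0x80) return b;           // ASCII range maps to itself
    switch (b) {
        case 0xE1: return 0x00DF;     // LATIN SMALL LETTER SHARP S
        case 0xBE: return 0x255B;     // BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
        case 0xB9: return 0x2563;     // BOX DRAWINGS DOUBLE VERTICAL AND LEFT
        default:   return 0xFFFD;     // not covered by this sketch
    }
}

// Append one code point as UTF-8 (BMP only, which is enough here).
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Step 1: treat every input byte as a CP437 character (the wrong decoding).
// Step 2: re-encode the resulting characters as UTF-8 (what Out-File does).
std::string mojibake_utf8(const std::vector<unsigned char>& raw) {
    std::string out;
    for (unsigned char b : raw) append_utf8(out, cp437_to_unicode(b));
    return out;
}

// Format bytes as space-separated upper-case hex, like a hex dump row.
std::string to_hex(const std::string& bytes) {
    std::string hex;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02X", c);
        if (!hex.empty()) hex += ' ';
        hex += buf;
    }
    return hex;
}
```

Under the CP437 assumption, `to_hex(mojibake_utf8({0xE1, 0xBE, 0xB9}))` yields `C3 9F E2 95 9B E2 95 A3`: three input bytes become three separate characters, and those take eight bytes once re-encoded.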

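UPDATE: Figured out how to get this to work. PowerShell decodes a native program's output using [Console]::OutputEncoding, so set that to UTF-8 before running the program and restore it afterwards: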

$enc = [Console]::OutputEncoding
[Console]::OutputEncoding = [text.encoding]::utf8
.\stdout.exe | out-file fubar3.txt -enc utf8
fhex .\fubar3.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 EF BB BF 41 53 43 49 49 20 6F 75 74 70 75 74 E1 ...ASCII output.
00000010 BE B9 0D 0A                                     ....

[Console]::OutputEncoding = $enc
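
As a quick sanity check on the second dump, the trailing E1 BE B9 should decode as one UTF-8 sequence rather than three characters. The following is a minimal C++ sketch of such a check; `decode_utf8_at` is a hypothetical helper that handles only 1- to 3-byte sequences and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <vector>

// Decode the UTF-8 sequence that starts at s[i] and return its code point.
// Returns U+FFFD if the sequence is truncated, has a bad continuation byte,
// or needs four bytes (not handled in this sketch).
char32_t decode_utf8_at(const std::vector<unsigned char>& s, std::size_t i) {
    const char32_t bad = 0xFFFD;
    if (i >= s.size()) return bad;

    unsigned char b0 = s[i];
    if (b0 < 0x80) return b0;  // single-byte (ASCII) form

    // A continuation byte must exist and look like 10xxxxxx.
    auto cont = [&](std::size_t k) {
        return k < s.size() && (s[k] & 0xC0) == 0x80;
    };

    if ((b0 & 0xE0) == 0xC0 && cont(i + 1))            // 110xxxxx 10xxxxxx
        return ((b0 & 0x1F) << 6) | (s[i + 1] & 0x3F);

    if ((b0 & 0xF0) == 0xE0 && cont(i + 1) && cont(i + 2))  // 1110xxxx + 2
        return ((b0 & 0x0F) << 12) | ((s[i + 1] & 0x3F) << 6) |
               (s[i + 2] & 0x3F);

    return bad;
}
```

`decode_utf8_at({0xE1, 0xBE, 0xB9}, 0)` returns 0x1FB9, a single code point, which is what the bytes in stdout.cpp encode.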
Keith Hill
How simple :) This example shows that in some cases one really needs to know .NET. Just posh knowledge is not enough...
stej