I'm trying to use the tree command in a Windows command line to generate a text file listing the contents of a directory, but when I redirect the output to a file the Unicode characters get mangled.

Here is the command I am using:

tree /f /a > output.txt

The results in the console window are fine:

\---Erika szobája
        cover.jpg
        Erika szobája.m3u
        Kátai Tamás - 01 Télvíz.ogg
        Kátai Tamás - 02 Zölderdõ.ogg
        Kátai Tamás - 03 Renoir kertje.ogg
        Kátai Tamás - 04 Esõben szaladtál.ogg
        Kátai Tamás - 05 Ázik az út.ogg
        Kátai Tamás - 06 Sûrû völgyek takaród.ogg
        Kátai Tamás - 07 Õszhozó.ogg
        Kátai Tamás - 08 Mécsvilág.ogg
        Kátai Tamás - 09 Zúzmara.ogg

But the text file is no good:

\---Erika szob ja
        cover.jpg
        Erika szob ja.m3u
        K tai Tam s - 01 T‚lv¡z.ogg
        K tai Tam s - 02 Z”lderdä.ogg
        K tai Tam s - 03 Renoir kertje.ogg
        K tai Tam s - 04 Esäben szaladt l.ogg
        K tai Tam s - 05 µzik az £t.ogg
        K tai Tam s - 06 S–r– v”lgyek takar¢d.ogg
        K tai Tam s - 07 åszhoz¢.ogg
        K tai Tam s - 08 M‚csvil g.ogg
        K tai Tam s - 09 Z£zmara.ogg

How can I fix this? Ideally the text file would be exactly the same as the output in the console window.

I tried Chris Jester-Young's suggestion (what happened, did you delete it, Chris?) of running the command line with the /U switch. It looked like exactly what I needed, but it does not appear to work. I have tried opening the file in both VS2008 and Notepad, and both show the same incorrect characters.
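The garbled characters are consistent with the file being written in the console's OEM code page (cp850, "Western European (DOS)") and then opened as Windows-1252. For example, á is byte 0xA0 in cp850, which Windows-1252 shows as a non-breaking space, and é is 0x82, which Windows-1252 shows as ‚. A minimal Python sketch, assuming cp850 on the writing side and cp1252 on the reading side (both are assumptions about this particular machine; `misread` and `repair` are hypothetical helper names):

```python
# Sketch: reproduce the garbling seen in output.txt, and undo it.
# Assumptions: tree wrote the file in the OEM code page cp850, and the
# editor decoded it with the ANSI code page cp1252.

OEM_CODEPAGE = "cp850"    # code page the file was written in (assumed)
ANSI_CODEPAGE = "cp1252"  # code page the editor used to read it (assumed)


def misread(name: str) -> str:
    """Encode as the OEM code page, then wrongly decode as the ANSI code page."""
    raw = name.encode(OEM_CODEPAGE)
    return raw.decode(ANSI_CODEPAGE)


def repair(garbled: str) -> str:
    """Recover the original bytes, then decode them with the right code page."""
    raw = garbled.encode(ANSI_CODEPAGE)
    return raw.decode(OEM_CODEPAGE)
```

For example, `repair("K\xa0tai Tam\xa0s - 01 T‚lv¡z.ogg")` gives back `"Kátai Tamás - 01 Télvíz.ogg"`. The round trip only works because nothing was lost in the first place: every byte is still in the file, just read with the wrong table.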

+3  A: 

If you output as non-Unicode (which you apparently do), you have to view the text file you create using the same encoding the console window uses; that's why it looks correct in the console. In some text editors, you can choose an encoding (or "code page") when you open a file. (I don't know how to make it output Unicode; cmd /U doesn't do what the documentation says.)

The console encoding depends on your Windows installation. For me, Microsoft Word lists it as "Western European (DOS)" (or just "MS-DOS").
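If you'd rather not pick the encoding in the editor every time, the file can be converted once. A minimal Python sketch, assuming the console code page is cp850 ("Western European (DOS)"; yours may differ), with `convert_to_utf8` and the output file name being hypothetical:

```python
# Sketch: re-encode tree's output from the console code page to UTF-8.
# Assumption: the console code page is cp850; change CONSOLE_CODEPAGE
# to match your system.
from pathlib import Path

CONSOLE_CODEPAGE = "cp850"


def convert_to_utf8(src: Path, dst: Path, codepage: str = CONSOLE_CODEPAGE) -> None:
    """Read src using the given code page and write it back out as UTF-8."""
    text = src.read_text(encoding=codepage)
    # "utf-8-sig" prepends a byte order mark so editors such as Notepad
    # recognise the file as UTF-8 instead of guessing the ANSI code page.
    dst.write_text(text, encoding="utf-8-sig")
```

Usage: `convert_to_utf8(Path("output.txt"), Path("output-utf8.txt"))`. Decoding cannot fail here because cp850 assigns a character to every byte value, so the only risk is choosing the wrong code page, which shows up as wrong characters rather than an error.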

bzlm
How do I determine the encoding the console window uses? I tried opening the file in Word and it gave me a choice of encodings; I checked some of the more obvious ones and none of them looked right.
Paul Batum
I answered this within the answer above.
bzlm
Thanks bzlm, you were right: opening the file in MS Word with the "MS-DOS" encoding works fine. The next step is for my program to parse the file, which should be a simple matter of selecting the equivalent encoding. Cheers!
Paul Batum
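For that parsing step, a minimal Python sketch that decodes the listing with the equivalent encoding and strips the ASCII drawing that `tree /f /a` adds. The prefix handling is an assumption based on the sample output in the question (`+---` or `\---` before directory names, `|` and spaces for indentation), cp850 is assumed as the code page, and `parse_tree_listing` is a hypothetical helper. It does not filter out the header lines tree prints before the listing.

```python
# Sketch: parse the output of `tree /f /a` that was redirected to a file.
# Assumptions: the file is in the console code page (cp850 here) and uses
# the /a ASCII drawing: "+---" or "\---" before a directory name, and
# "|" plus spaces as indentation. parse_tree_listing is hypothetical.
import re
from typing import List, NamedTuple

CONSOLE_CODEPAGE = "cp850"

# Indentation ("|" and spaces), an optional directory marker, then the name.
LINE_RE = re.compile(r"^(?P<indent>[| ]*)(?P<marker>[+\\]---)?(?P<name>.*)$")


class Entry(NamedTuple):
    name: str
    is_dir: bool


def parse_tree_listing(raw: bytes, codepage: str = CONSOLE_CODEPAGE) -> List[Entry]:
    """Decode the raw bytes and return one Entry per directory or file line."""
    entries: List[Entry] = []
    for line in raw.decode(codepage).splitlines():
        match = LINE_RE.match(line)  # always matches: every group may be empty
        name = match.group("name").strip()
        if not name:
            # Lines made only of indentation are spacers between groups.
            continue
        entries.append(Entry(name, match.group("marker") is not None))
    return entries
```

Usage: `parse_tree_listing(Path("output.txt").read_bytes())`. Reading the bytes and decoding explicitly keeps the code page choice in one visible place instead of relying on the platform's default text encoding.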
+2  A: 

I decided I had to have a look at tree.com and figure out why it doesn't respect the console's Unicode setting. It turns out that, like many of the command-line file utilities, it uses a library called ulib.dll to do all the printing (specifically, TREE::DisplayName calls WriteString in ulib).

Now, in ulib, the WriteString method is implemented in two classes, SCREEN and STREAM. The SCREEN version uses WriteConsoleW directly, so all the Unicode characters get correctly displayed. The STREAM version converts the Unicode text to one of three different encodings (_UseConsoleConversions ⇒ console codepage (GetConsoleCP), _UseAnsiConversions ⇒ default ANSI codepage, otherwise ⇒ default OEM codepage), and then writes this out. I don't know how to change the conversion mode, and I don't believe the conversion can be disabled.
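The code page selection described above can be modelled roughly as follows. This is a hypothetical Python sketch, not ulib's code: the two flags mirror `_UseConsoleConversions` and `_UseAnsiConversions`, the concrete code pages are assumptions for a Western European installation, and Python's `errors="replace"` stands in for whatever Windows does with characters the target code page lacks (its conversion may substitute a "best fit" character rather than `?`).

```python
# Hypothetical model of the STREAM::WriteString conversion step described
# above; this is not ulib's actual code. Code page names are assumptions
# for a Western European installation.

CONSOLE_CP = "cp850"  # assumed result of GetConsoleCP
ANSI_CP = "cp1252"    # assumed default ANSI code page
OEM_CP = "cp850"      # assumed default OEM code page


def stream_write_string(text: str, use_console: bool, use_ansi: bool) -> bytes:
    """Choose a code page the way the answer describes, then encode to bytes."""
    if use_console:
        codepage = CONSOLE_CP
    elif use_ansi:
        codepage = ANSI_CP
    else:
        codepage = OEM_CP
    # Characters the code page cannot represent are lost; Python writes "?".
    return text.encode(codepage, errors="replace")
```

With these assumptions, `stream_write_string("Zölderdő", True, False)` yields `b"Z\x94lderd?"`: the ő has no slot in cp850, so it is gone before the bytes ever reach the file, and no editor encoding setting can bring it back.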

I've only looked at this briefly, so perhaps more adventurous souls can speak more about it! :-)

Chris Jester-Young
Actually, "tree.com" not respecting the settings of cmd.exe looks like it's by design. The documentation for cmd.exe specifically states that *internal* commands will have Unicode output. However, when I tried this with "dir", I got the same results, and "dir" is an internal command. Right?
bzlm
Heh, after seeing /u not work, I'd be the last one to refer to the documentation for anything. :-P I'll see if I can spend some time on IDA figuring out what dir does in Unicode mode....
Chris Jester-Young
Well, just saying the docs were right on this particular point. =] To me, it looks like a garble of UTF-16 and the console encoding, since a file named "hö" appears in redirected output from "dir" as: 00 <ascii for h> 00 <console encoding code for "ö">
bzlm
*nods* Sure. But ouch, with the UTF-16 representation of console codes, as opposed to Unicode code points! That's seriously wrong....
Chris Jester-Young
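The difference discussed in these comments can be illustrated in Python: real UTF-16 stores the Unicode code point of "ö" (U+00F6), whereas the redirected output apparently stores the console code page byte for "ö" widened to 16 bits. A sketch assuming cp850 as the console code page; both helper names are hypothetical, and the values are shown as integers because the comment does not pin down the byte order.

```python
# Sketch contrasting proper UTF-16 code units with what the comment
# describes: console code page bytes zero-extended to 16 bits.
# Assumes cp850 as the console code page; helper names are hypothetical.
from typing import List

CONSOLE_CP = "cp850"


def utf16_units(text: str) -> List[int]:
    """16-bit code units of real UTF-16, read back as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def widened_console_units(text: str) -> List[int]:
    """Each console code page byte stored as its own 16-bit unit."""
    return list(text.encode(CONSOLE_CP))
```

For "hö" the first gives `[0x68, 0xF6]` and the second `[0x68, 0x94]`. A program reading the second form as UTF-16 sees U+0094, a C1 control character, in place of "ö", which is why it is "UTF-16 representation of console codes" rather than Unicode.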
A: 

The short answer is that you cannot, because tree.com is an ANSI application, even on Windows 7.

The only solution is to write your own tree implementation. You could also file a bug with Microsoft, but I suspect they are already aware of it.
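Following that suggestion, a minimal Python sketch of a `tree /f /a` lookalike that writes text directly in a Unicode encoding, so no code page conversion happens. It only approximates tree's layout (files before subdirectories, the same ASCII connectors, no header or blank spacer lines, names sorted case-insensitively), and `tree_lines`/`write_tree` are hypothetical names.

```python
# Sketch of a tree /f /a lookalike that writes Unicode text instead of a
# code page. The layout is approximate: files are listed before
# subdirectories with the /a ASCII connectors; tree's header and spacer
# lines are omitted.
import os
from typing import List, TextIO


def tree_lines(root: str, prefix: str = "") -> List[str]:
    """Return the listing lines for everything below root."""
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name.lower())
    files = [e for e in entries if not e.is_dir(follow_symlinks=False)]
    dirs = [e for e in entries if e.is_dir(follow_symlinks=False)]

    lines: List[str] = []
    # Files hang under the current directory; draw "|" if subdirectories follow.
    file_indent = prefix + ("|   " if dirs else "    ")
    lines.extend(file_indent + f.name for f in files)
    for index, d in enumerate(dirs):
        last = index == len(dirs) - 1
        lines.append(prefix + ("\\---" if last else "+---") + d.name)
        # Children of the last directory get no vertical bar on their left.
        lines.extend(tree_lines(d.path, prefix + ("    " if last else "|   ")))
    return lines


def write_tree(root: str, out: TextIO) -> None:
    """Write the listing for root to out, one entry per line."""
    for line in tree_lines(root):
        out.write(line + "\n")
```

Usage: `with open("output.txt", "w", encoding="utf-8-sig") as fh: write_tree(".", fh)`. Because the names never pass through a code page, characters such as ő survive, and the byte order mark lets editors detect UTF-8 without being told.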

Sorin Sbarnea