Hello!

I'm trying to delete some files and directories with Unicode characters in their names using a batch script (it's a requirement). So I run cmd and execute:

> chcp 65001

This sets the console code page to UTF-8, and it works:

D:\temp\1>dir
 Volume in drive D has no label.
 Volume Serial Number is 8C33-61BF

 Directory of D:\temp\1

02.02.2010  09:31    <DIR>          .
02.02.2010  09:31    <DIR>          ..
02.02.2010  09:32               508 1.txt
02.02.2010  09:28                12 delete.bat
02.02.2010  09:20                95 delete.cmd
02.02.2010  09:13    <DIR>          Rún
02.02.2010  09:13    <DIR>          Гуцул Каліпсо
               3 File(s)            615 bytes
               4 Dir(s)  11 576 438 784 bytes free

D:\temp\1>rmdir Rún

D:\temp\1>dir
 Volume in drive D has no label.
 Volume Serial Number is 8C33-61BF

 Directory of D:\temp\1

02.02.2010  09:56    <DIR>          .
02.02.2010  09:56    <DIR>          ..
02.02.2010  09:32               508 1.txt
02.02.2010  09:28                12 delete.bat
02.02.2010  09:20                95 delete.cmd
02.02.2010  09:13    <DIR>          Гуцул Каліпсо
               3 File(s)            615 bytes
               3 Dir(s)  11 576 438 784 bytes free

Then I put the same rmdir commands in a batch script and save it in UTF-8 encoding. But when I run it, nothing happens, literally nothing: not even echo works from the batch script in this case. Saving the script in OEM encoding does not help either.

So it seems that once I change the code page to UTF-8 in the console, batch scripts just stop working. Does anybody know how to fix that?

A: 

Unicode support in the console, and especially in batch files, is pretty bad. Can you "twist" the requirement to allow PowerShell or Active Scripting (VBScript or JScript)?

It will save you a lot of grief in the long run (if you need to grow this beyond this simple task).

Not to mention that both PowerShell and Active Scripting use far more powerful languages, with functions, proper loops, real variables, debuggers, and a lot of other goodies for a more serious project.

Mihai Nita

Yes, I can. I just wanted to find out whether this bug (it is a bug, isn't it?) can be solved in a direct way...
Andy

A: 

If you want Unicode support in a batch file, note that CHCP on a line by itself just aborts the batch file. What I suggest is putting CHCP on each batch file line that needs Unicode, as follows:

chcp 65001 > nul && <real command here>

Example: in my case I wanted a nice TAIL of my log files while debugging, but even Latin-1 characters in the content were being garbled. So here is my batch file, which wraps the real tail implementation from the Windows Resource Kit:

@C:\WINDOWS\system32\chcp.com 65001 >nul && tail.exe -f %1
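
The same one-line pattern can be generalized a little. The following is only a minimal sketch and not part of the original answer: the file name utf8run.cmd is hypothetical, and %* simply forwards whatever command line the caller passes. It relies on the same behavior reported above for the single-line form, so treat it as untested outside that setup.

```batch
@echo off
rem utf8run.cmd (hypothetical name): run one command under code page 65001.
rem chcp and the forwarded command share a single line, as in the wrapper above.
rem %* expands to all arguments given to this script.
chcp 65001 >nul && %*
```

It would be called as, for example, utf8run.cmd tail.exe -f app.log (app.log being a placeholder). Note that the console stays on code page 65001 afterwards; running chcp with no arguments shows the active code page.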

In addition, for output to a console, you need to select a TrueType font, e.g. Lucida Console.

And apparently, for output to a file, cmd needs to run in Unicode mode (the /u switch), so you would kick off your batch script as follows:

cmd /u /c <batch file command here>
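
As a hedged illustration of that switch, the line below runs an internal command and redirects its output to a file. The file name names.txt is made up for this sketch. With /u, cmd writes the output of internal commands to a pipe or file as Unicode (UTF-16LE on Windows), so the listing should keep non-ASCII names intact.

```batch
rem Sketch only: names.txt is a hypothetical output file.
rem The quotes make the inner cmd (started with /u) perform the redirection itself.
cmd /u /c "dir /b > names.txt"
```

The /u switch only affects cmd's internal commands such as dir or echo; external programs still write whatever encoding they choose.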

Disclaimer: tested on Windows XP SP3 with the Windows Resource Kit.

Jennifer Zouak