What is the secret to Japanese characters in a Windows XP .bat file?

We have a script to open a file from disk in kiosk mode:

@ECHO OFF
"%ProgramFiles%\Internet Explorer\iexplore.exe" -K "%CD%\XYZ.htm"

It works fine when the OS is English, and it works fine on the Japanese OS when XYZ is made up of English characters. But when XYZ is made up of Japanese characters, they are mangled into gibberish by the time IE tries to find the file.

If the batch file is saved as Unicode or Unicode big endian, the script won't even run.

I have tried various ways of encoding the Japanese characters. Ampersand escape does not work (〹)

Percent escape does not work %xx%xx%xx

ABC works, but AB%43 becomes AB3 in the error message, so it looks like the percent escape triggers parameter substitution. This is confirmed because %043 puts in the name of the script!

One thing that does work is pasting the Japanese characters into a command prompt.

@ECHO OFF
CD "%ProgramFiles%\Internet Explorer\"
Set /p URL="file to open: "
start iexplore.exe -K %URL%

This tells me that iexplore.exe will accept and parse the parameter correctly when it has Japanese characters, but not when they are written into the script.

So it would be nice to know what the secret may be to getting the parameter into IE successfully via the batch file, as opposed to via the clipboard and an environment variable.

Any suggestions greatly appreciated!

best regards

Richard Collins

P.S. Another post has made this suggestion, which I have yet to follow up:

You might have more luck in cmd.exe if you opened it in UNICODE mode. Use "cmd /U".

http://stackoverflow.com/questions/56913/batch-renaming-of-files-with-international-chars-on-windows-xp

I will need to find out if this can be done from inside the script.

+1  A: 

First of all: Batch files are pretty limited in their internationalization support. There is no direct way of telling cmd what codepage a batch file is in. UTF-16 is out anyway, since cmd won't even parse that.

I have detailed an option in my answer to the following question:

which might be helpful for your needs.

In principle it boils down to the following:

  • Use an encoding which has single-byte mappings for ASCII
  • Put a chcp ... at the start of the batch file
  • Use the set codepage for the rest of the file

You can use codepage 65001, which is UTF-8, but make sure that your file doesn't include the U+FEFF character at the start (used as a byte-order mark in UTF-16 and UTF-32, and sometimes as a marker for UTF-8 files as well). Otherwise the first command in the file will produce an error message.

So just use the following:

echo off
chcp 65001
"%ProgramFiles%\Internet Explorer\iexplore.exe" -K "%CD%\XYZ.htm"

and save it as UTF-8 without BOM (Note: Notepad won't allow you to do that) and it should work.
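Since Notepad is out, one way to get such a file is to generate it from a script. The following is only a minimal sketch in Python, not something from the original setup: the output name `kiosk.bat` and the page name `日本語.htm` (standing in for XYZ.htm) are made up for illustration. It relies on the fact that Python's `utf-8` codec never writes a BOM, while `utf-8-sig` does. Whether cmd then handles the file correctly on a given system is something the sketch cannot check.

```python
import codecs

# Hypothetical names for illustration; substitute the real page name.
PAGE_NAME = "日本語.htm"
BATCH_PATH = "kiosk.bat"


def build_batch(page_name: str) -> str:
    """Return the batch file text with CRLF line endings."""
    lines = [
        "echo off",
        "chcp 65001",
        '"%ProgramFiles%\\Internet Explorer\\iexplore.exe" -K "%CD%\\' + page_name + '"',
    ]
    return "\r\n".join(lines) + "\r\n"


def write_batch(path: str, page_name: str) -> bytes:
    """Write the batch file as UTF-8 without a BOM and return the bytes written."""
    data = build_batch(page_name).encode("utf-8")  # "utf-8" never emits a BOM
    with open(path, "wb") as f:  # binary mode: no newline translation
        f.write(data)
    return data


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 encoding of U+FEFF (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    written = write_batch(BATCH_PATH, PAGE_NAME)
    print("BOM present:", has_utf8_bom(written))
```

The `has_utf8_bom` check can also be pointed at the bytes of an existing batch file to see whether an editor slipped a BOM in.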


cmd /u won't do anything here, that advice is pretty much bogus. The /U switch only specifies that Unicode will be used for redirection of input and output (and piping). It has nothing to do with the encoding the console uses for output or reading batch files.


URL encoding won't help you either. cmd is hardly a web browser, and outside of HTTP and the web, URL encoding isn't exactly widespread (hence the name). cmd uses percent signs for environment variables and arguments to batch files and subroutines.

"Ampersand escape", better known as character entities from HTML and XML, won't work either, because cmd is also not HTML or XML. The ampersand is used to execute multiple commands in a single line.

Joey
My example of ampersand escape did not make it through Markdown. I meant "ampersand hash nnnnn semicolon", which probably has a better name than ampersand escape. And yes, it was wishful thinking that either this or URL encoding would work in a batch file.
Richard Collins
No, it doesn't :-) Most escaping technologies don't go beyond what they were designed for and each technology has different methods. `cmd` uses the circumflex accent (`^`) as escape character but doesn't provide any way of inserting arbitrary characters. It can deal with Unicode fine, but usually not from inside batch files themselves.
Joey
Thanks for your input Johannes, but I could not get the above suggestions to work on the Japanese OS.
Richard Collins
Alternatively try one of the Japanese code pages instead. They don't work on my machine, though, since I have an English version of Windows and therefore little need for handling that.
Joey
A: 

For the record, a simple answer has been found for this question.

If the batch file is saved as ANSI, it works!
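A likely explanation, sketched in Python rather than taken from our actual software: "ANSI" on a Japanese Windows means code page 932 (Shift-JIS), and the same characters come out as different bytes there than in an encoding such as UTF-8. The file name below is a made-up stand-in for XYZ, and the sketch assumes that cmd reads the bytes of the batch file using the system code page.

```python
# Hypothetical file name standing in for the real Japanese page name.
name = "日本語.htm"

ansi_bytes = name.encode("cp932")  # Japanese "ANSI" code page (Shift-JIS)
utf8_bytes = name.encode("utf-8")  # what the file holds if saved as UTF-8

print(ansi_bytes.hex(" "))
print(utf8_bytes.hex(" "))

# Assumption: cmd interprets the batch file using the system code page (932).
# Reading UTF-8 bytes that way yields the wrong characters.
mangled = utf8_bytes.decode("cp932", errors="replace")
print(mangled == name)                     # False: the name is garbled
print(ansi_bytes.decode("cp932") == name)  # True: round-trips intact
```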

Richard Collins
Erm, just one question: What were you trying to save it in before?
Joey
Our software wrote it as UTF-B by default. DotNet will write it out as ANSI if you pass System.Text.Encoding.Default to the stream reader and writer constructors.
Richard Collins
erm that would be utf-8 :)
Richard Collins