Can you use a UTF-8 string as the Arguments for a StartInfo?

I am trying to pass a UTF-8 string (in this case, a Japanese one) to an application as a console argument.

Something like this (just an example; cmd.exe would actually be a custom app):

var process = new System.Diagnostics.Process();
process.StartInfo.Arguments = "/K \"echo これはテストです\"";
process.StartInfo.FileName = "cmd.exe";
process.StartInfo.UseShellExecute = true;

process.Start();
process.WaitForExit();

Executing this seems to lose the UTF-8 string, and all the target application sees is "echo ?????????".

When executing this command directly on the command line (by pasting the arguments), the target application receives the string correctly, even though the command line itself doesn't seem to display it correctly.

Do I need to do anything special to enable UTF-8 support in the arguments or is this just not supported?

+1  A: 

I just created a Windows Forms application which displays the Environment.CommandLine in a RichTextBox, and the string was displayed correctly, so it is possible to pass a Unicode string this way.

I think my OS uses codepage 1252 by default, so I cannot display these characters in the Command Prompt even when pasting the arguments like you did.
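
A minimal sketch of a similar test program, written as a console app rather than the WinForms app described above. It prints what it received as UTF-16 code units in plain ASCII, so the result can be checked even when the console font or code page cannot render the characters. The helper name ToCodeUnits is hypothetical, and the sketch assumes a C# compiler with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical helper: render each UTF-16 code unit as U+XXXX so the output
// stays readable regardless of the console's code page or font.
static string ToCodeUnits(string s) =>
    string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

// The raw command line as the runtime received it.
Console.WriteLine(ToCodeUnits(Environment.CommandLine));

// The individual arguments after parsing.
foreach (string arg in args)
{
    Console.WriteLine(ToCodeUnits(arg));
}
```

If the Japanese text arrives intact, the output contains values in the U+3040 to U+30FF range instead of U+003F (the question mark).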

deltreme
did you pass the arguments to your app by starting the app using the Process and ProcessStartInfo or directly from the command line?
Patrick Klug
I used Process / ProcessStartInfo - I only changed "cmd.exe" to "test.exe" which was my WinForms app.
deltreme
A: 

.NET strings (System.String, or plain string) are Unicode-based, so yes, they can hold the characters in question.

Have a look here

You need to check the OS-related settings (code pages, languages, etc.).

Nayan
I know that strings support Unicode - I am just not sure whether the Arguments property on the ProcessStartInfo properly propagates this to the executing application. It doesn't seem to.
Patrick Klug
+1  A: 

Programs receive their command lines in UTF-16, the same encoding as .NET strings:

Arguments = "/U /K \"echo これはテストです> output.txt\"";

It is the console window that cannot display characters outside of its current codepage/selected font. However, I am assuming that you don't want to call echo, so this depends entirely on how the program you are calling is written.

Some background info: C or C++ programs that use the 'narrow' (system code page) entry points, e.g. main(int argc, char** argv), rather than the 'wide' (UTF-16) entry points, wmain(int argc, wchar_t** argv), are called by a stub that converts the command line to the system codepage - which cannot be UTF-8.

By far the best option is to change the program to use a wide entry point, and simply get the same UTF-16 as you had in your .NET string. If that is not possible, then one trick you could try is to pass it a UTF-16 command line that, when converted to the system codepage, is UTF-8 for the characters you want it to use:

Arguments = Encoding.Default.GetString(Encoding.UTF8.GetBytes(args));

Caveat Coder: Don't be surprised if this goes horribly wrong on your or someone else's machine. It depends on every possible byte being valid in the current system codepage, the system codepage not having changed since your program was started, the program you are running not passing the data to any encoding-dependent Windows function (those with A and W suffixed versions), and so on.
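
To make the trick and its caveat concrete, here is a minimal sketch of the round trip. It is an illustration under stated assumptions, not a faithful model of the Windows stub: ISO-8859-1 (code page 28591) stands in for the system code page because every byte value is valid in it, and the decode/encode pair stands in for the conversion the narrow entry point performs. The variable names are made up for this example.

```csharp
using System;
using System.Text;

// Assumption: ISO-8859-1 stands in for the system code page. It maps every
// byte 0x00-0xFF to a character, which is exactly the property the trick needs.
Encoding narrow = Encoding.GetEncoding(28591);

string original = "これはテストです";

// Step 1 (caller): take the UTF-8 bytes and reinterpret them as narrow-code-page text.
byte[] utf8 = Encoding.UTF8.GetBytes(original);
string disguised = narrow.GetString(utf8);

// Step 2 (simulated stub): the command line is converted back to the narrow code page.
byte[] seenByTarget = narrow.GetBytes(disguised);

// Step 3 (target): the program interprets its char* argument as UTF-8.
string recovered = Encoding.UTF8.GetString(seenByTarget);

Console.WriteLine(recovered == original);
```

The round trip only survives because the stand-in code page accepts every byte. With a code page in which some byte values or byte sequences are invalid, step 1 can already lose information, which is the failure mode the caveat warns about.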

Simon Buchan
Yes, I can see that programs receive their command lines in Unicode, but what I don't know is whether the StartInfo.Arguments of the Process class propagates the value to the application in Unicode. In my testing it doesn't seem to.
Patrick Klug
@Patrick: To be completely precise, when a program has been started, the raw value it receives from Windows is always in Unicode. Depending on how it is written, it may have that converted to the system codepage before it sees it.
Simon Buchan
+1  A: 

It completely depends on the program you are trying to start. The Process class fully supports Unicode, as does the operating system. But the program might be old and use 8-bit characters. In that case it uses GetCommandLineA(), the ANSI version of the native Unicode GetCommandLineW() API function, to retrieve the command line arguments. That translates the Unicode string to 8-bit chars using the system default code page, as configured in Control Panel + Regional and Language Options, Language for Non-Unicode Programs. It is the equivalent of WideCharToMultiByte() using CP_ACP.

If that is not the Japanese code page, the translation produces question marks, since the Japanese glyphs only have a code in the Japanese code page. Switching the system code page usually isn't desirable for non-Japanese speakers. UTF-8 certainly won't work either; the program isn't going to expect it. Consider running this program in a virtual machine.
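
The question-mark effect can be reproduced in managed code. The following is a minimal sketch, not the actual Windows conversion: it assumes ISO-8859-1 (code page 28591) as a stand-in for a Western system code page and sets an explicit "?" replacement fallback to mimic what the narrow program ends up seeing.

```csharp
using System;
using System.Text;

// Assumption: ISO-8859-1 stands in for a Western system code page, and the
// explicit "?" fallback mimics the replacement of unmappable characters.
Encoding narrow = Encoding.GetEncoding(
    28591,
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?"));

string commandLine = "echo これはテストです";

// Simulate the Unicode-to-8-bit conversion, then view the bytes as the
// 8-bit program would.
byte[] narrowBytes = narrow.GetBytes(commandLine);
string seenByProgram = narrow.GetString(narrowBytes);

Console.WriteLine(seenByProgram); // prints "echo ????????"
```

The ASCII part survives because it exists in the stand-in code page, while each Japanese character has no mapping there and is replaced.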

Hans Passant