tags:

views:

632

answers:

3

I'm working on a PInvoke wrapper for a library that does not support Unicode strings, but does support multi-byte ANSI strings. While investigating FxCop reports on the library, I noticed that the string marshaling being used had some interesting side effects. The PInvoke method was using "best fit" mapping to create a single-byte ANSI string. For illustration, this is what one method looked like:

[DllImport("thedll.dll", CharSet=CharSet.Ansi)]
public static extern int CreateNewResource(string resourceName);

The result of calling this function with a string that contains non-ASCII characters is that Windows finds a "close" character, generally this looks like it ends up being "???". If we pretend that 'a' is a non-ASCII character, then passing "cat" as a parameter would create a resource named "c?t".

If I follow the guidelines in the FxCop rule, I end up with something like this:

[DllImport("thedll.dll", CharSet=CharSet.Ansi, BestFitMapping = false, ThrowOnUnmappableChar = true)]
public static extern int CreateNewResource([MarshalAs(UnmanagedType.LPStr)] string resourceName);

This introduces a change in behavior; now when a character cannot be mapped an exception is thrown. This concerns me because this is a breaking change, so I'd like to try and marshal the strings as multi-byte ANSI but I cannot see a way to do so. UnmanagedType.LPStr is specified to be a single-byte ANSI string, LPTStr will be Unicode or ANSI depending on the system, and LPWStr is not what the library expects.

How would I tell PInvoke to marshal the string as a multibyte string? I see there's a WideCharToMultiByte() API function, could I change the signature to expect an IntPtr to a string I create in unmanaged memory? It seems like this still has many of the problems that the current implementation has (it still might have to drop or substitute characters), so I'm not sure if this is an improvement. Is there another method of marshaling that I'm missing?

A: 

Here is the best way I've found to accomplish this. Instead of marshalling as a string, marshal as a byte[]. Put the responsibility on the caller of the pinvoke function API to convert to a byte array in the most appropriate fashion. Most likely by using one of the Text.Encoding classes.

JaredPar
+1  A: 

ANSI is multi-byte, and ANSI strings are encoded according to the codepage currently enabled on the system. WideCharToMultiByte works the same way as P/Invoke.

Maybe what you're after is conversion to UTF-8. Although WideCharToMultiByte supports this, I don't think P/Invoke does, since it's not possible to adopt UTF-8 as the system-wide ANSI code page. At this point you'd be looking at passing the string as an IntPtr instead, although if you're doing that, you may as well use the managed Encoding class to do the conversion, rather than WideCharToMultiByte.

Tim Robinson
I see you are right; I had been testing with characters way outside my current code page and couldn't think of any multibyte characters that would actually work in my code page. I'm fumbling around to try and find a code page/character combination that I can throw to the function to gain some confidence, but I think you're right.
OwenP
I figured out how to test it: I used an image for Japanese-localized XP that we have and set up some resources with names composed of lots of Japanese characters. This worked great on the Japanese machine but failed miserably on the English machine.I'd *like* things to behave as if I were using Unicode, but from your explanation and my experiments I see that this is impossible, and I'm already getting as close to it as I'll get. I'll just have to wait for the library's maintainers to implement Unicode support.
OwenP
A: 

If you end up having to call WideCharToMultiByte manually, I would get rid of the p/invoke and manually marshal this using WideCharToMultiByte in a C++/CLI wrapper function. Managed C++ is much better at these interop scenarios than C# is.

Though, if this is the only p/invoke you have, it's probably not worth it.

Drew Hoskins