How can I convert a Unicode path name (LPWSTR) to its ASCII equivalent? The library that gets called understands only C strings.

Edit: Okay, I took the GetShortPathName and WideCharToMultiByte suggestions and wrote the piece of code below. I tested it with some folders containing Unicode characters in the path and it worked flawlessly:

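// cpy is the long path as a wide string (LPCWSTR).
// With a zero-sized buffer the call returns the required size in WCHARs, including the terminating null.
DWORD wlength = GetShortPathNameW(cpy, NULL, 0);
LPWSTR shortp = (LPWSTR)calloc(wlength, sizeof(WCHAR));
GetShortPathNameW(cpy, shortp, wlength);
// First call sizes the output buffer; wlength covers the terminating null, so the result is null-terminated.
int clength = WideCharToMultiByte(CP_OEMCP, WC_NO_BEST_FIT_CHARS, shortp, wlength, NULL, 0, NULL, NULL);
LPSTR cpath = (LPSTR)calloc(clength, sizeof(CHAR));
WideCharToMultiByte(CP_OEMCP, WC_NO_BEST_FIT_CHARS, shortp, wlength, cpath, clength, NULL, NULL);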
+1  A: 

GetShortPathName() Function

http://msdn.microsoft.com/en-us/library/aa364989%28VS.85%29.aspx

Will give you an equivalent 8.3 filename, pointing to the same file, for use with legacy code.

[EDIT] This is probably the best you can do, although in theory 8.3 filenames may contain non-ASCII characters, depending on a registry setting. In that case there is no easy way to get the proper char*, and GetShortPathNameA() will not do it either if the codepage setting at file creation time does not match the current setting.

See http://technet.microsoft.com/en-us/library/cc781607%28WS.10%29.aspx about the setting. There's a consensus here (see below) that this case is reasonable to neglect.

Thanks Moron, All, for contribution to this post.

Pavel Radzivilovsky
But isn't the short path LPWSTR too? Perhaps OP is looking for something like WideCharToMultiByte?
Moron
I'm actually looking for the combination of those both.
metafex
@metafex: Perhaps you should edit your question then. This being the accepted answer does not seem to make sense, with the question being what it is currently.
Moron
@Moron No, an 8.3 path is guaranteed to be ASCII only, 7 bits per byte.
Pavel Radzivilovsky
@metafex Actually, you should be able to call the ASCII version. Doesn't the LPCTSTR type resolve to either "const char *" or "const wchar_t *" depending on the UNICODE (or is it _UNICODE?) macro? Whenever that is the case, the function typically has an ASCII and a wide version, in this case GetShortPathNameA and GetShortPathNameW. You'll find that GetShortPathName is just a macro defined to one of these depending on the UNICODE macro. If you want the ASCII version even though UNICODE is defined (which it is by default), call GetShortPathNameA with an ASCII string.
torak
If all the OP wants is to have an ASCII string they can safely pass to a lib that doesn't support Unicode, then this is a reasonable solution.
Steven Sudit
@Pavel: while the 8.3 name might contain only character codes in the ASCII range, if the UNICODE version of the function is called (i.e., the input `LPCTSTR` long path parameter is a pointer to wide characters), then the 8.3 short name returned will also be wide characters, which are unsuitable for passing to a legacy function expecting a regular, old `char*`. Some sort of conversion from wide chars to chars will still need to take place (even if that conversion is trivial), or the legacy function will only see the first character of the 8.3 path.
Michael Burr
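
The "first character only" effect described in the comment above can be shown without any Windows API. This is a minimal sketch, assuming the short path is held as UTF-16LE (the in-memory layout of a WCHAR string on Windows). The byte array `short_path_utf16le` and the function `visible_length` are hypothetical names made up for illustration; `visible_length` merely stands in for a legacy routine that takes a `char*`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "C:\A" laid out as UTF-16LE bytes. Every ASCII character is
   followed by a zero byte, and the string ends with two zero bytes. */
const unsigned char short_path_utf16le[] = {
    'C', 0, ':', 0, '\\', 0, 'A', 0, 0, 0
};

/* Hypothetical stand-in for a legacy routine that expects char*:
   like any C string function, it stops at the first zero byte. */
size_t visible_length(const void *path)
{
    assert(path != NULL);
    return strlen((const char *)path);
}
```

Calling `visible_length(short_path_utf16le)` yields 1: the scan stops at the zero byte that follows 'C', which is why the wide string has to be narrowed before it is handed to the library.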
@Pavel: Where does it ever say that GetShortPathName returns ASCII encoded strings? Please point me to a reference which promises that.
Moron
And please don't point to GetShortPathNameA.
Moron
@Moron trust me on that part... though I'd be happy to see a reference myself. I guess Google might help; I'll add it as an edit if you find one. BTW, even though the *A functions are crap, using one is actually justified in this case, because it's exactly the result you want to get.
Pavel Radzivilovsky
@Pavel: Sorry, I cannot trust you on this without a reference. Also, could you please show us complete code, which given a path in LPWSTR (note the W), gives us back a path in LPSTR (note, no W) which is ASCII encoded?
Moron
@Moron: I tried to search and cannot find a documentation reference. However, I have never seen a non-ASCII 8.3 path in Windows.
Pavel Radzivilovsky
@Pavel: You cannot find it because it is ANSI, not ASCII. The conversion that happens depends on the codepage.
Moron
@Moron, I think this can't be true, because this string is stored on disk for every file, and the default codepage setting in the OS can change.
Pavel Radzivilovsky
@Pavel: You are right about the codepage: in fact I guess being independent of the code page is a feature of short path names. Anyway, it is possible for shortnames to have non-ascii characters: http://technet.microsoft.com/en-us/library/cc781607(WS.10).aspx. A registry key setting which probably no one ever touches and is probably 0 by default (have to check a different language machine). So GetShortPathName will likely give you ASCII encoded strings, but it is not 100% guaranteed as you seemed to have claimed. Anyway, GetShortPathName seems to be the _only_ solution out there!
Moron
thanks. editing answer.
Pavel Radzivilovsky
A: 

I think what you're looking for is Encoding.Convert(). Here's a link to MSDN's documentation of the method. Their example code shows the conversion to ASCII.

Eric
That is .net isn't it?
Moron
Yup, it is. Misread the question... Sorry about that.
Eric
A: 

You could try using WideCharToMultiByte to convert LPWSTR to LPSTR.

Moron
-1. This is not conversion to ASCII. You cannot convert Unicode to ASCII without losing data.
Pavel Radzivilovsky
@Pavel: Huh? So what does the following mean: "The library only takes C strings". Of course you could lose information if you change encoding. What is your point, really?
Moron
The point is that if a library can only take ASCII strings (which has nothing to do with C or C++ strings), then there is no way to convert in general. What you should do before every such conversion attempt is CHECK and report an error. Otherwise your user will fall into a dangerous pitfall: code that compiles and works only conditionally, depending on user input.
Pavel Radzivilovsky
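
The "check and report an error" approach from the comment above might look roughly like this. It is a minimal sketch in portable C, not code from the thread: the function name `narrow_ascii_checked` and its return convention are assumptions made for illustration. It refuses to convert as soon as it meets a character outside the 7-bit ASCII range, instead of silently substituting something else.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy src into dst only if every character is
   7-bit ASCII. Returns 0 on success, -1 if a non-ASCII character is
   found or dst is too small. On failure the contents of dst are
   unspecified and must not be used. */
int narrow_ascii_checked(const wchar_t *src, char *dst, size_t dst_size)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);

    for (; src[i] != L'\0'; ++i) {
        /* The cast keeps the comparison correct even where wchar_t is signed. */
        if ((unsigned long)src[i] > 0x7F || i + 1 >= dst_size) {
            return -1;
        }
        dst[i] = (char)src[i];
    }
    if (i >= dst_size) {
        return -1;  /* no room for the terminating null (dst_size == 0) */
    }
    dst[i] = '\0';
    return 0;
}
```

A caller would pass the short path, check the return value, and report an error to the user on -1 rather than hand a damaged path to the library.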
@Pavel: The OP mentions both ASCII and C strings, so he needs a C string in the end. This might help in that regard (for instance, after calling GetShortPathName).
Moron
A: 
ShinTakezou
@Pavel Radzivilovsky: I suspect it could be OK. I suspect the problem with wchar_t* and the functions the asker must use is the '\0' bytes, when a function tries to read the wide string char by char instead of wchar by wchar. Converting to multibyte preserves all the wide characters and does not confuse char-by-char scanning. Other problems may arise, of course, depending on what exactly that function does.
ShinTakezou