I am trying to compare two formats that I expected would be somewhat compatible, since they are both generally strings. I have tried to perform strcmp with a string and std::wstring, and as I'm sure C++ gurus know, this will simply not compile. Is it possible to compare these two types? Is there an easy conversion here?

+1  A: 

Convert your wstring to a string.

wstring a = L"foobar";
string  b(a.begin(),a.end());

Now you can compare it to any char* using b.c_str() or whatever you like.

char c[] = "foobar";
cout<<strcmp(b.c_str(),c)<<endl;
Jacob
Sorry about the previous answer, I've changed it.
Jacob
It's likely better to go the other way (i.e. `char*` -> `wstring`), since there's less chance of losing data; you can use raw pointers into the string as iterators. Otherwise the method is the same, and using constructors is better than the other answer that uses `copy`. The caveat is the same: this may not work correctly for all locales.
Pavel Minaev
+1  A: 

First of all, ask yourself why you are mixing std::wstring, which holds wide (typically Unicode) characters, with char* (C strings), which is typically ANSI. It is best practice to use Unicode because it allows your application to be internationalized, but using a mix doesn't make much sense in most cases. If you want your C strings to be Unicode, use wchar_t. If you want your STL strings to be ANSI, use std::string.

Now back to your question.

The first thing you want to do is convert one of them to match the other datatype.

std::string and std::wstring both have a c_str function.

Here are the function definitions:

const char* std::string::c_str() const
const wchar_t* std::wstring::c_str() const

I don't remember off hand how to convert char * to wchar_t * and vice versa, but after you do that you can use strcmp. If you google you'll find a way.

You could use the functions below to convert std::wstring to std::string then c_str will give you char * which you can strcmp

#include <string>
#include <algorithm>

// Prototypes for the conversion functions
std::wstring StringToWString(const std::string& s);
std::string WStringToString(const std::wstring& s);

std::wstring StringToWString(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}

std::string WStringToString(const std::wstring& s)
{
    std::string temp(s.length(), ' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}
Ryu
This will only work if multibyte and widechar encodings for a given locale are "compatible" - e.g. if multibyte is really just ASCII or Latin-1, and widechar is Unicode. This won't work if multibyte is e.g. CP1251.
Pavel Minaev
This is why I like stackoverflow. If you go to some random google result you might get the wrong answer.
Ryu
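To make the caveat in the comments above concrete, here is a minimal sketch; the helper name `widen_bytes` is hypothetical and not part of the answer. It widens each byte to the code point with the same numeric value, which is only correct if the narrow string is ASCII or Latin-1. Assuming the input were CP1251 instead, the byte 0xE0 would mean Cyrillic "а" (U+0430), yet the result is U+00E0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: widens each byte to the code point with the same
// numeric value. Only meaningful for ASCII or Latin-1 input.
std::wstring widen_bytes(const char* s)
{
    assert(s != NULL);
    std::wstring out;
    for (; *s != '\0'; ++s) {
        // Go through unsigned char so that bytes >= 0x80 are not
        // sign-extended on platforms where plain char is signed.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*s)));
    }
    return out;
}
```

With this, `widen_bytes("foobar") == L"foobar"` holds, while `widen_bytes("\xE0")` yields U+00E0 regardless of which 8-bit code page the byte was meant in.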
+2  A: 

The quick and dirty way is

if( std::wstring(your_char_ptr_string) == your_wstring)

I say dirty because it will create a temporary string and copy your_char into it. However, it will work just fine as long as you are not in a tight loop.

Note that wstring uses wide characters (16-bit on Windows, i.e. 65536 possible values), whereas char* tends to use 8-bit characters (ASCII or Latin-1, so English and Western European text only). They are not the same, so wstring --> char* might lose information.

-Tom

Tom Leys
This looks better than my idea - for some reason, I thought std::wstring wouldn't have the right conversions. My approach creates two extra objects - one named (and probably heavier than a simple wstring), the other a temporary wstring instance.
Steve314
`std::wstring` does not have any constructor from `const char*`.
Pavel Minaev
You can build a wstring from a char*, but the underlying types are different: wstring uses wchar_t (which may be 32 bits, not just 16, on some systems).
Martin York
You can build it, but nonetheless code as given will not even compile.
Pavel Minaev
Yup, would need to be `std::wstring(pcYourString, pcYourString+strlen(pcYourString))` to compile. Ugly, but works IFF `wchar_t` is Unicode/UTF16/UTF32 and `char*` is ASCII or ISO8859-1. It won't work for the rather popular ISO-8859-15 (the €-variant of ISO-8859)
MSalters
+6  A: 

You need to convert your char* string - "multibyte" in ISO C parlance - to a wchar_t* string - "wide character" in ISO C parlance. The standard function that does that is called mbstowcs ("Multi-Byte String To Wide Character String")

NOTE: as Steve pointed out in comments, this is a C99 function and thus is not ISO C++ conformant, but may be supported by C++ implementations as an extension. MSVC and g++ both support it.

It is used thus:

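#include <cstdlib>
#include <string>
#include <vector>

const char* input = ...;

// Length in wide characters, not counting the terminating L'\0';
// (std::size_t)-1 means input is invalid in the current locale.
std::size_t output_size = std::mbstowcs(NULL, input, 0);
if (output_size == static_cast<std::size_t>(-1)) { /* handle the error, e.g. return or throw */ }

// Reserve one extra element for the terminator.
std::vector<wchar_t> output_buffer(output_size + 1);
std::mbstowcs(&output_buffer[0], input, output_size + 1);

std::wstring output(&output_buffer[0], output_size);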

Once you have two wstrings, just compare as usual. Note that this will use the current system locale for conversion (i.e. on Windows this will be the current "ANSI" codepage) - normally this is just what you want, but occasionally you'll need to deal with a specific encoding, in which case the above won't do, and you'll need to use something like iconv.
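As a rough, self-contained sketch of the approach above, the conversion and the comparison could be wrapped like this. The names `to_wide` and `equals_in_locale` are hypothetical, and calling `mbstowcs` with a null destination to obtain the length is assumed to be supported (it is a POSIX extension that MSVC and glibc provide).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string using
// the current C locale (LC_CTYPE).
std::wstring to_wide(const char* input)
{
    assert(input != NULL);
    // Assumption: a null destination makes mbstowcs return the required
    // length, excluding the terminator (POSIX extension).
    const std::size_t len = std::mbstowcs(NULL, input, 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    std::vector<wchar_t> buffer(len + 1); // +1 for the terminating L'\0'
    std::mbstowcs(&buffer[0], input, len + 1);
    return std::wstring(&buffer[0], len);
}

// Hypothetical helper: compares after converting the narrow string.
bool equals_in_locale(const char* s, const std::wstring& ws)
{
    return to_wide(s) == ws;
}
```

A program starts in the "C" locale, so to get the system's encoding rather than plain ASCII behaviour you would normally call `std::setlocale(LC_CTYPE, "")` (from `<clocale>`) once before using these helpers.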

EDIT

All other answers seem to go for direct codepoint translation (i.e. the equivalent of (wchar_t)c for every char c in the string). This may not work for all locales, but it will work if e.g. your char are all ASCII or Latin-1, and your wchar_t are Unicode. If you're sure that's what you really want, the fastest way is actually to avoid conversion altogether, and to use std::lexicographical_compare:

#include <algorithm>
#include <cstring>
#include <string>

const char* s = ...;
std::wstring ws = ...;

const char* s_end = s + std::strlen(s);

bool is_ws_less_than_s = std::lexicographical_compare(ws.begin(), ws.end(),
                                                      s, s_end);
bool is_s_less_than_ws = std::lexicographical_compare(s, s_end,
                                                      ws.begin(), ws.end());
bool is_s_equal_to_ws = !is_ws_less_than_s && !is_s_less_than_ws;

If you specifically need to test for equality, use std::equal with a length check:

#include <algorithm>
#include <cstring>
#include <string>

const char* s = ...;
std::wstring ws = ...;

std::size_t s_len = std::strlen(s);
bool are_equal =
    ws.length() == s_len &&
    std::equal(ws.begin(), ws.end(), s);
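
One thing to watch with the direct comparison: on platforms where plain `char` is signed, bytes of 0x80 and above are negative and will not compare equal to the corresponding Latin-1 code points. A minimal sketch that compares through `unsigned char` instead might look like this (the names `SameCodePoint` and `equal_by_code_point` are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Compares one wide character with one narrow byte by numeric value,
// treating the byte as unsigned (i.e. as an ASCII / Latin-1 code point).
struct SameCodePoint {
    bool operator()(wchar_t w, char c) const
    {
        return w == static_cast<wchar_t>(static_cast<unsigned char>(c));
    }
};

// Hypothetical helper: equality test without building a temporary string.
bool equal_by_code_point(const char* s, const std::wstring& ws)
{
    assert(s != NULL);
    return ws.length() == std::strlen(s) &&
           std::equal(ws.begin(), ws.end(), s, SameCodePoint());
}
```

The length check comes first so that `std::equal` never reads past the end of `s`.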
Pavel Minaev
Is this C++? It seems to be C99, and I'm not sure what versions of the C library were merged into the C++ standard. Upvoted anyway - should work in practice either way.
Steve314
Yeah, `mbstowcs` is C99, though in practice both C++ implementations I'm familiar with - MSVC and g++ - support this function.
Pavel Minaev
It seems that the 100% portable ISO C++ approach would be to use the `std::codecvt<wchar_t, char, std::mbstate_t>` facet and its `in()` method, but it is just so messy and verbose... http://msdn.microsoft.com/en-us/library/xse90h58.aspx - documentation for it in case anyone wants to try to write up a detailed answer for that.
Pavel Minaev
@Pavel - if it wasn't messy and verbose, who would recognise it as a genuine C++ standard library thing?
Steve314
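For anyone who wants to try the `std::codecvt` route mentioned above, a rough sketch could look like the following. The function name `widen_with_codecvt` is hypothetical, the locale defaults to the current global locale, and error handling is reduced to a single exception; treat it as a starting point rather than a complete answer.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a multibyte string to a wide string with
// the codecvt facet of the given locale.
std::wstring widen_with_codecvt(const std::string& s,
                                const std::locale& loc = std::locale())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    if (s.empty())
        return std::wstring();

    // A multibyte string never yields more wide characters than bytes.
    std::wstring out(s.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    const std::codecvt_base::result r =
        facet.in(state,
                 s.data(), s.data() + s.size(), from_next,
                 &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("codecvt::in failed");
    assert(from_next == s.data() + s.size()); // all input was consumed

    // Trim to the number of wide characters actually written.
    out.resize(static_cast<std::wstring::size_type>(to_next - &out[0]));
    return out;
}
```

Comparison is then simply `widen_with_codecvt(narrow) == wide`; passing a specific `std::locale` selects the encoding instead of relying on the global one.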
+1 Looks like some good solutions in here, great examples.
Tom Leys