The problem I'm having is that I need to sort a large number of C strings (char pointers) that contain special characters. I got a sorting procedure working like so:

std::sort(dict_.begin(), dict_.end(), comp);

bool comp(const NumPair& a, const NumPair& b)
{
    return boost::lexicographic_compare(a.pFirst, b.pFirst);
}

This worked great, except that all the special German characters were sorted before all the others. My teacher (yes, this pertains to a homework assignment), however, wants them sorted at the end. Awesome!

So I played around and thought I could use a trick I saw on a website: pass a regional locale so that the comparison takes the special characters into account, like so:

return boost::lexicographic_compare(a.pFirst, b.pFirst, locale("german"));

Didn't work! So:

bool comp(const NumPair& a, const NumPair& b)
{
    setlocale(LC_ALL, "");
    return boost::lexicographic_compare(a.pFirst, b.pFirst);
}

Didn't work!

If you have them, I would love to hear some other ideas that might actually work.

Update:

As requested, some sample input and output:

// Some entries
dict_.push_back( NumPair ( "öffnen", "to open" ) );
dict_.push_back( NumPair ( "überraschen", "to surprise" ) );
dict_.push_back( NumPair ( "wünschen", "to wish, to desire, to want" ) );
dict_.push_back( NumPair ( "widersprechen", "to contradict" ) );

// NumPair ctor.
NumPair( const char *pFirst, const char *pSecond )
{
    /* Deep copy of pFirst and pSecond */
}

Output after sorting:

öffnen
überraschen
wünschen
widersprechen
+3  A: 

You might want to show more of your code, like exactly what strings you're using that are causing this problem. I'm easily able to sort a set of German words, and any words beginning with non-ASCII special German characters are ordered at the end. This happens even without any special German locale settings, since in Unicode non-ASCII characters have higher codepoint values than ASCII characters.

For example:

setlocale(LC_ALL, "");

std::vector<std::wstring> vec;
vec.push_back(L"Hallo");
vec.push_back(L"Morgen");
vec.push_back(L"Zebra");
vec.push_back(L"Abend");
vec.push_back(L"Übertragens");
vec.push_back(L"Buchen");

std::sort(vec.begin(), vec.end());
for (std::vector<std::wstring>::iterator it = vec.begin(); it != vec.end(); ++it)
    std::wcout << *it << std::endl;

This outputs:

Abend
Buchen
Hallo
Morgen
Zebra
Übertragens

Note the use of wide character strings. Lexicographical comparison routines compare element by element, so with narrow strings the comparison runs byte by byte instead of character by character. This results in invalid comparisons, since not every Unicode character can be stored in a single byte. The special German characters, for example, take 2 bytes each in UTF-8, so you need an element type that can hold their code points in a single element. These characters all lie below 0xFFFF, so wchar_t is sufficient for them on common platforms, whether it is 16 bits wide (Windows) or 32 bits wide (most Unix-like systems).
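
If wide strings are not an option, a byte-wise comparison can still place non-ASCII characters after ASCII ones, provided each byte is compared as an unsigned value. Plain char is signed on many platforms, which makes bytes above 0x7F negative and sorts them first. The following is only a sketch under assumptions: byteLess and sortedSample are hypothetical names that appear in neither Boost nor the question, and the sample words are assumed to be UTF-8 encoded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the original post).
// Compares two null-terminated strings byte by byte as unsigned values,
// so every byte >= 0x80 (any non-ASCII byte) orders after every ASCII byte.
bool byteLess(const char* a, const char* b)
{
    const unsigned char* ua = reinterpret_cast<const unsigned char*>(a);
    const unsigned char* ub = reinterpret_cast<const unsigned char*>(b);
    return std::lexicographical_compare(ua, ua + std::strlen(a),
                                        ub, ub + std::strlen(b));
}

// Sorts the sample words from the question. The bytes are assumed to be
// UTF-8 and are written as hex escapes so they do not depend on the
// encoding of the source file.
std::vector<std::string> sortedSample()
{
    std::vector<const char*> words;
    words.push_back("\xC3\xB6" "ffnen");        // öffnen
    words.push_back("\xC3\xBC" "berraschen");   // überraschen
    words.push_back("w\xC3\xBCnschen");         // wünschen
    words.push_back("widersprechen");

    std::sort(words.begin(), words.end(), byteLess);

    // Any ASCII-initial word orders before a word starting with a non-ASCII byte.
    assert(byteLess("z", words.back()));

    return std::vector<std::string>(words.begin(), words.end());
}
```

With the container from the question, the comparator could presumably call byteLess(a.pFirst, b.pFirst). std::strcmp(a, b) < 0 should give the same ordering, since strcmp is specified to compare characters as unsigned char. Note that this only moves the special characters to the end; it is not a German dictionary collation.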

(Also note that it's not good practice to include non-ASCII characters in source code. Use "universal character names" such as \u00DC for Ü instead. I'm just using non-ASCII source here for clarity.)
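
As a small illustration of that advice, here is one of the words written with a universal character name. The function names are hypothetical and only serve the sketch; U+00DC is the code point of the capital U with diaeresis.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the word "Übertragens" without any
// non-ASCII character in the string literal. \u00DC is the universal
// character name for U+00DC (capital U with diaeresis).
std::wstring ubertragens()
{
    return L"\u00DC" L"bertragens";
}

// Sanity check: the first element holds the full code point in one wchar_t.
void checkFirstElement()
{
    assert(ubertragens()[0] == 0x00DC);
}
```

The literal is split in two only for readability; \u always consumes exactly four hexadecimal digits, so L"\u00DCbertragens" would be parsed the same way.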

Charles Salvia
The name of the locale is platform/implementation specific and differs between most operating systems. en_US.utf8 is a typical glibc locale, but does not exist on Windows. You can use "" to construct a locale from your current environment.
cytrinox
@Charles: It was my assumption as well that they would be sorted without any additional work. I'm assuming that this is not so because I'm not using wstrings; they are forbidden because we are supposed to learn to do it without them. I've updated my post to include some sample input and output as well as the storage type for the strings.
SoulBeaver
+1  A: 

I'd recommend using the CompareString function if you are on Windows. http://msdn.microsoft.com/en-us/library/dd317759

Locales are very error-prone. They also cause issues with threading, since setlocale changes state for the whole process.

Madman