How do you sort an array of strings in C++ so that the names end up in this order (by surname, ignoring case, with å, ä, ö last):

mr Anka

Mr broWn

mr Ceaser

mR donK

mr ålish

Mr Ätt

mr önD

// The following does not produce that order: it is case-sensitive and mishandles å, ä, ö
// inside the for loop...
string handle;
point1 = array1[j].find_first_of(' ');
string forename1(array1[j].substr(0, point1));
string aftername1(array1[j].substr(point1 + 1));
point2 = array1[j+1].find_first_of(' ');
string forename2(array1[j+1].substr(0, point2));
string aftername2(array1[j+1].substr(point2 + 1));
// swap when the surnames are out of order, or when they are equal and the forenames are out of order
if(aftername1 > aftername2 ||
   (aftername1 == aftername2 && forename1 > forename2)){
    handle = array1[j];
    array1[j] = array1[j+1];
    array1[j+1] = handle; // swapping
}
+5  A: 

As soon as you throw Unicode characters into the mix, you have to start thinking about internationalization, because different languages have different sorting rules. For example, in Dutch, "IJ" is considered a single letter and has its own place in the alphabet. I recommend a good Unicode library for lexical string comparisons, namely International Components for Unicode (ICU): http://site.icu-project.org/

With that, you can simply use the ordinary std::sort with ICU's comparator.

Will
Are they Unicode characters, though, or just characters from an ANSI code page? He is using "string" after all.
Steve314
Depends on the encoding. If his source files are encoded as UTF-8, the use of `std::string` is just fine, and each occurrence of Äå will be represented with the appropriate sequence of bytes. Börk, börk, börk!
Will
I simply mean you have no reason to assume Unicode. std::wstring would be good evidence that Unicode was in use. std::string says little either way.
Steve314
Touché. Whether they're characters in UCS or characters in some single-byte encoding like latin1, the point is, locale-based collation should be used.
Will
@Will: If his source files are UTF-8, you really can't say how ä is represented. It could be U+00E4, but also U+0061 U+0308. Needless to say, it's a challenge to ensure those sort together.
MSalters
A: 

In the past I've used stricoll to sort names; it compares strings according to the current locale. This worked for strings in the current locale, but it did not work for names from different locales stored in the same database.
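For illustration, here is a minimal standard-library sketch of the same locale-based idea. It does not use stricoll; it relies on the fact that a std::locale object can be passed to std::sort as a comparator, which compares strings through the locale's collate facet. The helper names and the locale name "sv_SE.UTF-8" are assumptions: locale names are platform-specific, and the sketch falls back to the classic "C" locale when the requested one is not installed.

```cpp
#include <algorithm>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: try to construct the named locale and fall back to the
// classic "C" locale if the system does not provide it.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: sort whole strings by the collation rules of `loc`.
// std::locale::operator() compares two strings via the locale's std::collate facet.
void sort_with_locale(std::vector<std::string>& names, const std::locale& loc) {
    std::sort(names.begin(), names.end(), loc);
}
```

A call such as sort_with_locale(names, locale_or_classic("sv_SE.UTF-8")) compares the full strings, so sorting by surname would still need the surname to be split off first. As noted above, a single locale also cannot give the right order for names that come from several different languages.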

Ismael
A: 

Tables and transformations.

I would first convert the string to either all uppercase or all lowercase:

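#include <algorithm>
#include <cctype>
#include <string>

std::string test_string("mR BroWn");
// Go through unsigned char: std::tolower has undefined behavior for negative
// char values, and passing the bare function name can be ambiguous.
// Note that this only lowercases ASCII letters, not å, ä, ö.
std::transform(test_string.begin(), test_string.end(),
               test_string.begin(),
               [](unsigned char c) { return static_cast<char>(std::tolower(c)); });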

Next, I would handle the exceptions with an equivalency table: if the character in question is in an array of exception characters (here å, ä, ö), look up its sort position in the table instead of comparing its raw character value.
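A minimal sketch of that table idea follows, under some loud assumptions: the strings are UTF-8, å, ä, ö are stored as single precomposed code points, and the desired order is the Swedish one (a to z, then å, ä, ö). The names sort_key, name_key, and sort_names are hypothetical helpers made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Map a UTF-8 string to a sequence of sort ranks (the "equivalency table").
// a-z in either case -> 1..26, å -> 27, ä -> 28, ö -> 29.
// Every other byte gets a negative rank, so it sorts before the letters.
std::vector<int> sort_key(const std::string& s) {
    std::vector<int> key;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == 0xC3 && i + 1 < s.size()) {
            // 0xC3 is the lead byte of the two-byte encodings of å/Å, ä/Ä, ö/Ö.
            const unsigned char next = static_cast<unsigned char>(s[i + 1]);
            int rank = 0;
            if (next == 0xA5 || next == 0x85) rank = 27;       // å, Å
            else if (next == 0xA4 || next == 0x84) rank = 28;  // ä, Ä
            else if (next == 0xB6 || next == 0x96) rank = 29;  // ö, Ö
            if (rank != 0) {
                key.push_back(rank);
                ++i;  // consume the continuation byte as well
                continue;
            }
        }
        if (c < 0x80 && std::isalpha(c)) {
            key.push_back(std::tolower(c) - 'a' + 1);  // case-insensitive ASCII letter
        } else {
            key.push_back(static_cast<int>(c) - 256);  // anything else, before letters
        }
    }
    return key;
}

// Build a (surname key, forename key) pair; a string without a space is
// treated as a surname with an empty forename.
std::pair<std::vector<int>, std::vector<int>> name_key(const std::string& full) {
    const std::size_t space = full.find(' ');
    if (space == std::string::npos) {
        return std::make_pair(sort_key(full), std::vector<int>());
    }
    return std::make_pair(sort_key(full.substr(space + 1)),
                          sort_key(full.substr(0, space)));
}

// Sort "forename surname" strings by surname first, then by forename.
void sort_names(std::vector<std::string>& names) {
    std::sort(names.begin(), names.end(),
              [](const std::string& a, const std::string& b) {
                  return name_key(a) < name_key(b);
              });
}
```

Applied to the names from the question, sort_names should give Anka, broWn, Ceaser, donK, ålish, Ätt, önD. The table covers only these three letters and does not recognize decomposed forms such as U+0061 U+0308, so a collation library remains the safer choice for general text.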

Thomas Matthews