I am trying to compare std::strings in a locale-dependent manner.

For ordinary C-style strings I've found strcoll, which does exactly what I want once I've called std::setlocale:

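#include <clocale>   // std::setlocale, LC_ALL
#include <cstring>   // std::strcoll
#include <iostream>

// true if a sorts before b under the current C locale's LC_COLLATE rules
bool cmp(const char* a, const char* b)
{
    return std::strcoll(a, b) < 0;
}

int main()
{
    // The source file must be saved as UTF-8 so these literals match sv_SE.UTF-8.
    const char* s1 = "z", *s2 = "å", *s3 = "ä", *s4 = "ö";

    std::cout << (cmp(s1,s2) && cmp(s2,s3) && cmp(s3,s4)) << "\n"; //Outputs 0
    if (!std::setlocale(LC_ALL, "sv_SE.UTF-8"))  // returns a null pointer on failure
    {
        std::cerr << "sv_SE.UTF-8 locale not available\n";
        return 1;
    }
    std::cout << (cmp(s1,s2) && cmp(s2,s3) && cmp(s3,s4)) << "\n"; //Outputs 1, like it should

    return 0;
}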

However, I'd like to have this behaviour for std::string as well. I could just overload operator< to do something like:

#include <cstring>
#include <string>

bool operator<(const std::string& a, const std::string& b)
{
    return std::strcoll(a.c_str(), b.c_str()) < 0;
}

but code that uses std::less (which std::set and std::map use by default) or std::string::compare would still get the ordinary byte-wise ordering, so it doesn't feel right.
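The explicit (and admittedly not seamless) alternative to a global operator< is a named comparator type that forwards to std::strcoll, which containers and algorithms then opt into. This is a minimal sketch; the names StrcollLess and strcoll_sorted are hypothetical, and the ordering depends on whatever C locale was last selected with std::setlocale.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical comparator: orders std::strings by the LC_COLLATE rules of the
// current C locale (the one selected with std::setlocale).
struct StrcollLess {
    bool operator()(const std::string& a, const std::string& b) const {
        // strcoll stops at the first '\0', so embedded nulls are not compared.
        return std::strcoll(a.c_str(), b.c_str()) < 0;
    }
};

// Hypothetical helper: sorts a copy of `words` with StrcollLess and returns it.
std::vector<std::string> strcoll_sorted(std::vector<std::string> words) {
    std::sort(words.begin(), words.end(), StrcollLess{});
    return words;
}
```

Containers accept the same type as their comparator, e.g. std::set<std::string, StrcollLess>, but every such declaration has to name it, which is exactly the non-seamlessness the question is about.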

Is there a way to make this kind of collation work for strings in a seamless manner?

A:

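operator() of std::locale is just what you are searching for: it compares two strings using the locale's collate facet, so a std::locale object can be used directly as a comparator. To get the current global locale, just use the default constructor.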
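A minimal sketch of passing a locale straight to std::sort; the helper name locale_sorted is hypothetical. std::locale::operator() is a const member template taking two std::basic_string arguments and returning true when the first collates before the second.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: sorts a copy of `words` in the collation order of `loc`.
// std::locale::operator()(s1, s2) returns true when s1 collates before s2
// according to loc's std::collate facet, so the locale itself is the comparator.
std::vector<std::string> locale_sorted(std::vector<std::string> words,
                                       const std::locale& loc = std::locale()) {
    // The default argument std::locale() is a copy of the current global locale.
    std::sort(words.begin(), words.end(), loc);
    return words;
}
```

For example, locale_sorted(words, std::locale("sv_SE.UTF-8")) should order z, å, ä, ö on a system that has that locale; the std::locale constructor throws std::runtime_error if the name is not recognized.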

AProgrammer
That's handy. It makes the standard collections work without effort.
CAdaker
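To illustrate the point about standard collections, a std::locale can also be given as the comparator of an ordered container. This is a sketch only; count_words is a hypothetical helper.

```cpp
#include <locale>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts occurrences of each word. The map's comparator
// is a std::locale, so iteration visits keys in that locale's collation order.
std::map<std::string, int, std::locale>
count_words(const std::vector<std::string>& words, const std::locale& loc) {
    std::map<std::string, int, std::locale> counts(loc);  // loc is copied in as the comparator
    for (const std::string& w : words)
        ++counts[w];
    return counts;
}
```

One consequence of this design: keys that the locale collates as equal (neither sorts before the other) are merged into a single entry.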
A:

The C++ library provides the collate facet to do locale-specific collation.
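Using the facet directly gives both a three-way comparison and strxfrm-style sort keys. A minimal sketch, assuming char strings; the function names collate_compare and collate_key are hypothetical, while std::use_facet, std::collate::compare and std::collate::transform are standard.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: three-way comparison through loc's std::collate<char>
// facet. Returns a negative, zero or positive value, like strcoll.
int collate_compare(const std::string& a, const std::string& b,
                    const std::locale& loc = std::locale()) {
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

// Hypothetical helper: returns a sort key. Comparing two keys with the plain
// std::string operator< gives the same order as collating the originals,
// which is useful when the same strings are compared many times.
std::string collate_key(const std::string& s,
                        const std::locale& loc = std::locale()) {
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    return coll.transform(s.data(), s.data() + s.size());
}
```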

Martin v. Löwis
operator() on locale is the easiest way I know to access it.
AProgrammer
I see - I didn't know that.
Martin v. Löwis
A: 

In C++ you need to use the standard std::collate facet from <locale>.

dudewat
A: 

After a bit of searching around I realized that one way to do it could be to instantiate the std::basic_string template with a custom character-traits class, making a new, localized string type.

There are probably a gazillion bugs in this, but as a proof of concept:

#include <cstddef>
#include <iostream>
#include <locale>
#include <string>

// Character traits whose ordering comes from the global locale's collate facet.
struct localed_traits: public std::char_traits<wchar_t>
{
    // Per-character ordering, collating one character at a time.
    static bool lt(wchar_t a, wchar_t b)
    {
        const std::collate<wchar_t>& coll =
            std::use_facet< std::collate<wchar_t> >(std::locale());
        return coll.compare(&a, &a+1, &b, &b+1) < 0;
    }

    // basic_string::compare (and therefore operator<) calls this with n equal
    // to the length of the shorter string; the facet collates those n characters.
    static int compare(const wchar_t* a, const wchar_t* b, std::size_t n)
    {
        const std::collate<wchar_t>& coll =
            std::use_facet< std::collate<wchar_t> >(std::locale());
        return coll.compare(a, a+n, b, b+n);
    }
};

typedef std::basic_string<wchar_t, localed_traits> localed_string;

int main()
{
    localed_string s1 = L"z", s2 = L"å", s3 = L"ä", s4 = L"ö";

    std::cout << (s1 < s2 && s2 < s3 && s3 < s4 ) << "\n"; //Outputs 0
    // Throws std::runtime_error if the sv_SE.UTF-8 locale is not installed.
    std::locale::global(std::locale("sv_SE.UTF-8"));
    std::cout << (s1 < s2 && s2 < s3 && s3 < s4 ) << "\n"; //Outputs 1

    return 0;
}

However, it doesn't seem to work if you base it on char instead of wchar_t, and I have no idea why...

CAdaker
The reason char doesn't work is that it's not using Unicode (as in ".UTF-8"). You are probably using ISO/IEC 8859-1.
Jonas Byström
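A plausible additional factor (an assumption, not verified on the asker's setup): std::basic_string::compare passes traits::compare only the shorter string's length, counted in elements. For a UTF-8 char string those elements are bytes, so comparing "z" with "å" hands the collate facet "z" and a lone lead byte, which is not a valid character. With a wchar_t that holds whole code points (such as the 32-bit wchar_t on Linux), no character gets split. The sketch below only shows the byte counts involved; the names utf8_z, utf8_a_ring and compared_length are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// In UTF-8, 'z' is one byte but 'å' (U+00E5) is two: 0xC3 0xA5.
const std::string utf8_z = "z";
const std::string utf8_a_ring = "\xC3\xA5";

// Hypothetical helper mirroring basic_string::compare: traits::compare receives
// only the first min(size(), other.size()) elements, which for char are bytes.
std::size_t compared_length(const std::string& x, const std::string& y) {
    return std::min(x.size(), y.size());
}
```

Here compared_length(utf8_z, utf8_a_ring) is 1, so the traits-level comparison would see only "\xC3" from "å".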