Hello, I have looked around, but I can't seem to find any examples of this. I am relatively new to C++ and was wondering how I could convert a string to upper case. The examples I have found from googling only deal with chars, not with strings.

A: 

Try the toupper() function (#include <ctype.h>). It accepts a single character as its argument. A string is made up of characters, so you'll have to iterate over the string and convert each character individually.
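
A minimal sketch of that loop, for illustration only. The helper name `upper_in_place` is an assumption and not part of the answer. The cast to `unsigned char` is there because passing a negative `char` value to `toupper` is undefined behavior.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: uppercases every char of the string in place.
void upper_in_place(std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // toupper expects a value representable as unsigned char (or EOF)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}
```

Characters that have no uppercase form, such as digits and spaces, are returned unchanged by `toupper`.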

zmf
+25  A: 
#include <algorithm>
#include <ctype.h>   // declares ::toupper
#include <string>

std::string str = "Hello World";
std::transform(str.begin(), str.end(), str.begin(), ::toupper);
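
One possible variant, assuming a C++11 compiler: wrapping the call in a lambda that takes `unsigned char` avoids handing a negative `char` to `toupper`. The name `upper_copy` is made up for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an uppercased copy of s.
std::string upper_copy(std::string s) {
    // Each char is converted to unsigned char when it is passed to the lambda,
    // so toupper never sees a negative value.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
```

Like the one-liner above, this only handles single-byte characters.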
Pierre
Actually, `toupper()` can be implemented as a macro. This may cause an issue.
dirkgently
Good point dirk (unfortunately). Otherwise I think this is certainly the cleanest and clearest way.
j_random_hacker
I haven't checked, but doesn't C++ require that these functions are implemented as actual functions, even when C allowed them to be macros?
jalf
I believe C required there to be functions also, in case you wanted to take the address of the function or whatever, but I don't have a reference handy.
David Thornley
Updated my post with a quote from the recent draft. This solution has two perils -- so please beware.
dirkgently
@Pierre: Please correct your post (and the quotes too).
dirkgently
a bind(::toupper, construct<unsigned char>(_1)) with boost.lambda will serve perfectly fine i think.
Johannes Schaub - litb
i've corrected the quotes, thinking that's quite non-controversial.
Johannes Schaub - litb
You can easily guarantee that toupper() won't be called as a macro. Look here, at the end of the 9.1.1 subsection: http://publications.gbdirect.co.uk/c_book/chapter9/introduction.html.
Bastien Léonard
This approach works fine for ASCII, but fails for multi-byte character encodings, or for special casing rules like German 'ß'.
dan04
+5  A: 
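#include <algorithm>
#include <cctype>
#include <string>
using namespace std;

struct convert {
   // cast first, so that toupper never receives a negative value
   void operator()(char& c) { c = toupper((unsigned char)c); }
};

// ... 
string uc_str;
for_each(uc_str.begin(), uc_str.end(), convert());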

Note: A couple of problems with the top solution:

21.5 Null-terminated sequence utilities

The contents of these headers shall be the same as the Standard C Library headers <ctype.h>, <wctype.h>, <string.h>, <wchar.h>, and <stdlib.h> [...]

  • Which means that the cctype members may well be macros not suitable for direct consumption in standard algorithms.

  • Another problem with the same example is that it does not cast the argument or verify that this is non-negative; this is especially dangerous for systems where plain char is signed. (The reason being: if this is implemented as a macro it will probably use a lookup table and your argument indexes into that table. A negative index will give you UB.)
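
To illustrate the second point, here is a small sketch (the helper names are hypothetical). Converting to `unsigned char` first maps every `char` into the range 0..UCHAR_MAX, which is what `toupper` requires, regardless of whether plain `char` is signed on the platform.

```cpp
#include <cctype>
#include <climits>

// Hypothetical helper: turns a char into a value that is always a valid
// argument for the <cctype> functions.
int as_ctype_arg(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: uppercases one char without risking a negative argument.
char safe_upper(char c) {
    return static_cast<char>(std::toupper(as_ctype_arg(c)));
}
```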

dirkgently
The normal cctype members are macros. I remember reading that they also had to be functions, although I don't have a copy of the C90 standard and don't know if it was explicitly stated or not.
David Thornley
they have to be functions in C++ - even if C allows them to be macros. i agree with your second point about the casting though. the top solution could pass negative values and cause UB with that. that's the reason i didn't vote it up (but i didn't vote it down either) :)
Johannes Schaub - litb
@litb: Can you cite a reference, I couldn't find anything to that effect in the standard.
dirkgently
standard quote must not be missing: 7.4.2.2/1 (poor litb, that's referencing a C99 TC2 draft only), and C++ 17.4.1.2/6 in the glory c++98 standard.
Johannes Schaub - litb
(note the footnote to it: "This disallows the common practice of providing a masking macro.... blah blupp .. only way to do it in C++ is to provide an extern inline function.") :)
Johannes Schaub - litb
@litb: Footnotes are not part of the normative text, are they? I have had this confusion :P
dirkgently
you are right, they are not part of the normative text :) but they describe the intent of their authors of course. which means if my cited text isn't really making sure there must not be macros, another paragraph will make it sure. hold on i'll see whether i find it.
Johannes Schaub - litb
well but even if the note has no backing normative text, then there will still be a ::toupper function (beside the macro), because of that normative text i cited. since ::toupper will not be replaced by that macro (parens for the arguments are missing), it will work nicely, the same as in C :)
Johannes Schaub - litb
hmm, i think i quoted the paragraph wrongly. It seems that when it talks about "Standard C++ Library", it means only those "cname" and "name" headers, but excludes those "name.h" headers, which it refers to by "Standard C Library". so ctype.h is not at all affected by that rule. :)
Johannes Schaub - litb
However, D.5/1 seems to contradict. It says "For compatibility with the Standard C library, the C++ Standard library provides the 18 C headers, as shown in Table 100:" this looks like a defect i think. i'll report it.
Johannes Schaub - litb
@litb: Thanks for taking the trouble. Are you co-consulting with C99?
dirkgently
Johannes Schaub - litb
... that's achieved by this trickery: http://stackoverflow.com/questions/650461/what-are-some-tricks-i-can-use-with-macros/650711#650711
Johannes Schaub - litb
Actually, in order to force a function call we need to write (toupper) instead of just toupper in the transform
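
A minimal sketch of the parenthesized-name idiom, shown in a direct call, where a function-like macro could otherwise be expanded. The helper name `upper_via_function` is an assumption for illustration.

```cpp
#include <ctype.h>
#include <string>

// Hypothetical helper: always calls the real toupper function.
std::string upper_via_function(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // A function-like macro is only expanded when its name is directly
        // followed by '('. Writing (toupper)(...) therefore calls the function.
        s[i] = static_cast<char>((toupper)(static_cast<unsigned char>(s[i])));
    }
    return s;
}
```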
dirkgently
A: 

I'm not sure there is a built-in function. Try this:

Include either <ctype.h> or <cctype>, as well as <string>, at the top of the file.

string StringToUpper(string strToConvert)
{//change each element of the string to upper case
   for(string::size_type i=0;i<strToConvert.length();i++)
   {
      //cast to unsigned char: toupper is undefined for negative values
      strToConvert[i] = toupper((unsigned char)strToConvert[i]);
   }
   return strToConvert;//return the converted string
}

string StringToLower(string strToConvert)
{//change each element of the string to lower case
   for(string::size_type i=0;i<strToConvert.length();i++)
   {
      //cast to unsigned char: tolower is undefined for negative values
      strToConvert[i] = tolower((unsigned char)strToConvert[i]);
   }
   return strToConvert;//return the converted string
}
Brandon Stewart
+21  A: 

Boost string algorithms:

#include <boost/algorithm/string.hpp>
#include <string>

std::string str = "Hello World";

boost::to_upper(str);

std::string newstr = boost::to_upper_copy<std::string>("Hello World");
Tony Edgecombe
That was useful.
quant_dev
This also has the benefit of i18n, where `::toupper` most likely assumes ASCII.
Ben Straub
+3  A: 
#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <string>

typedef std::string::value_type char_t;

char_t up_char( char_t ch )
{
    return std::use_facet< std::ctype< char_t > >( std::locale() ).toupper( ch );
}

std::string toupper( const std::string &src )
{
    std::string result;
    std::transform( src.begin(), src.end(), std::back_inserter( result ), up_char );
    return result;
}

const std::string src  = "test test TEST";

std::cout << toupper( src );
bb
I wouldn't recommend a back_inserter, as you already know the length; use std::string result(src.size(), '\0'); std::transform( src.begin(), src.end(), result.begin(), up_char );
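
A sketch of what this suggestion could look like, assuming a C++11 compiler. The function name `upper_prealloc` and the optional locale parameter are assumptions added for illustration.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: uppercases src using the ctype facet of the given locale.
std::string upper_prealloc(const std::string& src,
                           const std::locale& loc = std::locale()) {
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    // The size is known up front, so the result is allocated once
    // and no back_inserter is needed.
    std::string result(src.size(), '\0');
    std::transform(src.begin(), src.end(), result.begin(),
                   [&facet](char ch) { return facet.toupper(ch); });
    return result;
}
```

Looking up the facet once, outside the loop, also avoids a `use_facet` call per character.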
Viktor Sehr
Although I am sure you know this.
Viktor Sehr
+9  A: 

Do you have ASCII or International characters in strings?

If it's the latter case, "uppercasing" is not that simple, and it depends on the alphabet in use. There are bicameral and unicameral alphabets. Only bicameral alphabets have different characters for upper and lower case. Also, there are composite characters, like Latin capital letter DZ (\u01F1 'Ǳ'), which use the so-called title case. This means that only the first character (D) gets changed.

I suggest you look into ICU and the difference between Simple and Full Case Mappings. This might help:

http://userguide.icu-project.org/transforms/casemappings

Milan Babuškov
Or the German Eszett (ß), the letter that looks like the Greek letter beta and means "ss". There is no single German character that means "SS", which is its uppercase equivalent. The German word for "street" (Straße), when uppercased, gets one character longer.
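
To make this concrete, here is a small sketch, assuming a UTF-8 encoded string and the default "C" locale. The names `upper_bytes` and `strasse` are made up for the example. A per-byte `toupper` leaves the two bytes of 'ß' untouched, so the result is not the expected "STRASSE".

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: per-byte uppercasing, as in the simple answers.
std::string upper_bytes(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    return s;
}

// "straße" encoded as UTF-8 (ß is the byte pair C3 9F). The literal is split
// so that the 'e' is not read as part of the \x9F escape.
const std::string strasse = "stra\xC3\x9F" "e";
```

Handling this correctly needs a full case mapping that can replace one character with two, which is the kind of thing ICU provides.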
David Thornley
Another special case is the Greek letter sigma (Σ), which has *two* lowercase versions, depending on whether it's at the end of a word (ς) or not (σ). And then there are language specific rules, like Turkish having the case mapping I↔ı and İ↔i.
dan04
A: 

On all the machines I tested, it was faster. Perhaps that is because it does not have to deal with a very wide range of characters, or because the compiler turns the switch() into a jump table. I don't know how it works at the assembly level ... I just know that it is faster :P

string Utils::String::UpperCase(string CaseString) {
    // size_type instead of unsigned short, which would overflow on long strings
    for (string::size_type i = 0, tamanho = CaseString.length(); i < tamanho; i++) {
        switch (CaseString[i]) {
            case 'a':
                CaseString[i] = 'A';
                break;
            case 'b':
                CaseString[i] = 'B';
                break;
            case 'c':
                CaseString[i] = 'C';
                break;
            case 'd':
                CaseString[i] = 'D';
                break;
            case 'e':
                CaseString[i] = 'E';
                break;
            case 'f':
                CaseString[i] = 'F';
                break;
            case 'g':
                CaseString[i] = 'G';
                break;
            case 'h':
                CaseString[i] = 'H';
                break;
            case 'i':
                CaseString[i] = 'I';
                break;
            case 'j':
                CaseString[i] = 'J';
                break;
            case 'k':
                CaseString[i] = 'K';
                break;
            case 'l':
                CaseString[i] = 'L';
                break;
            case 'm':
                CaseString[i] = 'M';
                break;
            case 'n':
                CaseString[i] = 'N';
                break;
            case 'o':
                CaseString[i] = 'O';
                break;
            case 'p':
                CaseString[i] = 'P';
                break;
            case 'q':
                CaseString[i] = 'Q';
                break;
            case 'r':
                CaseString[i] = 'R';
                break;
            case 's':
                CaseString[i] = 'S';
                break;
            case 't':
                CaseString[i] = 'T';
                break;
            case 'u':
                CaseString[i] = 'U';
                break;
            case 'v':
                CaseString[i] = 'V';
                break;
            case 'w':
                CaseString[i] = 'W';
                break;
            case 'x':
                CaseString[i] = 'X';
                break;
            case 'y':
                CaseString[i] = 'Y';
                break;
            case 'z':
                CaseString[i] = 'Z';
                break;
        }
    }
    return CaseString;
}
osmano807
What advantage does this code have over the other solutions posted?
Konrad Rudolph
=) It indeed does the job, but I'd say it's a strange coding style.
Viktor Sehr
On all the machines I tested, it was faster. Perhaps that is because it does not have to deal with a very wide range of characters, or because the compiler turns the switch() into a jump table. I don't know how it works at the assembly level ... I just know that it is faster :P
osmano807
It seems that only simple answers are accepted here ... I wrote this code for raw performance, and it works well for this use.
osmano807
I think this is a case of sacrificing memory for speed. However, don't reinvent the wheel: that code can be shortened to a couple of lines by just subtracting 32 from the character, assuming you are dealing with lowercase letters of the English alphabet in ASCII. A single subtraction would be much faster than your solution. I won't up- or downvote. Brush up on your coding skills a little bit. I'm not saying what you posted is a bad thing, but I have seen that coding style many times among CS students in college, and it certainly isn't the best.
Nathan Adams
A: 
//works for ASCII -- no clear advantage over what is already posted...

std::string toupper(const std::string & s)
{
    std::string ret(s.size(), char());
    for(unsigned int i = 0; i < s.size(); ++i)
        ret[i] = (s[i] <= 'z' && s[i] >= 'a') ? s[i]-('a'-'A') : s[i];
    return ret;
}
David