The C standard says that character constants such as 'ç' are integer constants:
§6.4.4.4/9
An integer character constant has type int. The value of an integer character constant
containing a single character that maps to a single-byte execution character is the
numerical value of the representation of the mapped character interpreted as an integer.
If the char type is signed on your machine (it is on x86 Linux, for example), then when comando
contains 'ç' and is promoted to int, it becomes a negative integer, whereas 'ç' is a positive integer. Hence the warning from the compiler.
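The promotion behaviour described above can be checked with a small sketch. The function names (`promote_plain`, `promote_unsigned`, `check_promotion`) are hypothetical, and the byte 0xE7 is used on the assumption that 'ç' is encoded in ISO 8859-1:

```c
#include <assert.h>
#include <limits.h>

/* Promote a plain char to int. For bytes >= 0x80 the result is
   negative when plain char is a signed type. */
int promote_plain(char c)
{
    return c;
}

/* Convert to unsigned char first, so the result is always 0..UCHAR_MAX. */
int promote_unsigned(char c)
{
    return (unsigned char)c;
}

/* Self-check for byte 0xE7 (assumed to be 'ç' in ISO 8859-1). */
void check_promotion(void)
{
    char c = '\xE7';

    assert(promote_unsigned(c) == 0xE7);
    if (CHAR_MIN < 0)
        assert(promote_plain(c) < 0);       /* plain char is signed */
    else
        assert(promote_plain(c) == 0xE7);   /* plain char is unsigned */
}
```

Comparing through `unsigned char` on both sides gives the same answer whichever signedness the platform chooses for plain `char`.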
For an 8-bit character set, by far the fastest way to do such an operation is to create a table of 256 bytes, where each position contains the unaccented version of the character.
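    #include <stdio.h>   /* for EOF */

    /* Return the unaccented form of c, or EOF if c is outside 0..255. */
    int unaccented(int c)
    {
        /* unsigned char, so entries above 0x7F are returned as positive values */
        static const unsigned char map[256] =
        {
            '\x00', '\x01', ...
            ...
            '0', '1', '2', ...
            ...
            'A', 'B', 'C', ...
            ...
            'a', 'b', 'c', ...
            ...
            'A', 'A', 'A', ... // 0xC0 onwards...
            ...
            'a', 'a', 'a', ... // 0xE0 onwards...
            ...
        };
        if (c < 0 || c > 255)
            return EOF;
        return map[c];
    }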
Of course, you'd write a program - probably a script - to generate the table of data, rather than doing it manually. In the range 0..127, the character at position x is the character with code x (so map['A'] == 'A').
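As a rough sketch of how such a table could be produced mechanically, the function below fills it at run time instead of generating source text; the same loops could just as well print an initializer. It assumes ISO 8859-1, and the names `struct range`, `latin1_ranges`, and `build_unaccent_map` are hypothetical. Codes not listed (such as 0xC6 'Æ' or 0xDF 'ß') are left unchanged:

```c
#include <assert.h>
#include <stddef.h>

/* One run of consecutive ISO 8859-1 codes sharing an unaccented form. */
struct range { unsigned char first, last, plain; };

static const struct range latin1_ranges[] = {
    { 0xC0, 0xC5, 'A' }, { 0xC7, 0xC7, 'C' }, { 0xC8, 0xCB, 'E' },
    { 0xCC, 0xCF, 'I' }, { 0xD1, 0xD1, 'N' }, { 0xD2, 0xD6, 'O' },
    { 0xD9, 0xDC, 'U' }, { 0xDD, 0xDD, 'Y' },
    { 0xE0, 0xE5, 'a' }, { 0xE7, 0xE7, 'c' }, { 0xE8, 0xEB, 'e' },
    { 0xEC, 0xEF, 'i' }, { 0xF1, 0xF1, 'n' }, { 0xF2, 0xF6, 'o' },
    { 0xF9, 0xFC, 'u' }, { 0xFD, 0xFD, 'y' }, { 0xFF, 0xFF, 'y' },
};

/* Fill map so that map[x] is the unaccented form of code x. */
void build_unaccent_map(unsigned char map[256])
{
    size_t n = sizeof latin1_ranges / sizeof latin1_ranges[0];

    assert(map != NULL);

    /* Start from the identity mapping: every code maps to itself. */
    for (int i = 0; i < 256; i++)
        map[i] = (unsigned char)i;

    /* Overwrite each accented run; int counter avoids wrap-around at 0xFF. */
    for (size_t r = 0; r < n; r++)
        for (int c = latin1_ranges[r].first; c <= latin1_ranges[r].last; c++)
            map[c] = latin1_ranges[r].plain;
}
```

Starting from the identity mapping means any code the range list does not mention follows the "do not change it" rule automatically.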
If you are allowed to exploit C99, you can improve the table by using designated initializers:
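    /* Assumes a single-byte source and execution character set. */
    static const unsigned char map[256] =
    {
        ['\x00'] = '\x00', ...
        ['A'] = 'A', ...
        ['a'] = 'a', ...
        /* Cast the index: with a signed char, 'å' would be a negative subscript. */
        [(unsigned char)'å'] = 'a', ...
        [(unsigned char)'Å'] = 'A', ...
        [(unsigned char)'ÿ'] = 'y', ...
    };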
It isn't entirely clear what you should do with letters such as 'æ' or 'ß' that have no single-character ASCII equivalent; however, the simple rule of 'when in doubt, do not change it' can be applied sensibly. They aren't accented characters, but neither are they ASCII characters.
This does not work so well for UTF-8. For that, you need more specialized tables driven from data in the Unicode standard.
Also note that you should convert any plain char value to unsigned char before calling this function. The code could try to cope with callers who skip that conversion, but it is hard to distinguish 'ÿ' (0xFF, which is -1 as a signed char, the usual value of EOF) from EOF when people are not careful in calling the function. The C standard character test macros are required to support all valid character values (when converted to unsigned char) and EOF as inputs; this function follows that design.
§7.4/1
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.
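A minimal sketch of the calling convention this implies: convert each `char` to `unsigned char` before handing it to a function that follows the `<ctype.h>` contract. The helper name `convert_string` is hypothetical; it accepts any function with the `int f(int)` shape, such as the standard `toupper` or the `unaccented` function above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Apply convert to every character of s in place. The cast to
   unsigned char keeps the argument in 0..UCHAR_MAX, so a byte
   such as 0xFF is never passed as a negative value like EOF. */
void convert_string(char *s, int (*convert)(int))
{
    assert(s != NULL && convert != NULL);

    for (; *s != '\0'; s++)
        *s = (char)convert((unsigned char)*s);
}
```

For example, `convert_string(buf, toupper)` upper-cases a writable buffer, and `convert_string(buf, unaccented)` would strip accents in the same way.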