ansaurus

Question

Answer 1

+5 A:

There is a much better way.

In ASCII (www.asciitable.com), each of these characters has a known numerical value.

'A' is 0x41.

So you can simply subtract 0x41 from them to get the numbers. I don't know C very well, but something like:

int num = 'A' - 0x41;

should work.
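
A minimal sketch of that idea as a complete function, assuming an ASCII-compatible character set where 'A' through 'Z' are contiguous. The name `letter_to_index` and the -1 error value are illustrative choices, not part of the original suggestion:

```c
/* Hypothetical helper: maps 'A'..'Z' to 0..25 and returns -1 for anything else.
 * Assumes the uppercase letters are contiguous (true for ASCII, not for EBCDIC). */
int letter_to_index(char letter)
{
    if (letter >= 'A' && letter <= 'Z')
        return letter - 'A';   /* in ASCII, 'A' is 0x41, so this equals letter - 0x41 */
    return -1;                 /* not an uppercase letter */
}
```

The range check keeps non-letters from producing misleading values such as negative numbers or values above 25.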

Noon Silk 2009-09-24 04:07:03

'A' is 65, or 0x41.

James Black 2009-09-24 04:10:03

Not sure if you posted that before you saw my edit; I've corrected myself. I always remember it as `41`, I guess I think in hex :P

Noon Silk 2009-09-24 04:11:33

also: `int num = letter - 'A';`

Nick D 2009-09-24 04:11:35

I was about to answer similarly but I think there is no need to. Just want to add that you can do this: `int num = aChar - 'A';`

NawaMan 2009-09-24 04:11:58

It's more common to use `int num = 'A' - 'A'` (replacing the first one with the character or variable in question) _just in case_ we're not using ASCII, though I think that might be guaranteed by the standard. I know that the standard guarantees that 'A' .. 'Z' are consecutive in the character set, though.

Chris Lutz 2009-09-24 04:13:07

I prefer to use 'A' as it improves readability, otherwise someone has to look up 0x41 and see what it is. :)

James Black 2009-09-24 04:16:58

@Chris: it doesn't matter if you know you are using ASCII or not. Putting magic numbers (i.e. literal constants) in code is a bad practice and should almost always be avoided.

Jeanne Pindar 2009-09-24 05:08:16

Jeanne: This is homework, not a real application. Context matters.

Noon Silk 2009-09-24 05:25:55

Chris: 'A' to 'Z' being consecutive *isn't* guaranteed by the standard (only '0' to '9').

caf 2009-09-24 05:47:43

@caf: It *is* guaranteed by the **ASCII** standard. It's just that the **C** standard doesn't guarantee that C strings will be ASCII!

Daniel Pryden 2009-09-24 05:51:58

@silky - No, it doesn't. Using `0x41` instead of `'A'` is silly. Why don't we all write our numbers and strings directly in binary? Why not calculate our own jumps and pointer arithmetic? @caf - This has been pointed out in various places, but I'm having a rather rough day mentally so it can't hurt to remind me. ;) But yes, I was extrapolating from '0' - '9' being consecutive to 'A' - 'Z', and while not true, it's a fairly safe assumption unless you plan on writing code for mainframes. Doesn't change the fact that `- 'A'` is better than `- 0x41` in almost all situations.

Chris Lutz 2009-09-24 05:54:37

Chris: I disagree with you, but will not continue a useless discussion.

Noon Silk 2009-09-24 05:59:23

Chris: Yep I saw that, just thought it should be recorded in this comment thread for posterity ;). Daniel: It's pretty clear that "the standard" in the context I was replying to meant the C standard.

caf 2009-09-24 06:14:06

Answer 2

+8 A:

If you need to deal with upper-case and lower-case then you may want to do something like:

if (letter >= 'A' && letter <= 'Z')
  num = letter - 'A';
else if (letter >= 'a' && letter <= 'z')
  num = letter - 'a';

If you want to display these, then you will want to convert the number into an ASCII character by adding '0' to it:

  asciinumber = num + '0';
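
Note that adding '0' produces a single character, so it can only represent the values 0 through 9. One possible alternative, sketched here, is to format the number with `snprintf()` from `<stdio.h>`; the helper name `format_number` is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: writes num as decimal text into buf.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small or formatting fails. */
int format_number(int num, char *buf, size_t size)
{
    int len = snprintf(buf, size, "%d", num);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```

If the value only needs to be shown on screen, calling `printf("%d\n", num)` directly is simpler still.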

James Black 2009-09-24 04:12:48

Alternatively, use `num = toupper(letter) - 'A'` to convert the letter to uppercase, thus avoiding the conditional. The `toupper()` function is found in the `ctype.h` header.

Chris Lutz 2009-09-24 04:14:11

We can also note that lowercase letters are just an `0x20` difference from uppercase.

Noon Silk 2009-09-24 04:14:36

True, but with a conditional you can differentiate between the two cases if you need to. There are various options; I just wanted to point out that upper- and lower-case may be an issue and should be handled.

James Black 2009-09-24 04:16:10

Indeed. I would put this code in a function, and add an `else num = -1` at the end just for safety, but it doesn't matter. You can check that the return value of `toupper() - 'A'` is within the desired range (0 - 25) just as easily.

Chris Lutz 2009-09-24 04:20:23

The asciinumber method will only work up to 'J' or 'j'.

Dipstick 2009-09-24 04:24:44

A better way to display the number is just to use `printf()` (or `sprintf()` if you need to work with it as a string).

Chris Lutz 2009-09-24 04:34:14

Note that the "asciinumber = num + '0';" bit only works for single digits.

Tal Pressman 2009-09-24 04:45:52

You are correct, I didn't take into account that asciinumber is flawed.

James Black 2009-09-24 04:49:18

Answer 3

+5 A:

Another, far worse (but still better than 26 if statements) alternative is to use switch/case:

switch(letter)
{
case 'A':
case 'a': // don't use this line if you want only capital letters
    num = 0;
    break;
case 'B':
case 'b': // same as above about 'a'
    num = 1;
    break;
/* and so on and so on */
default:
    fprintf(stderr, "WTF?\n");
}

Consider this only if there is absolutely no relationship between the letter and its code. Since there is a clear sequential relationship between the letter and the code in your case, using this is rather silly and going to be awful to maintain, but if you had to encode random characters to random values, this would be the way to avoid writing a zillion if()/else if()/else if()/else statements.
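
To illustrate a case where a switch does make sense, here is a sketch of a mapping with no arithmetic relationship between characters and values. Roman numeral digits are used purely as an illustration, and `roman_digit_value` is a hypothetical name:

```c
/* Hypothetical example: the values have no arithmetic relationship
 * to the character codes, so a switch (or a table) is a natural fit. */
int roman_digit_value(char c)
{
    switch (c)
    {
    case 'I': case 'i': return 1;
    case 'V': case 'v': return 5;
    case 'X': case 'x': return 10;
    case 'L': case 'l': return 50;
    case 'C': case 'c': return 100;
    case 'D': case 'd': return 500;
    case 'M': case 'm': return 1000;
    default:            return -1;   /* not a Roman numeral digit */
    }
}
```

Because each case returns directly, no break statements are needed and a forgotten break cannot cause fall-through.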

Chris Lutz 2009-09-24 04:26:24

This is *not* so silly. Despite your comment elsewhere, @Chris, C99 only mandates that the numeric characters are in order. Alphas can be all over the place (such as EBCDIC with its two different areas). This is, in fact, the only correct answer to date. + 1.

paxdiablo 2009-09-24 05:04:27

Ah. I'm all over the road today. I did know that the digits were in order, I just made a leap about the characters. I really need to read the C standard. I have to say, though, if this is the price of correctness, I'm willing to say "To hell!" with EBCDIC.

Chris Lutz 2009-09-24 05:11:09

It's good for everyone to know that the order is not guaranteed, but seriously, you have to take your audience into consideration. If this program is going to be used by people running on any 'standard' computer, it is safe to use "letter - 'A'".

Ed Swangren 2009-09-24 06:25:30

@Ed: *My* audience (visuance?) consists of people who know and follow the standard (and that's standard *without* quotes). Your program wouldn't conform with the standard. That's fine - I understand that the vast majority of C environments use ASCII or ISO646 but I consider it slightly arrogant to state that that's all that matters. ISO left open the possibility for non-contiguous letters for a good reason - do you really think you know better than them? I don't want to get into a p*ssing match, just putting my viewpoint forward - we may just have to agree to disagree.

paxdiablo 2009-09-24 07:43:26

Answer 4

+1 A:

In most programming and scripting languages there is a means to get the "ordinal" value of any character. (Think of it as an offset from the beginning of the character set).

Thus you can usually do something like:

for ch in somestring:
    if lowercase(ch):
        n = ord(ch) - ord('a')
    elif uppercase(ch):
        n = ord(ch) - ord('A')
    else:
        n = -1  # Sentinel error value
        # (or raise an exception as appropriate to your programming
        #  environment and to the assignment specification)

Of course this wouldn't work for an EBCDIC-based system (and might not work for some other exotic character sets). I suppose a reasonable sanity check would be to test whether this function returns monotonically increasing values in the range 0..25 for the strings "abc...xyz" and "ABC...XYZ".

A whole different approach would be to create an associative array (dictionary, table, hash) of your letters and their values (one or two simple loops). Then use that. (Most modern programming languages include support for associative arrays.)
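
C has no built-in associative array, so one way to sketch this idea there is a lookup table indexed by character code and filled from the alphabet strings in a loop. It makes no assumption that the letters are contiguous, so it should hold up on EBCDIC as well. The names `letter_table`, `init_letter_table`, and `lookup_letter` are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* One slot per possible unsigned char value; -1 marks "not a letter". */
static int letter_table[UCHAR_MAX + 1];

/* Fill the table once, before the first lookup. */
void init_letter_table(void)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++)
        letter_table[i] = -1;

    /* Both strings have 26 letters, so one loop covers both cases. */
    for (i = 0; upper[i] != '\0'; i++)
    {
        letter_table[(unsigned char)upper[i]] = (int)i;
        letter_table[(unsigned char)lower[i]] = (int)i;
    }
}

/* Returns 0..25 for a letter of either case, or -1 otherwise. */
int lookup_letter(char c)
{
    return letter_table[(unsigned char)c];
}
```

After calling `init_letter_table()` once, each lookup is a single array access, at the cost of a small table held in memory.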

Naturally I'm not "doing your homework." You'll have to do that for yourself. I'm simply explaining that those are the obvious approaches that would be used by any professional programmer. (Okay, an assembly language hack might also just mask out one bit for each byte, too).

Jim Dennis 2009-09-24 04:36:29

Most of this information doesn't apply to C. Questions have language tags for a reason.

Chris Lutz 2009-09-24 04:51:14

+1 That's pretty good advice for a homework question, explaining _what_ to do without actually doing it - and that in a general algorithm. However, I guess it would be fair to mention how the mapping between characters and numbers in C is done. Otherwise you might send the asker on a wild chase looking for a function to do this in the std lib. (Yes, I know, the others have already blurted this out. But I guess it would still make sense to improve on this answer.)

sbi 2009-09-24 04:53:38

@Lutz: I'd agree with you if this wasn't a homework question. I would want to help a professional with a quick answer straight to the point. But I would want to help students learn to think for themselves. This is, after all, what homework is supposed to do: make the students dig and think. If you spell it all out, homework degenerates to a googling match.

sbi 2009-09-24 04:55:48

I agree that we shouldn't spoon feed anyone homework, but most of the code examples are fairly clear and well explained. You can avoid giving away free code for homework without resorting to pseudocode, and in my opinion should avoid pseudocode if the OP mentions a language. Statements like `for ch in somestring` can confuse someone who doesn't know much about C, and C-style `for()` loops are supported in many languages.

Chris Lutz 2009-09-24 04:59:08

The powers that be should fork off a HomeworkDue site to take on all the homework questions. I've toyed with the idea of proposing a ScienceGeek and BookWorm site as well.

paxdiablo 2009-09-24 05:28:04

@Pax - Do it. It will generate some good discussion on where Jeff and Joel want to go with their engine, if nothing else.

Chris Lutz 2009-09-24 05:49:51

@Pax: "Sorry, that page doesn't exist!"

sbi 2009-09-24 18:08:04

@Chris: I guess we just have to agree to disagree. I find the pseudocode approach very good. If nothing else, it will force students to look up the syntax they have already been told about, and having to look up something you should already know is a damn good way to learn it. Also, for many students not familiar with C-style syntax, C's `for` loops are rather confusing: only one keyword, but lots of operators and separators, and you just have to know which of the expressions in between does what.

sbi 2009-09-24 18:12:12

Correct link (hopefully): http://meta.stackoverflow.com/questions/4033/what-stackexchange-sites-do-you-want-to-see/23275#23275

Chris Lutz 2009-09-24 19:00:48

Answer 5

A:

Since the char data type is treated similarly to an int data type in C and C++, you could go with something like:

char c = 'A';   // just some character

int urValue = c - 65;

If you are worried about case sensitivity:

#include <ctype.h> // if using C++ #include <cctype>
int urValue = toupper((unsigned char)c) - 65; // the cast avoids undefined behavior for negative char values

Aashay 2009-09-24 04:40:59

Answer 6

+6 A:

This is a way that I feel is better than the switch method, and yet is standards compliant (does not assume ASCII):

#include <string.h>
#include <ctype.h>

/* returns -1 if c is not an alphabetic character */
int c_to_n(char c)
{
    int n = -1;
    static const char * const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char *p = strchr(alphabet, toupper((unsigned char)c));

    if (p && *p)  /* strchr() also matches the terminating '\0', so exclude it */
    {
        n = p - alphabet;
    }

    return n;
}

caf 2009-09-24 05:45:29

For full standards compliance you might want to cast `p - alphabet` before assigning it. You might use a `ptrdiff_t` or some other technically correct type, but given the range limitations I don't think it's really necessary. Any integral type is guaranteed to be able to hold any of the values we're using here.

Chris Lutz 2009-09-24 18:58:55

Yes, in this case we can guarantee that `p - alphabet` is in the range 0...25, so it will definitely fit into an `int`. I don't believe a cast there is necessary - the semantics of assigning one integral type to another are quite well defined.

caf 2009-09-24 22:59:47

I'll give you a vote for that one @caf, since it handles all character sets within the standard. It's also one of the rare times I've seen someone use the const const properly for pointer and pointee :-) Of course, being an old-timer, I would've just done: 'return p ? (int)(p - alphabet) : -1;' instead of all that mucking about with n and if statements.

paxdiablo 2009-11-05 02:41:19

Answer 7

A:

Aww if you had C++

For a Unicode definition of how to map characters to values:

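#include <map>

typedef std::map<wchar_t, int> WCharValueMap;

// Build the table once; an accented variant can share a value with its base letter.
WCharValueMap fillMap() {
  WCharValueMap result;
  result[L'A']=0;
  result[L'Â']=0;
  result[L'B']=1;
  result[L'C']=2;
  return result;
}

// Defined after fillMap() so that the call compiles.
WCharValueMap myConversion = fillMap();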

usage

int value = myConversion[L'Â'];

Greg Domjan 2009-09-24 06:19:32


Converting Letters to Numbers in C
