I have just started reading through The C Programming Language and I am having trouble understanding one part. Here is an excerpt from page 24:
#include <stdio.h>

/* count digits, white space, others */
main()
{
    int c, i, nwhite, nother;
    int ndigit[10];

    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;

    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c-'0'];    // THIS IS THE LINE I AM WONDERING ABOUT
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;

    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf(", white space = %d, other = %d\n",
        nwhite, nother);
}
The output of this program run on itself is
digits = 9 3 0 0 0 0 0 0 0 1, white space = 123, other = 345
The declaration
int ndigit[10];
declares ndigit to be an array of 10 integers. Array subscripts always start at zero in C, so the elements are
ndigit[0], ndigit[1], ..., ndigit[9]
This is reflected in the for loops that initialize and print the array. A subscript can be any integer expression, which includes integer variables like i, and integer constants. This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9')
determines whether the character in c is a digit. If it is, the numeric value of that digit is
c-'0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets. By definition, chars are just small integers, so char variables and constants are identical to ints in arithmetic expressions. This is natural and convenient; for example
c-'0'
is an integer expression with a value between 0 and 9 corresponding to the character '0' to '9' stored in c, and thus a valid subscript for the array ndigit.
The part I am having trouble understanding is why the -'0' part is necessary in the expression c-'0'. If a character is a small integer, as the author says, and the digit characters correspond to their numeric values, then what is -'0' doing?