I have a query based on the below program -

#include <stdio.h>

int main(void)
{
    char ch = 'z';
    while (ch >= 'a')
    {
        printf("char is %c and the value is %d\n", ch, ch);
        ch = ch - 1;
    }
    return 0;
}

Why is printing the whole set of lowercase letters not guaranteed in the above program? If C doesn't make many guarantees about the ordering of characters in its internal form, then who actually determines it, and how?

+1  A: 

It's obviously determined by the implementation of C you're using, but more than likely, for you, it's determined by the American Standard Code for Information Interchange (ASCII).

Dominic Bou-Samra
But then how can that program cause portability problems?
S.Man
Because the platform might use EBCDIC instead of ASCII. Or because the language uses a different alphabet.
dan04
+15  A: 

The compiler implementor chooses the underlying character set. About the only things the standard requires are that a certain minimal set of characters be available and that the digit characters be contiguous.

The required characters for a C99 execution environment are A through Z, a through z, 0 through 9 (which must be together and in order), any of !"#%&'()*+,-./:;<=>?[\]^_{|}~, space, horizontal tab, vertical tab, form feed, alert, backspace, carriage return, and newline. This remains unchanged in the current draft of C1x, the next iteration of that standard.

Everything else depends on the implementation.

For example, code like:

int isUpperAlpha(char c) {
    return (c >= 'A') && (c <= 'Z');
}

will break on mainframes that use EBCDIC, which splits the uppercase letters into non-contiguous regions.

Truly portable code will take that into account. All other code should document its dependencies.
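One way to make the isUpperAlpha() example above portable is to lean on <ctype.h>, which is defined in terms of the execution character set. A minimal sketch, keeping the same function name:

```c
#include <ctype.h>

/* Portable version: isupper() knows the execution character set,
   so it is correct under both ASCII and EBCDIC. The cast avoids
   undefined behavior when plain char is signed and c is negative. */
int isUpperAlpha(char c)
{
    return isupper((unsigned char)c);
}
```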

A more portable implementation of your example would be something along the lines of:

static char chrs[] = "zyxwvutsrqponmlkjihgfedcba";
char *pCh = chrs;
while (*pCh != '\0') {
    printf("char is %c and the value is %d\n", *pCh, *pCh);
    pCh++;
}

If you want a truly portable solution, you should probably use islower(), since code that checks only the Latin characters won't be portable to (for example) a Greek locale using Unicode as its underlying character set.

paxdiablo
It should be stated that although the C standard does not dictate the ASCII character set, and strictly speaking code that assumes it is not portable, in the real world EBCDIC and IBM midrange/mainframes (and ONLY the IBM crap, IIRC) are extremely and utterly irrelevant.
Warren P
Interesting thought, @Warren, remember that the next time you use a bank.
paxdiablo
Which bank processes transactions with C code on IBM midrange or mainframes, paxdiablo?
Warren P
Then I misunderstood your comment, Warren, I thought you were stating that the mainframes themselves were irrelevant. In any case, neither of us knows whether or not the banks use C on their big iron (I doubt it, CICS and DB2 are far more likely at least for the TP and customer-facing stuff). Despite that, I know for a fact that there _is_ plenty of C code written for the mainframes and AS400s. If you want to limit where your software can be used, that's your right and you may be happy with only being able to target 99.5% of the industry :-)
paxdiablo
+1  A: 

It is determined by whatever the execution character set is.

In most cases nowadays, that is the ASCII character set, but C has no requirement that a specific character set be used.

Note that there are some guarantees about the ordering of characters in the execution character set. For example, the digits '0' through '9' are guaranteed each to have a value one greater than the value of the previous digit.

James McNellis
+4  A: 

Why is printing the whole set of lowercase letters not guaranteed in the above program?

Because it's possible to use C with an EBCDIC character encoding, in which the letters aren't consecutive.

dan04
Okay, thanks for the clarification, +1. So EBCDIC usage on some systems and ASCII on others can cause portability problems. But who decides on the selection of EBCDIC or ASCII, and when is it done?
S.Man
It's decided by the operating system. EBCDIC is used only by IBM mainframes; everyone else uses variants of ASCII. I said "variants of" ASCII because ASCII only encodes 128 characters for American English, and other languages need encodings that have accented letters or a non-Latin script. There have been literally hundreds of locale- and platform-specific character encodings developed, but there's been a recent trend towards using UTF-8.
dan04
@dan04, the AS400 also uses EBCDIC and, contrary to what the AS400 fanboys will tell you, it is _not_ a mainframe :-)
paxdiablo
+1  A: 

These days, people going around calling your code non-portable are engaging in useless pedantry. Support for ASCII-incompatible encodings only remains in the C standard because of legacy EBCDIC mainframes that refuse to die. You will never encounter an ASCII-incompatible char encoding on any modern computer, now or in the future. Give it a few decades, and you'll never encounter anything but UTF-8.

To answer your question about who decides the character encoding: while it's nominally at the discretion of your implementation (the C compiler, library, and OS), it was ultimately decided by the internet, both existing practice and IETF standards. Modern systems are presumably intended to communicate and interoperate with one another, and it would be a huge headache to have to convert every protocol header, HTML file, JavaScript source, username, etc. back and forth between ASCII-compatible encodings and EBCDIC or some other local mess.

In recent times, it has become clear that a universal encoding is highly desirable not just for machine-parsed text but for natural-language text as well. (Natural-language text interchange is not as fundamental as machine-parsed text, but it is still very common and important.) Unicode provided the character set, and as the only ASCII-compatible Unicode encoding, UTF-8 is pretty much the successor to ASCII as the universal character encoding.

R..
I'm sorry, R, but that statement about "never encounter an ASCII-incompatible char encoding on any modern computer" is complete rubbish. Do you really think that today's mainframe is unchanged from the System/360? UNIX System Services under z/OS uses EBCDIC and, if you check out the latest z10 EC machines, they'll blow anything else out of the water in terms of raw throughput (not just CPU). The rest I agree with.
paxdiablo
I should have been more careful to define what I meant by modern. While that is modern hardware, the software it's running is stuck halfway in the dark ages.
R..
@R: Define "dark ages" in computing chronology. Tell us how you know that the software that runs on modern mainframes is from the dark ages.
JeremyP
My answer and comment were not intended to provoke argument but it seems they have, for whatever reason. I'll just leave it at that since the point was to answer the question and not engage in debates over what "modern" means.
R..
z/OS may be a fine system otherwise, but EBCDIC is definitely from the dark ages. It was, after all, designed for compatibility with a punched card encoding, and punched cards went out of style before I was born!
dan04
Thus "halfway".
R..
z/OS and OS/400 are [semi-]dinosaurs.
Warren P
Warren, spoken like a true PC developer :-)
paxdiablo
No, a PC developer would say "that's how it is on my machine so it must be like that everywhere". A developer who thinks about standards and interoperability (real world standards, beyond the basic minimum C requires) would say "that's how it is on the internet and on the vast majority of POSIX systems so that's how it **should be** everywhere." :-)
R..