I am currently working on a C project that needs to be fairly portable across different build environments. The project targets POSIX-compliant systems on a hosted C environment.

One way to achieve a good degree of portability is to code in conformance with a chosen standard, but it is difficult to determine whether a given translation unit is strictly conforming ISO C. For example, it might exceed some translation limit, or it might rely on undefined behavior, without any diagnostic message from the compilation environment. I am not even sure whether it is possible to check large projects for strict conformance.

With that in mind, is there any compiler, tool or method to test a translation unit for strict ISO C conformance under a given standard (for example, C89 or C99)?

Any help is appreciated.

+3  A: 

Not really. The C standard doesn't set any absolute minimum limits on translation units that must be accepted. As such, a perfectly accurate checker would be trivial to write, but utterly useless in practice:

#include <stdio.h>

int main(int argc, char **argv) { 
    int i;
    for (i=1; i<argc; i++)
        fprintf(stderr, "`%s`: Translation limit (potentially) exceeded.\n", argv[i]);
    return 0;
}

Yes, this rejects everything, no matter how trivial. That is in accordance with the standard. As I said, it's utterly useless in practice. Unfortunately, you can't really do a whole lot better -- when you decide to port to a different implementation, you could run into some oddball resource limit you've never seen before, so any code you write (up to and including "hello world") could potentially exceed a resource limit despite being allowed by dozens or even hundreds of compilers on/for much smaller systems.

Edit:

Why a "hello world" program isn't strictly conforming

First, it's worth re-stating the definition of "strictly conforming": "A strictly conforming program shall use only those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit."

There are actually a number of reasons "Hello, World" isn't strictly conforming. First, as implied above, the minimum requirements for implementation limits are completely meaningless -- although there has to be some program that meets certain limits that will be accepted, no other program has to be accepted, even if it doesn't even come close to any of those limits. Given the way the requirement is stated, it's open to question (at best) whether there is any such thing as a program that doesn't exceed any minimum implementation limit, because the standard doesn't really define any minimum implementation limits.

Second, during phase 1 of translation: "Physical source file multibyte characters are mapped, in an implementation defined manner, to the source character set ... " (§5.1.1.2/1). Since "Hello, World!" (or whatever variant you prefer) is supplied as a string literal in the source file, it can be (is) mapped in an implementation-defined manner to the source character set. An implementation is free to decide that (for an idiotic example) string literals will be ROT13 encoded, and as long as that fact is properly documented, it's perfectly legitimate.

Third, the output is normally written via stdout. stdout is a text stream. According to the standard: "Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation." (§7.19.2/2) As such, an implementation could (for example) do Huffman compression on the output (on Monday, Wednesday, or Friday).

So, we have (at least) three distinct points at which the output from a "Hello, World!" depends on implementation-defined characteristics -- any one of which would prevent it from fitting the definition of a strictly conforming program.
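
For reference, here is a minimal sketch of the kind of program under discussion, with comments marking where each of the three points above would apply. The function name say_hello is made up for illustration; the usual form simply calls printf from main.

```c
#include <stdio.h>

/* Hypothetical helper (the name is an assumption, not from the text above).
 * Point 1: like any program, this may exceed some translation limit
 * of a particular implementation. */
int say_hello(FILE *out)
{
    /* Point 2: the string literal goes through the phase 1 mapping of
     * source file characters to the source character set.
     * Point 3: if out is a text stream such as stdout, characters may be
     * added, altered, or deleted on output. */
    if (fputs("Hello, World!\n", out) == EOF)
        return -1;  /* write error */
    return 0;
}
```

Calling say_hello(stdout) from main gives the familiar program.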

Jerry Coffin
This is not in accordance with the standard. See §5.2.4.1 Translation Limits.
Stephen Canon
@Stephen: yes, it is. The requirement is: "The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:". Only *one* specific program -- and there doesn't even seem to be a requirement to document what that program is. Every possible input can fail except for one specific one that doesn't need to be identified...
Jerry Coffin
@Jerry: from the ISO C99 Standard, §4 Conformance: "A strictly conforming program shall use only those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit." Why, then, would an obviously simple implementation of the Hello World program not be strictly conforming in this case?
Alek
Your nit about multibyte characters is completely wrong, but the rest is mostly right.
R..
@Jerry: Back when we had C90, Peter Seebach claimed that he'd written a conforming compiler, which read the file into /dev/null, printed "Warning: Wonky compiler!" (a diagnostic), and printed 0. Clearly, it correctly executes any program whose output is "0", therefore correctly executing lots of possible programs, and it issues a diagnostic whenever a diagnostic was required (as well as when it wasn't, which is legal).
David Thornley
A: 

gcc has warning levels that will attempt to pin down various aspects of ANSI conformance. But that's only a starting point.

Darron
+3  A: 

It is not possible in general to find undefined run-time behavior. For example, consider

void foo(int *p, int *q)
{
    *p = (*q)++;
    ...

which is undefined if p == q. Whether that can happen can't be determined ahead of time without solving the halting problem.
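
One partial workaround, sketched here on the assumption that refusing to do the work is acceptable when the pointers alias, is to test for the problematic case at run time. The name foo_checked and its int return convention are made up for illustration.

```c
/* Hypothetical guarded variant of foo (name and return convention are
 * assumptions): returns 1 on success, 0 if the pointers alias. */
int foo_checked(int *p, int *q)
{
    if (p == q)
        return 0;   /* *p = (*q)++ would modify the same object twice */
    *p = (*q)++;    /* distinct objects here, so this is well defined */
    return 1;
}
```

This only turns the undefined case into an explicit error path for the caller. It does nothing to help a static checker decide whether p == q can ever occur.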

(Edited to fix mistake caf pointed out. Thanks, caf.)

David Thornley
Your example is fantastic. I've never thought about such simple pointer expressions. I believed that every such undefined condition could be spotted at parse-time.
Alek
Your example is actually fine, even if `p == q` - perhaps you meant `*p = (*q)++`, which is undefined if `p == q`?
caf
A: 

You might start with gcc -std=c99, or gcc -ansi -pedantic.
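
Those flags select which standard gcc compiles against. As a small sanity check on what a given build environment actually claims, a translation unit can inspect the standard predefined macros __STDC__ and __STDC_VERSION__. A minimal sketch follows; the function name claimed_standard is made up, and it reports only what the compiler claims, not whether the program is strictly conforming.

```c
/* Hypothetical helper (the name is an assumption): describes the C standard
 * the compiler claims to implement for this translation unit. */
const char *claimed_standard(void)
{
#if !defined(__STDC__)
    return "not ISO C";
#elif !defined(__STDC_VERSION__)
    return "C89/C90";             /* the macro was introduced after C90 */
#elif __STDC_VERSION__ >= 199901L
    return "C99 or later";
#else
    return "C90 with Amendment 1"; /* value 199409L */
#endif
}
```

The same conditions can instead be paired with #error to stop the build when the environment is not in the expected mode.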

bta
A: 

Good luck with that. Try to avoid signed integers, because:

int f(int x) 
{
 return -x;
}

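can invoke UB: when x is INT_MIN on a two's complement implementation, -x is not representable in int, and signed integer overflow is undefined.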
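
A defensive version might reject the problematic input before negating. This is only a sketch: checked_negate is a hypothetical name, and the comparison against -INT_MAX is chosen so that the check itself cannot overflow.

```c
#include <limits.h>

/* Hypothetical helper (the name is an assumption): stores -x in *out and
 * returns 1, or returns 0 without touching *out if -x is not representable. */
int checked_negate(int x, int *out)
{
    if (x < -INT_MAX)   /* true only for INT_MIN on two's complement */
        return 0;
    *out = -x;
    return 1;
}
```

The caller then has to handle the failure case explicitly, which is the price of staying inside defined behavior.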

Luther Blissett