
Is EOF always negative?

I'm thinking of writing a function that reads the next word in the input and returns the line number the word was found in or EOF if the end of the input has been reached. If EOF is not necessarily negative, the function would be incorrect.

+14  A: 

EOF is always == EOF. Don't assume anything else.

On a second reading of the standard (and as per some other comments here), it seems EOF is always negative, so for the use specified in this question (line number or EOF) it would work. What I meant to warn against (and still do) is assuming that characters are positive and EOF is negative.

Remember that it's possible for a standard-conforming C implementation to have negative character values; this is even mentioned in 'The C Programming Language' (K&R). Printing characters are always positive, but on some architectures (probably all ancient), control characters are negative. The C standard does not specify whether the char type is signed or unsigned, and the only character constant guaranteed to have the same value across platforms is '\0'.
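To illustrate how a char value can end up negative, here is a small sketch (the function names are hypothetical). It takes the byte 0xC1, which is 'A' in EBCDIC, stores it in a plain char, and converts it to int two ways. Where plain char is signed, the direct conversion typically yields -63, while converting through unsigned char first, which is what the <stdio.h> input functions do, gives 193. Storing 0xC1 in a signed char is itself implementation-defined, so treat this as an illustration rather than a portable guarantee.

```c
#include <limits.h>

/* Converting a plain char straight to int keeps its sign, so where
   char is signed (CHAR_MIN < 0), bytes >= 0x80 come out negative. */
int promote_direct(char c)
{
    return c;
}

/* Converting through unsigned char first always yields a value in
   0..UCHAR_MAX, regardless of whether plain char is signed. */
int promote_via_uchar(char c)
{
    return (unsigned char)c;
}
```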

gnud
Could you add some references?
Guillaume
This isn't correct. The EOF macro **must** expand to a negative integer; unfortunately, I don't have my copy of the standard to hand at the moment.
Charles Bailey
The C Library Reference, which says that it takes nearly all of its information from the ANSI C Standard, says that "EOF is a negative integer which indicates an end-of-file has been reached" (http://www.acm.uiuc.edu/webmonkeys/book/c_guide/2.12.html#variables). That said, I would still say that it's bad style and, on top of that, unnecessary to assume EOF is negative.
Martin B
Control characters are not usually negative. If your plain char type is signed and the code set is EBCDIC, then the value obtained by promoting the char value of 'A' (0xC1, 193) directly to int is negative; if you convert to unsigned char first and then to int, it will be positive, of course.
Jonathan Leffler
All members of the basic execution character set, however, are positive.
Johannes Schaub - litb
+8  A: 

Have that function return

  • the line number the word was found in
  • or -1 in case the end of the input has been reached

Problem solved, without relying on any particular EOF value. The caller can simply test for a result >= 0 to detect a successful call and treat anything else as EOF or an I/O error.
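A minimal sketch of this approach, assuming a "word" is a run of characters that `isspace` does not classify as whitespace. The name `next_word_line` and the caller-owned line counter are hypothetical, not part of any library.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: skips to the next whitespace-delimited word in
   `in`, consumes it, and returns the 1-based line it started on.
   Returns -1 once getc reports EOF (end of input or a read error).
   `*line` is caller-owned state and must start at 1. */
int next_word_line(FILE *in, int *line)
{
    int ch;                         /* int, so EOF stays distinct */

    /* Skip leading whitespace, counting newlines as we go. */
    while ((ch = getc(in)) != EOF && isspace(ch)) {
        if (ch == '\n')
            ++*line;
    }
    if (ch == EOF)
        return -1;

    int start = *line;

    /* Consume the rest of the word. */
    while ((ch = getc(in)) != EOF && !isspace(ch))
        ;

    /* Push the delimiter back so a newline is counted on the next call. */
    if (ch != EOF)
        ungetc(ch, in);

    return start;
}
```

A caller would loop with `while ((n = next_word_line(stdin, &line)) >= 0)` and, once it gets -1, check `ferror(stdin)` if it needs to know whether input ended or a read failed.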

ndim
Indeed, it's a good alternative.
Ree
+1  A: 

EOF is a condition rather than a value. The exact value of this sentinel is implementation-defined, though the standard does require it to be a negative int; in a lot of cases it is -1.

dirkgently
+1  A: 

From Wikipedia:

The actual value of EOF is a system-dependent negative number, commonly -1, which is guaranteed to be unequal to any valid character code.

But no references ...

From Secure Coding ("Detect and handle input and output errors"): EOF is negative, but it can only be reliably distinguished from a valid character when sizeof(int) > sizeof(char).
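If code does rely on these properties, one option is to state them as compile-time checks. The sketch below assumes a C11 compiler (for `_Static_assert`), and the helper `is_char_value` is hypothetical. The second assertion fails on platforms where int cannot hold every unsigned char value, which is the situation described above where a valid character could collide with EOF.

```c
#include <limits.h>
#include <stdio.h>

/* Guaranteed by the standard; stated here for documentation. */
_Static_assert(EOF < 0, "the C standard requires EOF to be negative");

/* Not guaranteed: fails where int is no wider than char. */
_Static_assert(UCHAR_MAX <= INT_MAX,
               "every unsigned char value must fit in a non-negative int, "
               "otherwise a valid character could compare equal to EOF");

/* Hypothetical helper: nonzero if v is a value fgetc could return for
   an actual character (0..UCHAR_MAX), zero for EOF or anything else. */
int is_char_value(int v)
{
    return v >= 0 && v <= UCHAR_MAX;
}
```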

Guillaume
Wikipedia isn't completely correct here. EOF could be -1, and with signed chars a character can have the value -1 (e.g. ÿ, byte 0xFF in Windows-1252). What does hold is that the return value of `(f)getc` is the next character converted to `unsigned char` and then to `int`, so it shouldn't match `EOF`. Of course, this can only work if `sizeof(int) != sizeof(char)`.
Charles Bailey
+8  A: 

Yes, EOF is always negative.

The Standard says:

7.19 Input/output
7.19.1 Introduction

3 The macros are [...] EOF which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

Note that there's no problem with "plain" char being signed. The <stdio.h> functions that deal with characters explicitly convert them to unsigned char and then to int, so every valid character has a non-negative value. For example:

int fgetc(FILE *stream)

7.19.7.1
... the fgetc function obtains that character as an unsigned char converted to an int ...
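A minimal sketch of what this guarantee buys in practice, using a hypothetical helper `count_chars`: because the result of `fgetc` is kept in an `int`, a byte with value 0xFF comes back as 255 and cannot compare equal to the negative EOF.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining in `in`.
   Returns -1 if a read error, rather than end-of-file, ended the loop. */
long count_chars(FILE *in)
{
    long n = 0;
    int ch;                       /* int, not char: must hold EOF too */

    while ((ch = fgetc(in)) != EOF)
        ++n;

    /* fgetc returns EOF for both conditions; ferror tells them apart. */
    return ferror(in) ? -1 : n;
}
```

Had `ch` been declared as plain `char` on a platform where char is signed, the 0xFF byte would be converted to a negative value (typically -1, the usual value of EOF) and the loop could stop early.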

pmg
There is a problem if `sizeof(char) == sizeof(int)`, though, because then even the conversion via `unsigned char` won't necessarily keep all valid `char` values positive. Fortunately, this is relatively rare.
Charles Bailey
That's only a problem if you wrongly assume a negative value is always EOF (`if (ch < 0) /* EOF detected */;`) or if the "execution character set" uses up all values from `INT_MIN` to `0`, in which case the `EOF` value is the same as the value of a valid character.
pmg
A: 

From the online draft n1256, 7.19.1, paragraph 3:

EOF

which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

EOF is always negative, though it may not always be -1.

For issues like this, I prefer separating error conditions from data by returning an error code (SUCCESS, END_OF_FILE, READ_ERROR, etc.) as the function's return value, and then writing the data of interest to separate parameters, such as:

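#include <stdio.h>

enum { SUCCESS, END_OF_FILE, READ_ERROR };

int getNextWord (FILE *stream, char *buffer, size_t bufferSize, int *lineNumber)
{
  if (!fgets(buffer, (int) bufferSize, stream))
  {
    /* fgets returned NULL: tell end of input apart from an I/O error */
    if (feof(stream)) return END_OF_FILE; else return READ_ERROR;
  }
  else
  {
    // figure out the line number
    *lineNumber = ...;
  }
  return SUCCESS;
}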
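One way to flesh this approach out into something that compiles and runs is sketched below. It makes several assumptions not in the answer: it reads whitespace-delimited words with `getc` instead of lines with `fgets`, treats `*lineNumber` as an in/out counter that the caller initializes to 1, and silently truncates words that do not fit in the buffer.

```c
#include <ctype.h>
#include <stdio.h>

enum { SUCCESS, END_OF_FILE, READ_ERROR };

/* Sketch: reads the next whitespace-delimited word into buffer.
   *lineNumber is in/out: the caller sets it to 1 before the first call,
   and on SUCCESS it holds the line the word was found on.
   Words longer than bufferSize - 1 characters are truncated.
   Assumes bufferSize >= 1. */
int getNextWord(FILE *stream, char *buffer, size_t bufferSize, int *lineNumber)
{
    size_t len = 0;
    int ch;                             /* int, so EOF stays distinct */

    /* Skip leading whitespace, counting newlines. */
    while ((ch = getc(stream)) != EOF && isspace(ch)) {
        if (ch == '\n')
            ++*lineNumber;
    }
    if (ch == EOF)
        return ferror(stream) ? READ_ERROR : END_OF_FILE;

    /* Copy the word, leaving room for the terminating '\0'. */
    do {
        if (len + 1 < bufferSize)
            buffer[len++] = (char)ch;
    } while ((ch = getc(stream)) != EOF && !isspace(ch));
    buffer[len] = '\0';

    if (ch != EOF)
        ungetc(ch, stream);             /* its newline is counted next call */
    else if (ferror(stream))
        return READ_ERROR;

    return SUCCESS;
}
```

Returning a status code keeps the "no more input" and "I/O error" cases out of the line-number range altogether, so nothing depends on the value or sign of EOF.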
John Bode