It has always struck me as strange that the C function "fopen" takes a "const char *" as the second argument. I would think it would be easier to both read your code and implement the library's code if there were bit masks defined in stdio.h, like "IO_READ" and such, so you could do things like:

FILE* myFile = fopen("file.txt", IO_READ | IO_WRITE);

Is there a programmatic reason for the way it actually is, or is it just historic? (i.e. "That's just the way it is.")

EDIT Thanks for the explanations everyone. The correct answer is probably amongst those given.

+4  A: 

One word: legacy. Unfortunately we have to live with it.

Just speculation: maybe at the time a "const char *" seemed like the more flexible solution, because it is not limited in any way. A bit mask could only have 32 different values. Looks like a case of YAGNI to me now.

More speculation: dudes were lazy, and writing "rb" requires less typing than MASK_THIS | MASK_THAT :)

Tuomas Pelkonen
Why though? Masks seem the natural choice here, especially in the day `fopen` was designed.
GMan
@GMan: As I speculated in my answer, I'm wondering if programmers' time was seen as the more valuable resource. But I have no real proof.
Platinum Azure
@GMan: I'm not sure masks are necessarily a *more* natural choice than a string descriptor. Either is a reasonable, defensible design option.
Michael Burr
"unfortunately"? how often does it really annoy you?
Gregory Pakosz
I'm going to accept this answer, simply because it seems like the most likely explanation. I like Michael Burr's answer as well, but as he said, that is much less likely to have crossed Mike Lesk's mind, I imagine. In retrospect, although this question probably does have one definitive answer, I can't really verify anything to make a correct selection! My mistake.
Chris Cooper
"A bit mask could only have 32 different values" -- When C was invented, a bit mask could only have 16 different values.
Windows programmer
@Windows programmer: No, C does not have a bit mask type. Any integral type will work (preferably unsigned), and `unsigned long` is a minimum of 32 bits.
David Thornley
When C was invented, the second parameter to open() had type int, int had 16 bits, unsigned didn't exist, and long didn't exist. The second parameter to open() was an int that was used as a bit mask. Since the int was being used as a bit mask, 16 bits could represent 16 different values.
Windows programmer
A: 

As Tuomas Pelkonen says, it's legacy.

Personally, I wonder if some misguided saps conceived of it as being better due to fewer characters typed? In the olden days programmers' time was valued more highly than it is today, since computing was less accessible and compilers weren't as great and all that.

This is just speculation, but I can see why some people would favor saving a few characters here and there (note the lack of verbosity in any of the standard library function names... I present string.h's "strstr" and "strchr" as probably the best examples of unnecessary brevity).

Platinum Azure
The lack of verbosity in the library names is because they wanted to support systems that only supported 6 significant characters in external names. Remember C was defined, long, long ago, and not all systems had great tool support.
Michael Burr
Good point, though that doesn't explain fprintf and sprintf. I guess those could have been defined later, though, and I'll admit I'm too lazy to look up the history at the moment.
Platinum Azure
You don't remember typing code on teletypes!
Martin Beckett
@Platinum: `fprintf` and `sprintf` worked because they had to be distinct in the 1st 6 characters. Symbols were allowed to be longer than 6 characters, but had to be distinct if the characters after the 6th were dropped. So, I suppose they could have made them more human readable as long as they started with 6 characters of potential gobbledygook.
Michael Burr
@Martin, I could type as fast (or faster) on a teletype than I do today. Computers have gotten faster, I've gotten slower.
Mark Ransom
@Michael Burr: The general rule was that names with external linkage had to be unique within the first six characters, not counting case differences. This was continued in C89, although the Rationale describes the decision as "most painful".
David Thornley
+1  A: 

I must say that I am grateful for it - I know to type "r" instead of IO_OPEN_FLAG_R... or was it IOFLAG_R, or SYSFLAGS_OPEN_RMODE, or whatever.

pm100
This is actually a really good point, it is a lot easier to remember the API.
Tuomas Pelkonen
I must say I disagree with you. I should be able to tell what a function call does by looking at its arguments. You can't do that with just an "r". What does "r" mean? You can't tell without reading the docs on the function in question.
Billy ONeal
@Tuomas Pelkonen: It's easier to remember to write, but it's a pain in the butt later when you read. Since code is read much more often than it is written, I'd optimize for the read case rather than the write case.
Billy ONeal
I agree that code is read more often than it is written (I actually wrote a blog about this), but to me the readability is pretty good when there is an "r", but I have programmed with C for a long time...
Tuomas Pelkonen
@BillyONeal: Seems to me it's six of one, half a dozen of the other - I'd need to either know to use "r" or know whichever enum corresponds to 'read-only'. I'd say "r" is somewhat easier to remember than some arcane enum name (which is pm100's point). But either way, you need to know the correct thing to pass. If the API were specified to take an enum, there would be nothing to protect you from incorrectly passing an arbitrary and potentially meaningless or incorrect integer.
Michael Burr
"What does "r" mean". What do you mean, what does "r" mean? It doesn't mean "rhubarb" or "reverse", or "randomly", I can tell you that. What does the "f" in `fopen` mean? Is there any serious danger that someone familiar with C could look at a call to fopen and think, "I wonder if this is opening a football in rabid-wombat mode"? Names of standard library functions and arguments can be terser than third-party libraries or code used only for one module, because programmers make an effort to learn standard libraries, they don't just trip over it one day unexpectedly.
Steve Jessop
You know to type "r", but if you mistype it and your code says "f" instead, your code will still compile even though it's wrong. If you use named constants instead, and you mistype IO_READ as IO_FEAD, you get a compile error to alert you to the problem.
Wyzard
@Wyzard: That's a good point. [Plus, with flags, it would be so much easier to perpetuate the "cryptic C programmer" stereotype with calls like fopen("x", 4); ;D]
Chris Cooper
+3  A: 

I'd speculate that it's one or more of the following (unfortunately, I was unable to quickly find any kind of supporting references, so this'll probably remain speculation):

  1. Kernighan or Ritchie (or whoever came up with the interface for fopen()) just happened to like the idea of specifying the mode using a string instead of a bitmap
  2. They may have wanted the interface to be similar to yet noticeably different from the Unix open() system call interface, so it would be at once familiar yet not mistakenly compile with constants defined for Unix instead of by the C library

For example, let's say that the mythical C standard fopen() that took a bitmapped mode parameter used the identifier OPENMODE_READONLY to specify what today is specified by the mode string "r". Now, suppose someone made the following call in a program compiled on a Unix platform (and that the header that defines O_RDONLY has been included):

fopen("myfile", O_RDONLY);

There would be no compiler error, but unless OPENMODE_READONLY and O_RDONLY happened to be defined as the same bit, you'd get unexpected behavior. Of course it would make sense for the C standard names to be defined the same as the Unix names, but maybe they wanted to avoid mandating that kind of coupling.

Then again, this might not have crossed their minds at all...

Michael Burr
+1  A: 

Dennis Ritchie has this to say, from http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

In particular, Lesk wrote a 'portable I/O package' [Lesk 72] that was later reworked to become the C `standard I/O' routines

So I say ask Mike Lesk, post the result here as an answer to your own question, and earn stacks of points for it. Although you might want to make the question sound a bit less like criticism ;-)

Steve Jessop
There was no such datatype as const char * when Lesk wrote that package. No wait. Ambiguously the datatype might or might not have existed depending on how the implementation handled string constants, but there was no way for a C programmer to specify that datatype in a program.
Windows programmer
True, but then `const` didn't exist when `strlen` was invented, either, but I don't think we can conclude from this, that `strlen` probably originally took any parameter other than a string pointer ;-) It's just that the typical way of saying "a string" changed.
Steve Jessop
+1  A: 

I believe that one of the advantages of the character string instead of a simple bit-mask is that it allows for platform-specific extensions which are not bit-settings. Purely hypothetically:

FILE *fp = fopen("/dev/something-weird", "r+:bs=4096");

For this gizmo, the open() call needs to be told the block size, and different calls can use radically different sizes, etc. Granted, I/O is organized pretty well now (that was not the case originally - devices were enormously diverse and the access mechanisms far from unified), so it seldom seems necessary. But a string-valued mode argument allows for that sort of extensibility far better than a bit mask does.

The underlying open() has to be augmented by the ioctl() (I/O control) call or functions hiding it to achieve similar effects.

Jonathan Leffler
Thanks Jonathan. That's another interesting advantage of strings I was not aware of.
Chris Cooper