OK, this is of no serious consequence, but it's been bugging me for a while: Is there a reason for the distinction between the -> and . operators?

Of course, the current rule is that . acts on a struct, and -> acts on a pointer-to-struct (or union). But here's how it works in practice. Let s be a struct including an element x, and let ps be a pointer to a struct of the same form.

If you write

s->x

the compiler will spit out an error along the lines of

You meant s.x. Please retype that and recompile.

If you write

ps.x

the compiler will spit out an error along the lines of

You meant ps->x. Please retype that and recompile.

Because the compiler knows the type of both s and ps at compile time, it has all the information it needs to work out what the correct operator would be. I suspect that this isn't like other errors (like a missing semicolon), in that there is no ambiguity about the correct fix.

So here's a hypothetical proposal to the C1x standards committee (that would never be considered, because the ISO is on a conservative streak):

Given the expression lhs.rhs, if lhs is a struct or union type, then the expression shall refer to the element of lhs named rhs. If lhs is of type pointer-to-struct or -union, then this shall be interpreted as (*lhs).rhs.
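For reference, here are the spellings that C accepts today, side by side. This is only a minimal sketch: struct point, its member x, and the function name are hypothetical and not taken from the question. Under the proposal, the commented-out ps.x would simply become a fourth accepted spelling.

```c
#include <assert.h>

/* Hypothetical struct, used only for illustration. */
struct point {
    int x;
};

/* Reads the same member through each spelling that C accepts today. */
int read_x_three_ways(void)
{
    struct point s = { 42 };
    struct point *ps = &s;

    int a = s.x;      /* .  applied to a struct            */
    int b = ps->x;    /* -> applied to a pointer-to-struct */
    int c = (*ps).x;  /* what ps->x is defined to mean     */

    /* int d = ps.x;     rejected today; the proposal would read it as (*ps).x */

    assert(a == b && b == c);
    return a;
}
```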

This would certainly save us all time, and make it easier for people to learn C [and I've taught enough C to say with authority that learners find the -> thing to be either confusing or annoying.]

There's even precedent, where C does a handful of similar things. E.g., for implementation reasons, a function designator is always converted to pointer-to-function, so f(x,y) and (*f)(x,y) will both work regardless of whether f was declared as a function or a pointer to function.
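A minimal sketch of that precedent, assuming a hypothetical function add and a pointer fp to it (both names are made up for illustration). All four call forms compile and give the same result.

```c
#include <assert.h>

/* Hypothetical function, used only for illustration. */
static int add(int x, int y)
{
    return x + y;
}

/* Calls add through a function name and through a pointer, with and without *. */
int call_both_ways(void)
{
    int (*fp)(int, int) = add;   /* the name add converts to a pointer here */

    int r1 = add(1, 2);          /* function name, plain call           */
    int r2 = (*add)(1, 2);       /* function name, explicit dereference */
    int r3 = fp(1, 2);           /* pointer, plain call                 */
    int r4 = (*fp)(1, 2);        /* pointer, explicit dereference       */

    assert(r1 == r2 && r2 == r3 && r3 == r4);
    return r1;
}
```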

So, my question: what's wrong with this proposal? Can you think of examples where there would be fatal ambiguity between ps.x and s.x, or why keeping the mandatory distinction is otherwise useful?

+25  A: 

Well there clearly isn't any ambiguity or the proposal couldn't be made. The only issue is that if you see:

p->x = 3;

you know p is a pointer but if you allow:

p.x = 3;

in that circumstance then you don't actually know, which could potentially create problems, particularly if you later cast that pointer and use the wrong number of levels of indirection.

cletus
Indeed. It makes code easier to read/understand if you can immediately recognize a variable as a pointer based on the indirection operator.
Mike Weller
+2  A: 

If anything, the current syntax lets readers of the code know whether the code is working with a pointer or the actual object. Someone who does not know the code beforehand understands it better.

Noctis Skytower
+27  A: 

I don't think there's anything crazy about what you've said. Using . for pointers to structs would work.

However, I like the fact that pointers to structs and structs are treated differently.

It gives some context about operations and clues as to what might be expensive.

Consider this snippet, imagine that it's in the middle of a reasonably large function.

s.c = 99;
f(s);

assert(s.c == 99);

Currently I can tell that s is a struct. I know that it's going to be copied in its entirety for the call to f. I also know that that assert can't fire.

If using . with pointers to struct were allowed, I wouldn't know any of that and the assert might fire, f might set s.c (err s->c) to something else.
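To make the two readings concrete, here is a minimal sketch. The type struct thing and the functions f_by_value and f_by_pointer are hypothetical stand-ins for the s and f in the snippet above; the real f could of course do anything.

```c
#include <assert.h>

/* Hypothetical type standing in for whatever s is. */
struct thing {
    int c;
};

/* Receives a copy of the struct; the caller's object is untouched. */
static void f_by_value(struct thing s)
{
    s.c = 0;
}

/* Receives a pointer; writes go to the caller's object. */
static void f_by_pointer(struct thing *s)
{
    s->c = 0;
}

/* s is a struct: f gets a copy, so s.c is still 99 afterwards. */
int value_call_keeps_c(void)
{
    struct thing s;
    s.c = 99;
    f_by_value(s);
    assert(s.c == 99);
    return s.c;
}

/* s is a pointer: f can change what it points to, so s->c is no longer 99. */
int pointer_call_changes_c(void)
{
    struct thing obj;
    struct thing *s = &obj;
    s->c = 99;
    f_by_pointer(s);
    return s->c;
}
```

With today's syntax the two callers look different at a glance (s.c versus s->c). If . were allowed on pointers, both could be written with s.c and only the declaration of s would tell them apart.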

The other downside is that it would reduce compatibility with C++. C++ allows -> to be overloaded by classes so that classes can be 'like' pointers. It's important that . and -> behave differently. "New" C code that used . with pointers to structs would probably no longer be acceptable as C++ code.

Charles Bailey
+1 for C++ problems.
Douglas Leeder
Why is the difference between "->" and "." important in C++? Couldn't you just overload operator "." in C++ instead of ->?
Edan Maor
Currently you can't overload the `.` operator, but that's not important right now. What can be important is that a class can act like a pointer because you can intercept the `->` operator to return a pointer to the object being 'proxied', but then you can also call things on the object itself (e.g. `.reset()` to set it to null or something). Losing the distinction between `.` and `->` would prevent this from working.
Charles Bailey
With this example, you don't actually know that the assertion will hold. The struct may contain a pointer to itself, giving f the ability to modify s.c
William Pursell
@William Pursell: Touché, I knew when I wrote this that there was probably a corner case where it wouldn't hold. I was actually wondering if there was a type that you could assign an int (99) to and which then wouldn't compare equal to 99 when promoted back to an int, but a self-referential struct (or a global/static instance!) would also work.
Charles Bailey
+5  A: 

Well, there could definitely be cases where you have something complex like:

(*item)->elem

(which I have had happen in some programs), and if you wrote something like

item.elem

meaning the above, it could be confusing whether elem is an element of struct item, or an element of a struct that item points to, or an element of a struct that is pointed to by an element in a list that is pointed to by an iterator item, and so on and so forth.

So yeah, it does make things somewhat clearer when using pointers to pointers to structs, &c.
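A minimal sketch of the pointer-to-pointer case, assuming a hypothetical struct node with a member elem. Today each level of indirection has to be written out, which is exactly the information that item.elem would hide.

```c
#include <assert.h>

/* Hypothetical struct, used only for illustration. */
struct node {
    int elem;
};

/* Reaches elem through a pointer to a pointer to a struct. */
int read_through_two_levels(void)
{
    struct node n = { 7 };
    struct node *p = &n;
    struct node **item = &p;

    int a = (*item)->elem;   /* one * and one ->: two indirections, both visible */
    int b = (**item).elem;   /* same access with both dereferences spelled as *  */

    assert(a == b);
    return a;
}
```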

Keand64
+3  A: 

Well, if you really wanted to introduce that kind of functionality into the specification of the C language, then in order to make it "blend" with the rest of the language the logical thing to do would be to extend the concept of "decay to pointer" to struct types. You yourself gave an example with a function and a function pointer. The reason it works that way is that a function type in C decays to a pointer type in all contexts, except as the operand of the sizeof and unary & operators. (The same thing happens to arrays, BTW.)

So, in order to implement something similar to what you suggest, we could introduce the concept of "struct-to-pointer decay", which would work in exactly the same way as all other "decays" in C (namely, array-to-pointer decay and function-to-pointer decay) work: when a struct object of type T is used in an expression, its type immediately decays to type T* - pointer to the beginning of the struct object - except when it's an operand of sizeof or unary &. Once such a decay rule is introduced for structs, you could use -> operator to access struct elements regardless of whether you have a pointer to struct or the struct itself on the left-hand side. Operator . would become completely unnecessary in this case (unless I'm missing something), you'd always use -> and only ->.
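The struct decay itself is hypothetical and can't be shown in real C, but the existing array-to-pointer decay it is modelled on can. A minimal sketch, with made-up names, showing the decay and the two exceptions (sizeof and unary &):

```c
#include <assert.h>

/* Shows where an array name decays to a pointer and where it does not. */
int array_decay_demo(void)
{
    int a[4] = { 10, 20, 30, 40 };

    int *p = a;             /* decay: a becomes a pointer to its first element */
    int (*whole)[4] = &a;   /* unary &: no decay, this points to the whole array */

    assert(p == &a[0]);
    assert(sizeof a == 4 * sizeof(int));   /* sizeof: no decay, size of the array */
    assert((void *)whole == (void *)p);    /* same address, different pointer type */

    return p[2];
}
```

Under the suggested rule a struct object s would behave like a here: s->x would work because s decays to a pointer to itself, while sizeof s and &s would still see the struct.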

The above, once again, is what this feature would look like, in my opinion, if it were implemented in the spirit of the C language.

But I'd say (agreeing with what Charles said) that the loss of visual distinction between the code that works with pointers to structs and the code that works with structs themselves is not exactly desirable.

P.S. An obvious negative consequence of such a decay rule for structs would be that besides the current army of newbies selflessly believing that "arrays are just constant pointers", we'd have an army of newbies selflessly believing that "struct objects are just constant pointers". And Chris Torek's array FAQ would have to be about 1.5-2x larger to cover structs as well :)

AndreyT
If it really was redundant, then . could be maintained as a synonym for ->. People who wanted to maintain the distinction could use -> with pointers and . with structs, in the same way that some C++ programmers (try to) declare POD classes with `struct` and non-POD classes with `class`. Then the compiler wouldn't help, so people would make mistakes, and ask for a compiler option to enforce the difference, and be back where they started ;-)
Steve Jessop
+4  A: 

Yes, that's OK, but it is not what C really needs at all

Not only is it OK, but it is the modern style. Java and Go both just use the . operator. Since everything that doesn't fit in a register is at some level a reference, the distinction between thing and pointer to thing is definitely a bit arbitrary, at least until you get to function calls.

The first evolutionary step was to make the dereference operator postfix, something dmr once implied he at some point preferred. Pascal does this, so it has p^.field. The only reason there even is a -> operator is because it's goofy to have to type (*p).field or p[0].field.

So yes, it would work. It would even be better as it works at a higher level of abstraction. One really should be able to make as many changes as possible without requiring downstream code to change, that's in a sense the entire point of higher level languages.

I have argued that using () for function calls and [] for array subscripting is wrong. Why not allow different implementations to export different abstractions?

But there isn't much reason to make the change. C programmers are unlikely to revolt over the lack of a syntactic sugar extension that saves one character in an expression, and it would be hard to use anyway because it would not be immediately, if ever, universally adopted. Remember that when standards committees go rogue they end up preaching to empty rooms. They require the willing cooperation and agreement of the world's compiler developers.

What C really needs isn't ever-so-slightly faster ways to write unsafe code. I don't mind working in C, but project managers don't like having their reliability determined by their worst guy, and it's possible that what C really needs is a safe dialect, something like Cyclone, or perhaps something just like Go.

DigitalRoss
Have you *used* Cyclone? It's great research, but the type system is from hell. And it was always damned difficult to keep things in statically typed regions as opposed to letting everything work its way up to the garbage-collected heap. It's great work, but let's not oversell it, shall we?
Norman Ramsey
Hmm, not like you have, apparently. I will revise...
DigitalRoss
+1, really good summary.
Konrad Rudolph
+15  A: 

A distinguishing feature of the C programming language (as opposed to its relative C++) is that the cost model is very explicit. The dot is distinguished from the arrow because the arrow requires an additional memory reference, and C is very careful to make the number of memory references evident from the source code.
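A rough sketch of what that extra memory reference means, assuming a hypothetical global struct g and a pointer gp to it. The comments describe a naive translation; an optimizing compiler is free to do better when it can prove where gp points.

```c
#include <assert.h>

/* Hypothetical struct and globals, used only for illustration. */
struct pair {
    int x;
    int y;
};

struct pair g = { 5, 6 };
struct pair *gp = &g;

/* The address of g.x is fixed at link time: one load fetches the value. */
int via_dot(void)
{
    return g.x;
}

/* The value of gp must be loaded first, then x is loaded through it. */
int via_arrow(void)
{
    return gp->x;
}

/* Both routes reach the same member; only the number of loads differs. */
int both_agree(void)
{
    assert(via_dot() == via_arrow());
    return via_dot();
}
```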

Norman Ramsey
Good point. And that memory reference may be very expensive on modern architectures, perhaps costing 1000x as much as accessing a register, assuming that the data needs to be fetched from main memory.
emk
@Norman Ramsey: The implicit memory reference suggested by the OP has the same nature as the potential implicit memory reference in the `[]` and `()` operators, as I noted in my answer. When you use the `[]` operator, you can't see from the syntax whether you are working with a "real" array or with a pointer object (the latter requiring an extra memory reference). So no, this part of the cost model is not normally explicit in C. And what the OP is proposing does not cross the traditional boundaries of the "implicitness" of the C cost model at all.
AndreyT
@AndreyT: There's nothing implicit about `[]`; `a[i]` is always and forever syntactic sugar for `*(a+i)`, just as `p->x` is syntactic sugar for `(*p).x`. I used to love to blow people's minds writing `i[a] = a[i] + k` and similar wackiness.
Norman Ramsey
@Norman : Yes, there is. You are missing the fact that there's a significant difference between 'a' as the name of an array object and 'a' as a pointer object. The expression `*(a + i)` looks the same in both cases, but its actual semantics are considerably different. In the case of an array object, the act of converting the array type to a pointer type is purely conceptual, meaning that the resultant pointer is essentially a compile-time constant (or a compile-time offset in the case of an automatic array). There's no memory access in the process of obtaining the pointer value.
AndreyT
But when `a` is a pointer, retrieving the actual value of `a` requires a memory access. So, you are wrong, there's an inherent implicitness in `[]`. Stating that `[]` is just syntactic sugar for `*(a + i)` doesn't change anything - the same implicitness is still present in `*(a + i)` as well. Moreover, the nature of that implicitness is *exactly* the same as in what the OP proposes. I actually illustrate it in a very straightforward way in my answer.
AndreyT
@AndreyT: I didn't say there was no difference between a pointer and an array. Remind me how many compilers you've written again?
Norman Ramsey
Huh? What does this have to do with the number of compilers I have written? You made a claim that C has an explicit cost model. I have demonstrated that the C cost model is not absolutely explicit and, moreover, that exactly the same "implicitness" as the one proposed by the OP is *already present* in the language. Now you seem to be trying to switch subject. Why? And what's the point? This is a very minor issue, but you are acting as if my being right is somehow insulting to you. Sheesh... :)
AndreyT