I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"

I couldn't come up with an answer except that some numeric literals contain letters (123456L, 123456U), and that wouldn't be possible if the compiler treated everything containing letters as a variable name.

Was that the right answer? Are there any more reasons?

string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
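For reference, the rule the compiler is enforcing here can be written as a small check: a letter or underscore first, then any mix of letters, digits and underscores. This is a minimal sketch assuming plain ASCII identifiers (real C++ also permits some universal-character-names) and ignoring keywords; the helper names are hypothetical.

```cpp
#include <string_view>

// Hypothetical ASCII-only character classes, written as constexpr
// because the <cctype> functions are not usable at compile time.
constexpr bool is_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Shape of a C/C++ identifier: a letter or underscore first,
// followed by any letters, digits and underscores.
constexpr bool is_identifier(std::string_view s) {
    if (s.empty() || !(is_letter(s[0]) || s[0] == '_')) return false;
    for (char c : s) {
        if (!(is_letter(c) || is_digit(c) || c == '_')) return false;
    }
    return true;
}

static_assert(is_identifier("_2BeOrNot2Be"));
static_assert(!is_identifier("2BeOrNot2Be")); // leading digit: rejected
```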
+14  A: 

Well, think about this:

int 2d = 42;
double a = 2d;

What is a? 2.0? or 42?

Hint, if you don't get it: a d after a number means the number before it is a double literal.

Pyrolistical
This is actually a [relatively] late-coming notation ("d" for "double"), C89 standard IIRC. Leading numerics in identifiers aren't possible if this construct is in the language, but that is not the reason numerics can't start an identifier.
Ken Gentle
`d` isn't a valid floating literal suffix in C++. Floating literals are doubles by default, you can use `f` or `l` if you need a float or a long double literal.
Charles Bailey
It is for Java, and while the original question was for C++, it also applies to many other languages, like Java. But I agree. This isn't the original reason why identifiers can't start with numbers.
Pyrolistical
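Charles Bailey's point about C++ literal suffixes can be checked directly with `decltype` and `std::is_same_v` (C++17). A minimal sketch; it only restates standard literal typing rules.

```cpp
#include <type_traits>

// An unsuffixed floating literal is a double; 'f'/'F' gives float and
// 'l'/'L' gives long double. Standard C++ has no 'd' suffix.
static_assert(std::is_same_v<decltype(2.0), double>);
static_assert(std::is_same_v<decltype(2.0f), float>);
static_assert(std::is_same_v<decltype(2.0L), long double>);

// Integer suffixes work the same way, which is why tokens like 123456L
// from the question are numbers rather than identifiers.
static_assert(std::is_same_v<decltype(123456L), long>);
static_assert(std::is_same_v<decltype(123456UL), unsigned long>);
```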
A: 

Because the language doesn't allow it.

A good workaround is to put an underscore before the number, i.e.:

int _2BeOrNot2Be = 0;
Doug T.
Many variable names with underscores are restricted. I don't think underscore-digit is a protected prefix, but I don't remember the rules all that well, and therefore never use leading underscores.
David Thornley
An underscore prefix is reserved in the global namespace. A double underscore, or an underscore followed by a capital letter, is reserved for the implementation no matter which namespace it's declared in. So underscore-digit *may* be valid, depending on where it's declared.
jalf
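jalf's summary can be turned into a rough checker. This sketch follows the rules in the C++ standard's [lex.name] section: an identifier that *contains* a double underscore, or begins with an underscore followed by an uppercase letter, is reserved everywhere; any other identifier beginning with an underscore is reserved only in the global namespace. The function name and the `at_global_scope` flag are hypothetical.

```cpp
#include <string_view>

// Hypothetical checker for the reserved-identifier rules.
// at_global_scope: whether the name is declared in the global namespace.
constexpr bool is_reserved(std::string_view name, bool at_global_scope) {
    // Anything containing "__" is reserved in every scope.
    if (name.find("__") != std::string_view::npos) return true;
    if (!name.empty() && name[0] == '_') {
        // "_X..." (underscore + uppercase letter) is reserved in every scope.
        if (name.size() > 1 && name[1] >= 'A' && name[1] <= 'Z') return true;
        // Any other leading underscore is reserved only at global scope.
        return at_global_scope;
    }
    return false;
}

// Doug T.'s workaround: fine as a local or member name,
// but reserved if declared at global namespace scope.
static_assert(!is_reserved("_2BeOrNot2Be", false));
static_assert(is_reserved("_2BeOrNot2Be", true));
```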
+1  A: 

Note also hex value 0xnnnnnnn

EDIT: I would hazard a guess that even without the ambiguities that C++ and C present, the basic convention of no leading digits and plain old readability are probably the most compelling reasons.

Tim
+18  A: 

Because then a string of digits would be a valid identifier as well as a valid number.

int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
skiphoppy
Well, what if they said variables cannot be only numbers? Then what?
Pyrolistical
Not true. One could make the rule "digits and letters".
Tim
It'd take me longer to come up with a regular expression for the lexer to pick up identifiers using that rule, if it's even possible, so I can see why no language has ever been implemented that way, in addition to the reasons given in other answers.
skiphoppy
you can make the rules as complex as you want, but you might regret it when you try to implement the compiler. ;-)
Ferruccio
I think the people writing compilers could figure it out...
Tim
Note: I am not advocating it, just saying that that reason is way down the list and it is most likely all just due to convention.
Tim
I particularly like the ability to change numbers - "int 1 = 2; int a = 1 + 1;" would set a to 4. :-)
paxdiablo
If people are going to be silly, then "L" looks like "1" - as in l234 (that's L234) - looks like a number but is legal. If you want to write obtuse code like "17 = 497" then using "L" makes it possible. But why? -R
Huntrods
This answer is actually on the right track. The real problem lies in performance. Backtracking can make well-behaved regular expressions painfully slow.
Jason Baker
If it had to be numbers+alpha, then you could still do String 0x123 = "Hello World". Unless you state that variable names are "numbers+alpha that don't parse to a valid numeric designation", and that's just silly.
eaolson
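eaolson's point is easy to see in C++ itself: several tokens that begin with a digit and contain letters are already valid numeric literals, so a "digits plus letters" identifier rule would have to carve every one of them out. A small sketch (binary literals need C++14, hexadecimal floating literals C++17).

```cpp
// Each of these starts with a digit and contains letters,
// yet each is already a number in C++, not an identifier.
static_assert(0x123 == 291);               // hexadecimal
static_assert(0b101 == 5);                 // binary (C++14)
static_assert(1e3 == 1000.0);              // decimal exponent
static_assert(0x1p4 == 16.0);              // hex float: 1 * 2^4 (C++17)
static_assert(123456UL == 123456);         // integer suffixes
static_assert(0xdeadbeef == 3735928559u);  // letters that are hex digits
```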
+8  A: 

Compilers/parsers/lexical analyzers was a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.

Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.

This goes way back - before special notations to denote storage or numeric base.

Ken Gentle
+5  A: 

What, you're not using Common Lisp?

Will Hartung
Lisp is awesome, but must you act the http://c2.com/cgi/wiki?SmugLispWeenie part? ;-)
Jeffrey Hantin
+1 for humor, but doesn't answer the question :P
allyourcode
+5  A: 

It's likely a decision that came about for a few reasons. When you're parsing a token, you only have to look at the first character to determine whether it's an identifier or a literal, and then send it to the correct function for processing. So that's a performance optimization.

The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.

There are also stylistic implications: identifiers are supposed to be mnemonics, and words are much easier to remember than numbers. When many of the original languages were being written, setting the styles for the next few decades, their designers weren't thinking about substituting "2" for "to".

William
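William's first point can be sketched as a tiny scanner that decides a token's category from its first character alone, then consumes the rest of the word. This is a simplified sketch, assuming ASCII-only tokens and doing no validation of the number's suffix; `TokenKind`, `Token` and `scan_token` are hypothetical names, not any real compiler's API.

```cpp
#include <cstddef>
#include <string_view>

enum class TokenKind { Identifier, Number, Other };

struct Token {
    TokenKind kind;
    std::size_t length; // number of characters consumed
};

constexpr bool is_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// The kind is decided from src[0] only; the scanner then takes the
// longest run of letters, digits and underscores. In this sketch a
// leading digit always means "number", so 2BeOrNot2Be comes out as one
// number token and is never considered as an identifier.
constexpr Token scan_token(std::string_view src) {
    if (src.empty()) return {TokenKind::Other, 0};
    TokenKind kind = TokenKind::Other;
    if (is_letter(src[0]) || src[0] == '_') kind = TokenKind::Identifier;
    else if (is_digit(src[0])) kind = TokenKind::Number;
    else return {TokenKind::Other, 1};
    std::size_t n = 1;
    while (n < src.size() &&
           (is_letter(src[n]) || is_digit(src[n]) || src[n] == '_')) {
        ++n;
    }
    return {kind, n};
}

static_assert(scan_token("foo = 1").kind == TokenKind::Identifier);
static_assert(scan_token("123456L;").kind == TokenKind::Number);
static_assert(scan_token("123456L;").length == 7);
```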
A: 

I think the simple answer is that it can; the restriction is language-based. In C++ and many other languages it can't, because the language doesn't support it. It's not built into the rules.

The question is akin to asking why the King can't move four spaces at a time in chess. It's because in chess that is an illegal move. Could it in another game? Sure. It just depends on the rules being played by.

Kevin
Except that C++ was invented recently by people who are still alive. We can ask them why they chose the things they did, and rejected the alternatives. Same doesn't apply to chess.
Steve Jessop
But that is not the point I'm making. It's an analogy for why there can't be numbers at the start of variable names, and the simplest answer is: because the rules of the language don't allow it.
Kevin
Sure, but I don't think the questioner is an imbecile. He's probably worked out that far already by himself. The question IMO is "why don't the rules of the language allow it?". He wants to bridge the gap between knowing the rules and understanding them.
Steve Jessop
Yeah, upon reflecting on this, I realized where you were going. You have a point. I guess I was applying Occam's razor a little too freely and assumed there is no real answer to why, except that variables don't start with numbers because they're not numbers.
Kevin
I'm not saying you're wrong, mind, occasionally the decisions of the C++ standards bodies do surpass mortal understanding, and you end up with "because they had to decide something and they decided this". But there is at least a question there to be asked :-)
Steve Jessop
+3  A: 

Use of a digit to begin a variable name makes error checking during compilation or interpretation a lot more complicated.

Allowing variable names that begin like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable or an error. The complexity this adds to the language parser may not justify the feature.

As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed a digit to begin variable names. I'm sure that this was done at least once. Maybe someone here has actually seen it somewhere.

mkClark
It isn't that difficult. It makes the lexical phase more difficult, that's all. Of course, back when I took compilers, I was told that lexical scanning could take over a quarter of the total compilation time.
David Thornley
A: 

I agree with all the answers. Philosophically speaking, each variable in a language stands for a concept (that is why readable names are preferred). English doesn't have any names that begin with a number, so that carried over to programming languages.

Sridhar Iyer
Well, someone may want to use 1stUser and 2ndUser instead of user1 and user2
Pavel Feldman
+1  A: 

As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.

That said, pretty much every time a language doesn't allow variable names to begin with numbers, it is because those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you try to define it as permissible, they will give you parsing conflicts.

OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is likely not to resemble any popular conventional language very much, if at all.

For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.

staticsan
Actually, there are several languages that allow you to have characters marking identifiers. They're called "sigils" and you have them in Perl and PHP.
Jason Baker
Except you still aren't allowed to begin a variable name in PHP with a number - the language rules forbid it. :-) But you can in Qompose for exactly the same reason.
staticsan
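staticsan's point about a heralding character can be sketched as follows: if every variable must start with a sigil such as `$`, the scanner knows a variable is coming before it sees any digits, so the name itself may begin with one. This is a hypothetical toy rule; it is not how PHP, Perl or Qompose actually lex names (PHP, as noted above, still forbids a leading digit), and `scan_sigil_variable` is an illustrative name.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool is_name_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// Hypothetical rule: a variable is '$' followed by one or more name
// characters, with a digit allowed first. Returns the length of the
// variable token at the start of src, or 0 if there isn't one.
constexpr std::size_t scan_sigil_variable(std::string_view src) {
    if (src.empty() || src[0] != '$') return 0;
    std::size_t n = 1;
    while (n < src.size() && is_name_char(src[n])) ++n;
    return n > 1 ? n : 0; // a bare '$' is not a variable
}

// A bare 2 is still a number; only "$2..." is a variable,
// so the two never compete for the same token.
static_assert(scan_sigil_variable("$2BeOrNot2Be = 1") == 12);
static_assert(scan_sigil_variable("2BeOrNot2Be") == 0);
```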
A: 

C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:

0x, 2d, 5555

Kevin
This restriction holds in languages where that kind of syntax isn't allowed though.
Jason Baker
A: 

Conceptually it would not be hard. You see a string of letters, digits and underscores and try to parse it as a number. If that fails, it's a variable. Trivial performance hit.

This would lead to nightmarish bugs, though. Obfuscated C/C++ would have a whole new dimension to it. Maybe it should be allowed in Intercal...

A: 

Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.

Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)

David Thornley
The thing is that Forth doesn't really have a very sophisticated parser. Really, all it cares about is if an identifier is between two sets of whitespace.
Jason Baker
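David Thornley's description of the Forth outer interpreter (look the token up in the dictionary first, fall back to reading it as a number, otherwise report an error) can be sketched like this. The dictionary is a hypothetical fixed table in which each word just stands for a constant; a real Forth dictionary holds executable definitions and grows at run time.

```cpp
#include <optional>
#include <string_view>

struct Word {
    std::string_view name;
    long value; // hypothetical: each word here just pushes a constant
};

// Because lookup happens before number conversion, "2" can be a word.
constexpr Word dictionary[] = {{"2", 2}, {"TEN", 10}};

// Plain decimal conversion; no overflow check (this is only a sketch).
constexpr std::optional<long> parse_decimal(std::string_view tok) {
    if (tok.empty()) return std::nullopt;
    long v = 0;
    for (char c : tok) {
        if (c < '0' || c > '9') return std::nullopt;
        v = v * 10 + (c - '0');
    }
    return v;
}

// Outer-interpreter order: dictionary first, then number, else error
// (an empty optional stands for "unknown word").
constexpr std::optional<long> interpret(std::string_view tok) {
    for (const Word& w : dictionary) {
        if (w.name == tok) return w.value;
    }
    return parse_decimal(tok);
}
```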
A: 

Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is actually one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.

There's a much better explanation of this here.

Jason Baker
A: 

I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.

I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy-to-remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z0-9_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.

When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned a lot more, they don't seem so bad, and I finally got around to learning them properly.

allyourcode
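allyourcode's proposed rule (any run of letters, digits and underscores that is neither a keyword nor a number literal is an identifier) can be sketched to show the odd consequence. It is also the "try to parse it as a number; if that fails, it's a variable" approach. This is a hypothetical rule for a made-up language: the keyword list is a tiny sample, and only decimal and `0x` hex integers count as number literals.

```cpp
#include <string_view>

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_hex_digit(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}
constexpr bool is_word_char(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           c == '_';
}

// Only decimal and 0x-prefixed hex integers count as numbers here.
constexpr bool is_number_literal(std::string_view t) {
    bool hex = t.size() > 2 && t[0] == '0' && (t[1] == 'x' || t[1] == 'X');
    std::string_view digits = hex ? t.substr(2) : t;
    if (digits.empty()) return false;
    for (char c : digits) {
        if (hex ? !is_hex_digit(c) : !is_digit(c)) return false;
    }
    return true;
}

// A small sample of keywords, not a complete list.
constexpr std::string_view keywords[] = {"int", "double", "return", "if"};

constexpr bool is_keyword(std::string_view t) {
    for (std::string_view k : keywords) {
        if (k == t) return true;
    }
    return false;
}

// Hypothetical rule: word characters only, and not a keyword or number.
constexpr bool is_identifier(std::string_view t) {
    if (t.empty()) return false;
    for (char c : t) {
        if (!is_word_char(c)) return false;
    }
    return !is_keyword(t) && !is_number_literal(t);
}

static_assert(is_identifier("2BeOrNot2Be"));
static_assert(is_identifier("0xdeadpork"));   // 'p', 'o', 'r', 'k' aren't hex digits
static_assert(!is_identifier("0xdeadbeef"));  // a valid hex literal
static_assert(!is_identifier("12345"));
```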
+1  A: 

Because if it did, it would be called COBOL.

dave
A: 

One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.

Wasn't it Dijkstra who said that the "most important aspect of any tool is its effect on its user"?

caving