(Warning: MASSIVE post. If you want my final answer to this question, skip to the bottom section, where I answer it. If you do, and you think I'm spouting a load of bull, please read the rest before trying to argue with my "bull.")
If I were to make a programming language, here are a few caveats:
- The type system would be more or less Perl 6 (but I totally came up with the idea first :P) - dynamically and weakly typed, with a stronger (I'm thinking Haskellian) type system that can be imposed on top of it (see the sketch after this list).
- There would be a minimal number of language keywords. Everything else would be reassignable first-class objects (types, functions, so on).
- It would be a very high-level language, like Perl / Python / Ruby / Haskell / Lisp / whatever is fashionable today. It would probably be interpreted, but I won't rule out compilation.
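About that first caveat: Python already behaves roughly the way I'm describing - dynamic by default, with a stronger type system you can impose on top through optional annotations checked by an external tool such as mypy. A rough sketch of the flavor I'm after, with Python standing in for my imaginary language:

```python
# Dynamic by default: nothing constrains x's type.
def double(x):
    return x * 2

# A stronger layer imposed on top, opt-in: a checker such as mypy
# verifies these annotations; runtime behavior is unchanged.
def double_checked(x: int) -> int:
    return x * 2

print(double("ab"))        # "abab" - the dynamic side happily duck-types
print(double_checked(21))  # 42    - and the typed layer is strictly opt-in
```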
If any of those (rather important) design decisions don't apply to your ideal language (and they may very well not), then my following (apparently controversial) decision won't work for you. If you're not me, it may not work for you either. I think it fits my language, because it's my language. You should think about your language and how you want your language to be so that you, like Dennis Ritchie or Guido van Rossum or Larry Wall, can grow up to make bad design decisions and defend them in retrospect with good arguments.
Now then, I would still maintain that, in my language, identifiers would be case insensitive, and this would include variables, functions (which would be variables), types (which would also be variables, both built-in/primitive (which would be subclass-able) and user-defined), you name it.
To address issues as they come:
Naming consistency is the best argument I've seen, but I disagree. First off, allowing two different types called `int` and `Int` is ridiculous. The fact that Java has `int` and `Integer` is almost as ridiculous as the fact that neither of them allows arbitrary precision. (Disclaimer: I've become a big fan of the word "ridiculous" lately.)
Normally I would be a fan of allowing people to shoot themselves in the foot with things like two different objects called `int` and `Int` if they want to, but here it's an issue of laziness, and of the old multiple-word-variable-name argument.
My personal take on the issue of `underscore_case` vs. `MixedCase` vs. `camelCase` is that they're all ugly and less readable, and that if at all possible you should use only a single word. In an ideal world, all code would be stored in your source control in an agreed-upon format (the style that most of the team uses), and the team's dissenters would have hooks in their VCS to convert all checked-out code from that style to their style, and vice versa for checking back in. We don't live in that world.
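Just to make that daydream concrete, here's a minimal sketch of the conversion half of such a hook, written so it could be wired into Git's clean/smudge filter mechanism. The regexes and the particular style pair are assumptions for illustration; a real tool would have to actually parse the language rather than pattern-match identifiers:

```python
import re
import sys

def snake_to_camel(name: str) -> str:
    """Convert an under_scored identifier to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to under_scored."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

if __name__ == "__main__":
    # "smudge" rewrites toward your pet style on checkout;
    # "clean" rewrites back to the team style on check-in.
    convert = snake_to_camel if sys.argv[1] == "smudge" else camel_to_snake
    for line in sys.stdin:
        sys.stdout.write(
            re.sub(r"[A-Za-z_][A-Za-z0-9_]*",
                   lambda m: convert(m.group(0)), line))
```

Registered as a clean/smudge filter in `.gitattributes`, each dissenter would see their preferred style locally while the repository stays in the agreed-upon one.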
It bothers me for some reason when I have to continually write `MixedCaseVariableOrClassNames` a lot more than it bothers me to write `underscore_separated_variable_or_class_names`. Even `TimeOfDay` and `time_of_day` might be the same identifier, because they're conceptually the same thing, but I'm a bit hesitant to make that leap, if only because it's an unusual rule (internal underscores are removed in variable names). On one hand, it could end the debate between the two styles, but on the other hand it could just annoy people.
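For concreteness, here's what that unusual rule could look like as a comparison function. The name `canonical` and the exact normalization are hypothetical, just my sketch of the idea:

```python
def canonical(identifier: str) -> str:
    # A finished rule would probably remove only *internal*
    # underscores; dropping them all keeps the sketch short.
    return identifier.replace("_", "").casefold()

# Under this rule, all of these spell the same identifier:
assert canonical("TimeOfDay") == canonical("time_of_day") == canonical("timeofday")
```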
So my final decision is based on two parts, which are both highly subjective:
- If I make a name others must use that's likely to be exported to another namespace, I'll probably name it as simply and clearly as I can. I usually won't use many words, and I'll use as much lowercase as I can get away with. `sizedint` doesn't strike me as much better or worse than `sized_int` or `SizedInt` (which, as far as examples of camelCase go, looks particularly bad because of the `dI`, IMHO), so I'd go with that. If you like camelCase (and many people do), you can use it. If you like underscores, you're out of luck, but if you really need to you can write `sized_int = sizedint` and go on with life (see the sketch after this list).
- If someone else wrote it and wanted to use `sized_int`, I can live with that. If they wrote it and used `SizedInt`, I don't have to stick with their annoying-to-type camelCase and, in my code, can freely write it as `sizedint`.
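Since types in my language would be reassignable first-class objects (see the caveats at the top), the `sized_int = sizedint` trick is just an assignment. Python happens to work the same way, so here's roughly what I mean; the `sizedint` type itself is a made-up stand-in:

```python
class sizedint(int):
    """Hypothetical fixed-width integer type, standing in for a built-in."""
    BITS = 32

# Types are first-class values bound to ordinary names, so an alias
# in your preferred style is a one-line assignment:
sized_int = sizedint

x = sized_int(42)
print(type(x) is sizedint)  # True: both names refer to the same type
```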
Saying that consistency helps us remember what things mean is silly. Do you speak english or English? Both, because they're the same word, and you recognize them as the same word. I think e.e. cummings was on to something, and we probably shouldn't have different cases at all, but I can't exactly rewrite most human and computer languages out there on a whim. All I can do is say, "Why are you making such a fuss about case when it says the same thing either way?" and implement this attitude in my own language.
The argument from throwaway variables in functions (e.g. `Person person = /* something */`) is a pretty good one, but I disagree that people would do `Person thePerson` (or `Person aPerson`). I personally tend to just do `Person p` anyway.
I'm not much fond of capitalizing type names (or much of anything) in the first place, and if it's enough of a throwaway variable to declare it undescriptively as `Person person`, then you won't lose much information with `Person p`. And anyone who says "non-descriptive one-letter variable names are bad" shouldn't be using non-descriptive many-letter variable names either, like `Person person`.
Variables should follow sane scoping rules (like C and Perl, unlike Python - flame war starts here, guys!), so conflicts in simple names used locally (like `p`) should never arise.
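To spell out the Python jab: Python scopes variables to the function rather than the block, so a throwaway name outlives the block that introduced it, which is exactly how short local names like `p` start colliding:

```python
def demo():
    for p in ["alice", "bob"]:
        pass
    # In C or Perl, a block-scoped p would already be gone here.
    # In Python, the loop variable leaks into the whole function:
    print(p)  # prints "bob"

demo()
```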
As for making the implementation barf if you use two variables with names differing only in case, that's a good idea, but no. If someone makes library X that defines the type `XMLparser`, and someone else makes library Y that defines the type `XMLParser`, and I want to write an abstraction layer that provides the same interface for many XML parsers including those two types, I'm pretty boned. Even with namespaces, this still becomes prohibitively annoying to pull off.
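Here's the scenario in miniature, using Python (where it works, because Python is case sensitive). The two parser classes are made-up stand-ins for the two libraries' exports:

```python
# Stand-ins for types exported by two hypothetical libraries, X and Y.
class XMLparser:                       # library X's spelling
    def parse(self, text): return ("X", text)

class XMLParser:                       # library Y's spelling
    def parse(self, text): return ("Y", text)

def parse_with(parser_type, text):
    """Abstraction layer: one interface over either parser type."""
    return parser_type().parse(text)

# Fine in Python. In a case-insensitive language that *errors* on case
# collisions, merely naming both types in one scope would refuse to
# compile, and the abstraction layer would be sunk before it started.
print(parse_with(XMLparser, "<a/>"))
print(parse_with(XMLParser, "<a/>"))
```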
Internationalization issues have been brought up. Distinguishing between capital and lowercase umlauted U's will be no easier in my interpreter/compiler (probably the former) than in my source code.
If a language has a string type (i.e. the language isn't C) and the string type supports Unicode (i.e. the language isn't Ruby - it's only a joke, don't crucify me), then the language already provides a way to convert Unicode strings to and from lowercase, like Perl's `lc()` function (sometimes) and Python's `unicode.lower()` method. This function must be built into the language somewhere, and it can handle Unicode.
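A quick check that the runtime really does the heavy lifting, in Python 3 (`casefold()` is the stricter sibling of `lower()`, meant for caseless matching):

```python
# The runtime already knows Unicode case rules, umlauts included.
print("Ü".lower())                                      # ü
print("SIZEDINT" == "SizedInt")                         # False
print("SIZEDINT".casefold() == "SizedInt".casefold())   # True

# casefold() even handles what lower() misses, like German ß:
print("straße".casefold() == "STRASSE".casefold())      # True
```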
Calling this function during an interpreter's compile phase rather than at runtime is simple. For a compiler it's only marginally harder: you have to implement this kind of functionality for the runtime library anyway, so including it in the compiler is no extra work. If you're writing the compiler in the language itself (and you should be), and the functionality is built into the language, you'll have no problems.
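Concretely, the whole change could be one call in the scanner. Here's a hedged sketch of a tokenizer that canonicalizes identifiers at compile time; the token shapes and the `canonical` rule are my assumptions, not a finished design:

```python
import re

def canonical(name: str) -> str:
    # The same normalization the runtime string type already provides.
    return name.casefold()

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|\S)")

def tokenize(source: str):
    """Yield (kind, text) tokens, folding identifier case while scanning."""
    for match in TOKEN.finditer(source):
        text = match.group(1)
        if text[0].isalpha() or text[0] == "_":
            yield ("IDENT", canonical(text))
        elif text.isdigit():
            yield ("NUMBER", text)
        else:
            yield ("OP", text)

# "SizedInt" and "sizedint" scan to the identical identifier token:
print(list(tokenize("SizedInt x = sizedint(42)")))
```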
To answer your question, no. I don't think we should be capitalizing anything, period. It's annoying to type (to me) and allowing case differences creates (or allows) unnecessary confusion between capitalized and lowercased things, or camelCased and under_scored things, or other sets of semantically-distinct-but-conceptually-identical things. If the distinction is entirely semantic, let's not bother with it at all.