(Warning: MASSIVE post. If you want my final answer to this question, skip to the bottom section, where I answer it. If you do, and you think I'm spouting a load of bull, please read the rest before trying to argue with my "bull.")
If I were to make a programming language, here are a few caveats:
- The type system would be more or less Perl 6 (but I totally came up with the idea first :P) - dynamically and weakly typed, with a stronger (I'm thinking Haskellian) type system that can be imposed on top of it (see the sketch after this list).
- There would be a minimal number of language keywords. Everything else would be reassignable first-class objects (types, functions, so on).
- It would be a very high-level language, like Perl / Python / Ruby / Haskell / Lisp / whatever is fashionable today. It would probably be interpreted, but I won't rule out compilation.
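About that first caveat: Python already behaves roughly the way I'm describing - dynamic by default, with a stronger type system you can impose on top through optional annotations checked by an external tool such as mypy. A rough sketch of the flavor I'm after, with Python standing in for my imaginary language:

```python
# Dynamic by default: nothing constrains x's type.
def double(x):
    return x * 2

# A stronger layer imposed on top, opt-in: a checker such as mypy
# verifies these annotations; runtime behavior is unchanged.
def double_checked(x: int) -> int:
    return x * 2

print(double("ab"))        # "abab" - the dynamic side happily duck-types
print(double_checked(21))  # 42    - and the typed layer is strictly opt-in
```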
If any of those (rather important) design decisions don't apply to your ideal language (and they may very well not), then my following (apparently controversial) decision won't work for you. If you're not me, it may not work for you either. I think it fits my language, because it's my language. You should think about your language and how you want your language to be so that you, like Dennis Ritchie or Guido van Rossum or Larry Wall, can grow up to make bad design decisions and defend them in retrospect with good arguments.
Now then, I would still maintain that, in my language, identifiers would be case insensitive, and this would include variables, functions (which would be variables), types (which would also be variables, both built-in/primitive (which would be subclass-able) and user-defined), you name it.
To address issues as they come:
Naming consistency is the best argument I've seen, but I disagree. First off, allowing two different types called `int` and `Int` is ridiculous. The fact that Java has `int` and `Integer` is almost as ridiculous as the fact that neither of them allows arbitrary precision. (Disclaimer: I've become a big fan of the word "ridiculous" lately.)
Normally I would be a fan of allowing people to shoot themselves in the foot with things like two different objects called `int` and `Int` if they want to, but here it's an issue of laziness, and of the old multiple-word-variable-name argument.
My personal take on the issue of `underscore_case` vs. `MixedCase` vs. `camelCase` is that they're all ugly and less readable, and that if at all possible you should use only a single word. In an ideal world, all code would be stored in your source control in an agreed-upon format (the style that most of the team uses), and the team's dissenters would have hooks in their VCS to convert all checked-out code from that style to their style, and vice versa for checking back in. We don't live in that world.
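Just to make that daydream concrete, here's a minimal sketch of the conversion half of such a hook, written so it could be wired into Git's clean/smudge filter mechanism. The regexes and the particular style pair are assumptions for illustration; a real tool would have to actually parse the language rather than pattern-match identifiers:

```python
import re
import sys

def snake_to_camel(name: str) -> str:
    """Convert an under_scored identifier to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to under_scored."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

if __name__ == "__main__":
    # "smudge" rewrites toward your pet style on checkout;
    # "clean" rewrites back to the team style on check-in.
    convert = snake_to_camel if sys.argv[1] == "smudge" else camel_to_snake
    for line in sys.stdin:
        sys.stdout.write(
            re.sub(r"[A-Za-z_][A-Za-z0-9_]*",
                   lambda m: convert(m.group(0)), line))
```

Registered as a clean/smudge filter in `.gitattributes`, each dissenter would see their preferred style locally while the repository stays in the agreed-upon one.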
It bothers me for some reason when I have to continually write `MixedCaseVariableOrClassNames` a lot more than it bothers me to write `underscore_separated_variable_or_class_names`. Even `TimeOfDay` and `time_of_day` might be the same identifier, because they're conceptually the same thing, but I'm a bit hesitant to make that leap, if only because it's an unusual rule (internal underscores are removed in variable names). On one hand, it could end the debate between the two styles, but on the other hand it could just annoy people.
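For concreteness, here's what that unusual rule could look like as a comparison function. The name `canonical` and the exact normalization are hypothetical, just my sketch of the idea:

```python
def canonical(identifier: str) -> str:
    # A finished rule would probably remove only *internal*
    # underscores; dropping them all keeps the sketch short.
    return identifier.replace("_", "").casefold()

# Under this rule, all of these spell the same identifier:
assert canonical("TimeOfDay") == canonical("time_of_day") == canonical("timeofday")
```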
So my final decision is based on two parts, which are both highly subjective:
- If I make a name others must use that's likely to be exported to another namespace, I'll probably name it as simply and clearly as I can. I usually won't use many words, and I'll use as much lowercase as I can get away with. `sizedint` doesn't strike me as much better or worse than `sized_int` or `SizedInt` (which, as far as examples of camelCase go, looks particularly bad because of the `dI`, IMHO), so I'd go with that. If you like camelCase (and many people do), you can use it. If you like underscores, you're out of luck, but if you really need to you can write `sized_int = sizedint` and go on with life (see the sketch after this list).
- If someone else wrote it and wanted to use `sized_int`, I can live with that. If they wrote it and used `SizedInt`, I don't have to stick with their annoying-to-type camelCase and, in my code, can freely write it as `sizedint`.
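Since types in my language would be reassignable first-class objects (see the caveats at the top), the `sized_int = sizedint` trick is just an assignment. Python happens to work the same way, so here's roughly what I mean; the `sizedint` type itself is a made-up stand-in:

```python
class sizedint(int):
    """Hypothetical fixed-width integer type, standing in for a built-in."""
    BITS = 32

# Types are first-class values bound to ordinary names, so an alias
# in your preferred style is a one-line assignment:
sized_int = sizedint

x = sized_int(42)
print(type(x) is sizedint)  # True: both names refer to the same type
```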
Saying that consistency helps us remember what things mean is silly. Do you speak english or English? Both, because they're the same word, and you recognize them as the same word. I think e.e. cummings was on to something, and we probably shouldn't have different cases at all, but I can't exactly rewrite most human and computer languages out there on a whim. All I can do is say, "Why are you making such a fuss about case when it says the same thing either way?" and implement this attitude in my own language.
The argument from throwaway variables in functions (e.g. `Person person = /* something */`) is a pretty good one, but I disagree that people would do `Person thePerson` (or `Person aPerson`). I personally tend to just do `Person p` anyway.
I'm not much fond of capitalizing type names (or much of anything) in the first place, and if it's enough of a throwaway variable to declare it undescriptively as `Person person`, then you won't lose much information with `Person p`. And anyone who says "non-descriptive one-letter variable names are bad" shouldn't be using non-descriptive many-letter variable names either, like `Person person`.
Variables should follow sane scoping rules (like C and Perl, unlike Python - flame war starts here, guys!), so conflicts in simple names used locally (like `p`) should never arise.
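To spell out the Python jab: Python scopes variables to the function rather than the block, so a throwaway name outlives the block that introduced it, which is exactly how short local names like `p` start colliding:

```python
def demo():
    for p in ["alice", "bob"]:
        pass
    # In C or Perl, a block-scoped p would already be gone here.
    # In Python, the loop variable leaks into the whole function:
    print(p)  # prints "bob"

demo()
```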
As for making the implementation barf if you use two variables with names differing only in case, that's a good idea, but no. If someone makes library X that defines the type `XMLparser`, and someone else makes library Y that defines the type `XMLParser`, and I want to write an abstraction layer that provides the same interface for many XML parsers including those two types, I'm pretty boned. Even with namespaces, this still becomes prohibitively annoying to pull off.
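Here's the scenario in miniature, using Python (where it works, because Python is case sensitive). The two parser classes are made-up stand-ins for the two libraries' exports:

```python
# Stand-ins for types exported by two hypothetical libraries, X and Y.
class XMLparser:                       # library X's spelling
    def parse(self, text): return ("X", text)

class XMLParser:                       # library Y's spelling
    def parse(self, text): return ("Y", text)

def parse_with(parser_type, text):
    """Abstraction layer: one interface over either parser type."""
    return parser_type().parse(text)

# Fine in Python. In a case-insensitive language that *errors* on case
# collisions, merely naming both types in one scope would refuse to
# compile, and the abstraction layer would be sunk before it started.
print(parse_with(XMLparser, "<a/>"))
print(parse_with(XMLParser, "<a/>"))
```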
Internationalization issues have been brought up. Distinguishing between capital and lowercase umlauted U's will be no easier in my interpreter/compiler (probably the former) than in my source code.
If a language has a string type (i.e. the language isn't C) and the string type supports Unicode (i.e. the language isn't Ruby - it's only a joke, don't crucify me), then the language already provides a way to convert Unicode strings to and from lowercase, like Perl's `lc()` function (sometimes) and Python's `unicode.lower()` method. This function must be built into the language somewhere, and it can handle Unicode.
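A quick check that the runtime really does the heavy lifting, in Python 3 (`casefold()` is the stricter sibling of `lower()`, meant for caseless matching):

```python
# The runtime already knows Unicode case rules, umlauts included.
print("Ü".lower())                                      # ü
print("SIZEDINT" == "SizedInt")                         # False
print("SIZEDINT".casefold() == "SizedInt".casefold())   # True

# casefold() even handles what lower() misses, like German ß:
print("straße".casefold() == "STRASSE".casefold())      # True
```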
Calling this function during an interpreter's compile phase rather than at runtime is simple. For a compiler it's only marginally harder: you have to implement this kind of functionality for the runtime library anyway, so including it in the compiler is no extra work. If you're writing the compiler in the language itself (and you should be), and the functionality is built into the language, you'll have no problems.
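Concretely, the whole change could be one call in the scanner. Here's a hedged sketch of a tokenizer that canonicalizes identifiers at compile time; the token shapes and the `canonical` rule are my assumptions, not a finished design:

```python
import re

def canonical(name: str) -> str:
    # The same normalization the runtime string type already provides.
    return name.casefold()

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|\S)")

def tokenize(source: str):
    """Yield (kind, text) tokens, folding identifier case while scanning."""
    for match in TOKEN.finditer(source):
        text = match.group(1)
        if text[0].isalpha() or text[0] == "_":
            yield ("IDENT", canonical(text))
        elif text.isdigit():
            yield ("NUMBER", text)
        else:
            yield ("OP", text)

# "SizedInt" and "sizedint" scan to the identical identifier token:
print(list(tokenize("SizedInt x = sizedint(42)")))
```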
To answer your question, no. I don't think we should be capitalizing anything, period. It's annoying to type (to me) and allowing case differences creates (or allows) unnecessary confusion between capitalized and lowercased things, or camelCased and under_scored things, or other sets of semantically-distinct-but-conceptually-identical things. If the distinction is entirely semantic, let's not bother with it at all.