I am in a compilers class and we are tasked with creating our own language, from scratch. Currently our dilemma is whether to include a 'null' type or not. What purpose does null provide? Some of our team is arguing that it is not strictly necessary, while others are pro-null just for the extra flexibility it can provide.

Do you have any thoughts, especially for or against null? Have you ever created functionality that required null?

+32  A: 

Null: The Billion Dollar Mistake. Tony Hoare:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

moffdub
It depends on how your language will be used. I can't imagine C without it.
Chris Lutz
It would be nice if they came up with some sort of non-nullable identifier. In C# they recently added nullable types, which you can declare as MyType?. It would be nice if you could mark method parameters as never null, i.e. something like MyType! or whatever...
mezoid
I've always been a fan of the Null Object Pattern.
moffdub
@moffdub I think you should copy (Tony Hoare's) abstract into the answer as well.
Jevgeni Kabanov
Good idea Jevgeni.
moffdub
Not having null would have been a TEN billion dollar mistake. Think about it: all pointers would have to be initialized to some 'valid' value, which when encountered would cause the program to happily keep on computing *in the wrong context*, leading to extremely hard-to-find bugs = worse than crash
Steven A. Lowe
The mistake with null is that it forces you to constantly check if something is null every five lines in your code. It takes you away from the problem domain and back into tedium. And if you aren't meticulous enough... crash.
moffdub
@Steven A. Lowe: That is simply not true. Take a look at Haskell's Maybe type. It resembles null, but whether a specific variable may or may not contain a value is explicit rather than implicit. This often leads to better design and should be easy for the compiler to remove when optimizing.
Daniel W
+1  A: 

If you are creating a statically typed language, I imagine that null could add a good deal of complexity to your compiler.

If you are creating a dynamically typed language, NULL can come in quite handy, as it is just another "type" without any variations.

gahooa
+5  A: 

It seems useful to have a way to indicate a reference or pointer that isn't currently pointing at anything, whether you call it null, nil, None, etc. If for no other reason than to let people know when they're about to fall off the end of a linked list.

Dana
A linked list could be built like so: Node->Node->Node->EndNode. In fact, that's how lists are built in languages that don't have `null` values.
Tom Lokhorst
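
A rough sketch of the Node->Node->Node->EndNode idea from the comment above, in Python. The names `Node`, `EndNode`, `rest` and `length` are made up for illustration and are not from any particular language or library; the point is only that the end of the list is an ordinary value rather than a null pointer.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EndNode:
    """Explicit end-of-list marker; stands in for a null 'next' pointer."""


@dataclass(frozen=True)
class Node:
    value: int
    rest: Union["Node", EndNode]


def length(lst: Union[Node, EndNode]) -> int:
    # Walk until the explicit end marker is reached; no null check is involved.
    count = 0
    while isinstance(lst, Node):
        count += 1
        lst = lst.rest
    return count


numbers = Node(1, Node(2, Node(3, EndNode())))
print(length(numbers))  # 3
```

An empty list is then just `EndNode()`, and code that forgets the end case has nothing to dereference by accident.
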
+12  A: 

I usually think of 'null' in the C/C++ aspect of 'memory address 0'. It's not strictly needed, but if it didn't exist, then people would just use something else (if myNumber == -1, or if myString == "").
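
To make the "people would just use something else" point concrete, here is a small Python sketch comparing a -1 sentinel with an explicit "no value". The helper names `find_sentinel` and `find_optional` are invented for this example; `str.find` returning -1 is standard Python behaviour.

```python
from typing import Optional


def find_sentinel(text: str, ch: str) -> int:
    # Sentinel style: -1 means "not found", but -1 is also a valid index in Python.
    return text.find(ch)


def find_optional(text: str, ch: str) -> Optional[int]:
    # Explicit style: None means "not found" and cannot be mistaken for an index.
    index = text.find(ch)
    return None if index == -1 else index


word = "hello"
# Forgetting to check the sentinel silently yields the wrong character.
print(word[find_sentinel(word, "z")])  # prints "o"

# Forgetting to check None fails loudly instead.
try:
    word[find_optional(word, "z")]
except TypeError as error:
    print("caught:", type(error).__name__)  # prints "caught: TypeError"
```

Either way the caller has to remember a check; the difference is whether forgetting it produces a wrong answer or an immediate error.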

All I know is, I can't think of a day I've spent coding that I haven't typed the word "null", so I think that makes it pretty important.

In the .NET world, MS recently added nullable types for int, long, etc. that never used to be nullable, so I guess they think it's pretty important too.

If I were designing a language, I would keep it. However, I wouldn't avoid using a language that didn't have null either. It would just take a little getting used to.

rally25rs
I would keep it too, indeed. I mean, what else would you assign to a reference that points to nothing? (note the pun)
Johannes Schaub - litb
Well it's a question that's based on a wrong assumption. If you need to assign a reference that points at nothing, you've built your application incorrectly.
Breton
Breton: So how would you implement a callback from a class? You have to store a pointer to a callback function in the class, but what value would it have before you call setCallback()?
Timmmm
Nullable types were added to .NET because they are necessary to interact with applications or other languages that inherently support them, such as databases.
Evan Plaice
+3  A: 
Charlie Martin
+3  A: 

Consider the examples of C and Java. In C, the convention is that a null pointer is the numeric value zero. Of course, that's really just a convention: nothing about the language treats that value as anything special. In Java, however, null is a distinct concept that you can detect and know that, yes, this is in fact a bad reference and I shouldn't try to open that door to see what's on the other side.

Even so, I hate nulls almost worse than anything else.

CLARIFICATION based on comments: I hate the de facto null pointer value of zero worse than I hate null.

Any time I see an assignment to null, I think, "oh good, someone has just put a landmine in the code. Someday, we're going to be walking down a related execution path and BOOM! NullPointerException!"

What I would prefer is for someone to specify a useful default or NullObject that lets me know that "this parameter has not been set to anything useful." A bald null by itself is just trouble waiting to happen.

That said, it's still better than a raw zero wandering around loose.

Bob Cross
Isn't it better to get and catch a "NullPointerException" instead of pointing to something that looks like legal values but in fact is not?
simon
As simon said, a NullPointerException is an argument *for* null, not against it.
phihag
Added the clarification based on the comments. If someone wants to indicate "this parameter hasn't been set to a useful value", I'd rather have that come through explicitly. "null" by itself doesn't tell me anything.
Bob Cross
But null literally means "this hasn't been set to a useful value".
Daniel Earwicker
No, it means "there isn't anything at the end of this reference. If you follow it, you'll die." It does not contain useful default values, à la a NullEmployee named "NullName" with an SSN of "000-00-0000" that is clearly unpopulated yet won't crash your application.
Bob Cross
it won't crash your application if you check for the null either. You forget to check, that's your fault for shoddy programming. If you really want default values, you check for null and then change it.
Sekhat
Repeatedly checking for null is a potentially sensible thing to do with an API that was flung at you from over the wall by a vendor. When you're working with a team of people, it doesn't make sense. The multiplication of code is on the wrong side of the equation. If my friend checks in a method that returns an ArrayList of valid values, I assume that I'm going to receive an empty ArrayList if there aren't any valid results. I don't expect to get a null.
Bob Cross
+3  A: 

Null is not a mistake. Null means "I don't know yet"

For primitives you don't really need a null (I have to say that strings (in .NET) shouldn't get it IMHO)

But for composite entities it definitely serves a purpose.

Roger Willcocks
I disagree - without nullable primitives you often end up with a disconnect between systems that treat anything as nullable and your non-null primitives end up with boolean flags to indicate yes this number means something versus no, actually it doesn't. Or you end up with special sentinel values.
Robert Paulson
Yeah, but that's what System.Nullable<> is for, which is basically an automatic implementation of the boolean flag.
Roger Willcocks
+10  A: 

What's null for, you ask?

Well,

Nothing.

Zoasterboy
+23  A: 

null is a sentinel value that is not an integer, not a string, not a boolean - not anything really, except something to hold and be a "not there" value. Don't treat it as or expect it to be a 0, or an empty string or an empty list. Those are all genuinely valid values in many circumstances - the idea of a null instead means there is no value there.

Perhaps it's a little bit like a function throwing an exception instead of returning a value. Except instead of manufacturing and returning an ordinary value with a special meaning, it returns a special value that already has a special meaning. If a language expects you to work with null, then you can't really ignore it.

staticsan
Good answer to not rely on the value being zero, because it doesn't have to be. You could add that historically, for performance, compilers will often use the value 0 as null pointer value because testing against 0 (or not 0) is typically a single machine instruction.
Robert Paulson
"Don't equate it with a 0, or an empty string or an empty list." But also do not place too much emphasis on them being different. While you might be technically right, avoiding the subtle and confusing details is way safer. I kind of like the way Oracle treats empty strings as null.
Thilo
I prefer null and "" to be distinct. If I'm checking for a null, I usually don't want "" to match. That Oracle gotcha is dangerous.
staticsan
@staticsan I'm not aware of any language that equates null to "", aka the empty string.
Neil N
MySQL will convert null to "" in some circumstances.
staticsan
+3  A: 

Null is only useful in situations where there are variables with unassigned values. If every variable has a value, then there is no need for null values.

Marshmellow1328
Your assertion is true if and only if all variables can be assigned a value. No special values (aka sentinels) either. Mathematicians have long ago moved beyond the natural numbers. bool isCompliant (yes/no) or Nullable<bool> isCompliant (yes/no/unknown). Nulls more accurately model real life.
Robert Paulson
+6  A: 

the concept of null is not strictly necessary in exactly the same sense that the concept of zero is not strictly necessary.

Steven A. Lowe
Null is the identity of what operator, exactly?
Norman Ramsey
@[Norman Ramsey]: the question assumes an invalid premise. Null is not an element in a series, it is a sentinel meaning 'no value'.
Steven A. Lowe
+17  A: 

Oh no, I feel the philosophy major coming out of me....

The notion of NULL comes from the notion of the empty set in set theory. Nearly everyone agrees that the empty set is not equal to zero. Mathematicians and philosophers have been battling about the value of set theory for decades.

For computer languages, I think it is very helpful to understand object references that do not refer to anything in memory. Google set theory and you will see similarities between the formal symbolic systems (notation) that set theorists use and the symbols we use in many computer languages.

Regards, Sam

Sam
This seems a better answer to the question of whether to allow empty lists than the one about whether to have a null value.
recursive
Indeed, without a null, more boolean flags would have to be used to indicate that "The value that has been set to zero actually isn't a value at all".
Arafangion
You had a philosophy major inside you? How the hell did that get there!?
Rob
Actually, zero generally *has* been defined as the empty set, since Frege, though this probably doesn't matter much to computing. (Frege defined it as the set containing the empty set. The definition of larger numbers has been less constant.)
Flash Sheridan
Interesting... I remember the prof who taught my Formal Logic class saying exactly "...zero is the set containing the empty set." That led me to think my initial statement is correct; the empty set itself is not zero, like NULL != 0.
Sam
+3  A: 

Null is a sentinel value. It's a value that cannot possibly be real data and instead provides meta-data about the variable in use.

Null assigned to a pointer indicates that the pointer is uninitialized. This gives you the ability to detect misuse of uninitialized pointers by detecting dereferences of null valued pointers. If you instead leave the value of a pointer equal to whatever happened to be in memory then you would have crazily irregular program behavior that would be much more difficult to debug.

Also, the null character in a C-style variable length string is used to mark the end of the string.

The use of null in these ways, especially for pointer values, has become so popular that the metaphor has been imported into other systems, even when the "null" sentinel value is implemented entirely differently and has no relation to the number 0.

Wedge
+5  A: 

In C, NULL was ((void*)0), so it was a type with a value(?). But that didn't work with C++ templates, so C++ made NULL 0: it dropped the type and became a pure value.

However, it was found that having a specific NULL type would be better, so they (the C++ committee) decided that NULL will once again become a type in C++0x (the nullptr keyword, whose type is std::nullptr_t).

Also, almost every language besides C++ has NULL as a type, or an equivalent unique value not the same as 0 (it might be equal to it or not, but it's not the same value).

So now even C++ will use NULL as a type, basically closing the discussions on the matter, since now everyone (almost) will have a NULL type

Edit: Thinking about it, Haskell's Maybe is another solution to NULL types, but it's not as easy to grasp or implement.

Robert Gould
A: 

Null provides an easy way out for programmers who haven't completely thought through the logic and domains needed by their program, or the future maintenance implications of using a value with essentially no clear and agreed upon definition.

It may seem obvious at first that it must mean "no value", but what that ACTUALLY means depends on context. If, for instance, lastName === null, does that mean that the person doesn't have a last name, or that we don't know what their last name is, or that it hasn't been entered into the system yet? Does null equal itself, or doesn't it? In SQL it does not. In many languages it does. But if we don't know the value of personA.lastName, or personB.lastName, how can we know that personA.lastName === personB.lastName, eh? Should the result be false, or... null?

It depends on what you're doing, which is why it's dangerous and silly to have some kind of system-wide value that can be used for any kind of situation that kind of looks like "nothing", since other parts of your program and external libraries or modules can't really be depended upon to correctly interpret what you meant by "null".

You're much better off clearly defining the DOMAIN of possible values of lastName, and exactly what every possible value actually means, rather than depending on some vague system-wide notion of null, which may or may not have any relevance to what you're doing, depending on which language you're using and what you're trying to do: a value which may, in fact, behave in exactly the wrong way when you begin to operate on your data.
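
The "does null equal itself?" difference is easy to check for yourself. The sketch below uses Python's standard sqlite3 module with an in-memory database; SQLite is picked here only because it ships with Python, and other SQL engines may word things differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL, comparing NULL with '=' yields NULL (unknown), not true or false.
equals_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The dedicated IS NULL test is what yields a definite answer.
is_null_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(equals_result)   # None
print(is_null_result)  # 1

# Python's own None, by contrast, compares equal to itself.
print(None == None)    # True
conn.close()
```

So the same two "unknown last names" compare as unknown in the database and as equal in the host language, which is exactly the kind of mismatch described above.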

Breton
I think you are totally correct but totally didn't answer the question. He is asking about writing a compiler not an application.
Craig
Including a null in the language widens the domain of possible programs that can be written in a language to include a certain class of program that is not particularly well thought out. I think this is relevant information for someone who is designing a new language.
Breton
No certainly not. At least when you get a null pointer exception or a segfault it's a painful reminder you're doing something wrong. Doing a backflip to silence the exception doesn't mean you're not still doing something wrong.
Breton
A: 

Null is to objects what 0 is to numbers.

TJB
That's not correct. Null represents 'Nothing', while 0 is a valid 'Not Nothing' value.
Craig
I always understood it as 'the absence of value' a la Stand and Deliver. I am in no way saying that null == 0, I'm saying that in the numbers domain, when you have no value it is represented by null. In the objects domain, the absence of value is null. I guess I was trying to be too bold ;)
TJB
@ Craig - except when you're dividing by it.
Daniel Earwicker
I wouldn't quite say null is the object reference's equivalent of 0 but I can kind of see the connection... e.g. when making measurements, 0 represents the absence of any measurable effect.
David Zaslavsky
+2  A: 

That decision depends on the objective of the programming language.

Who are you designing the programming language for? Are you designing it for people who are familiar with C-derived languages? If so, then you should probably add support for null.

In general, I would say that you should avoid violating people's expectations unless it serves a particular purpose.

Take switch blocks in C# as an example. Every non-empty case section in C# must end with an explicit control-flow statement. That is, each must end with either a "break" statement or an explicit goto. That means that while this code is legal:

switch(x)
{
    case 1:
    case 2:
        foo();
        break;
}

this code would not be legal:

switch (x)
{
    case 1:
        foo();
    case 2:
        bar();
        break;
}

In order to create a "fall through" from case 1 to case 2, it's necessary to insert a goto, like this:

switch (x)
{
    case 1:
        foo();
        goto case 2;
    case 2:
        bar();
        break;
}

This is arguably something that would violate the expectations of C++ programmers who are learning C#. However, adding that restriction serves a purpose. It eliminates the possibility of an entire class of common C++ bugs. It adds to the learning curve of the language slightly, but the result is a net benefit to the programmer.

If your goal is to design a language targeted at C++ programmers, then removing null would probably violate their expectations. That will cause confusion, and make your language more difficult to learn. The key question is then, "what benefit do they get?" Or, alternatively, "what detriment does this cause?"

If you are simply trying to design a "super small language" that can be implemented in the course of a single semester, then the story is different. In that case your objective isn't to build a useful language targeted at a particular segment of the population. Instead, it's just to learn how to create a compiler. In that scenario, having a smaller language is a big benefit, and so it's worth eliminating null.

So, to recap, I would say that you should:

  1. Identify your goals in creating the language. Who is the language designed for, and what are their needs?
  2. Make the decision based on what helps the target users meet their goals in the best way.

Usually this will make the desired result pretty clear.

Of course, if you don't explicitly articulate your design goals, or you can't agree on what they are, then you are still going to argue. In that case, however, you are pretty much doomed anyways.

Scott Wisniewski
+5  A: 

I don't think it's helpful to talk about null outside the context of the whole language design. First point of confusion: is the null type empty, or does it include a single, distinguished value (often called "nil")? A completely empty type is not very useful---although C uses the empty return type void to mark a procedure that is executed only for side effect, many other languages use a singleton type (usually the empty tuple) for this purpose.

I find that a nil value is used most effectively in dynamically typed languages. In Smalltalk it is the value used when you need a value but you don't have any information. In Lua it is used even more effectively: the nil value is the only value that cannot be a key or a value in a Lua table. In Lua, nil is also used as the value of missing parameters or results.

Overall I would say that a nil value can be useful in a dynamically typed setting, but in a statically typed setting, a null type is useful only for talking about functions (or procedures or methods) that are executed for side effect.

At all costs, avoid the NULL pointer used in C and Java. These are artifacts inherent in the implementations of pointers and objects, and in a well-designed language they should not be allowed. By all means give your users a way to extend an existing type with a null value, but make them do it explicitly, on purpose---don't force every type to have one by accident. (As an example of explicit use, I recently implemented Bentley and Sedgewick's ternary search trees in Haskell, and I needed to extend the character type with one additional value meaning 'not a character'. For this purpose Haskell provides the Maybe type.)
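
For readers who haven't met Maybe, here is a loose Python approximation of "extend a type with one extra value, explicitly". This is not how Haskell implements it; `Just`, `Nothing`, `Maybe`, `first_char` and `describe` are illustrative names defined only in this sketch.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Just(Generic[T]):
    value: T


@dataclass(frozen=True)
class Nothing:
    pass


Maybe = Union[Just[T], Nothing]


def first_char(text: str) -> Maybe[str]:
    # The return type states openly that there may be no character to return.
    return Just(text[0]) if text else Nothing()


def describe(result: Maybe[str]) -> str:
    # Callers must unwrap the value explicitly before using it.
    if isinstance(result, Just):
        return f"got {result.value!r}"
    return "no character"


print(describe(first_char("abc")))  # got 'a'
print(describe(first_char("")))     # no character
```

Only functions whose signature mentions `Maybe` can produce the extra value; a plain `str` parameter never carries it, which is the "on purpose, not by accident" part.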

Finally, if you are writing a compiler, it is good to remember that the easiest parts of the language to compile, and the parts that cause the fewest bugs, are the parts that aren't there :-)

Norman Ramsey
+1  A: 

Null is a placeholder that means that no value (append "of the correct type" for a static-typed language) can be assigned to that variable.

There is cognitive dissonance here. I heard somewhere else that humans cannot comprehend negation, because they must suppose a value and then imagine its unfitness.

Overflown
+2  A: 

One other way to look at null is that it's a performance issue. If you have a complex object containing other complex objects and so on, then it is more efficient to allow all properties to start out as null instead of creating some kind of empty objects that won't be good for anything and will soon be replaced.

That's just one perspective that I can't see mentioned before.

danbystrom
+3  A: 

Null is not the problem - everyone treating and interpreting null differently is the problem.

I like null. If there was no null, null would only be replaced with some other way for the code to say "I have no clue, dude!" (which some would write "I have no clue, man!", or "I have not a clue, old bean!" etc. and so, we'd have the exact same problems again).

I generalize, I know.

Marcus L
+1  A: 

What purpose does null provide?

I believe there are two concepts of null at work here.

The first (null the logical indicator) is a conventional program language mechanism that provides runtime indication of a non-initialized memory reference in program logic.

The second (null the value) is a base data value that can be used in logical expressions to detect the logical null indicator (the previous definition) and make logical decisions in program code.

Do you have any thoughts, especially for or against null?

While null has been the bane of many programmers and the source of many application faults over the years, the null concept has validity. If you and your team create a language that uses memory references that can be potentially misused because the reference was not initialized, you will likely need a mechanism to detect that eventuality. It is always an option to create an alternative, but null is a widely known alternative.

Bottom line, it all depends upon the goals of your language:

  1. target programming audience
  2. robustness
  3. performance
  4. etc...

If robustness and program correctness are high on your priority list AND you allow programmatic memory references, you will want to consider null.

BB

Bill
+2  A: 

A practical example of null is when you ask a yes/no question and don't get a response. You don't want to default to no because it might be important to know that the question wasn't answered in situations where the answer is very important.
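
A small sketch of the yes/no/no-response idea, assuming answers are stored as Python `Optional[bool]` values where `None` means the question was not answered. The function name `summarize` and the sample data are invented for the example.

```python
from typing import Dict, List, Optional


def summarize(answers: List[Optional[bool]]) -> Dict[str, int]:
    # None records "no response"; it is counted separately from an explicit "no".
    summary = {"yes": 0, "no": 0, "unanswered": 0}
    for answer in answers:
        if answer is None:
            summary["unanswered"] += 1
        elif answer:
            summary["yes"] += 1
        else:
            summary["no"] += 1
    return summary


responses = [True, False, None, True, None]
print(summarize(responses))  # {'yes': 2, 'no': 1, 'unanswered': 2}
```

Had the missing responses been defaulted to False, the tally would have reported three "no" answers and hidden the fact that two people never replied.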

Scott
very good way of expressing ternary logic - I like it!
Preet Sangha
The argument against this though, is you return 3 possible values, One for yes, one for no, and one for don't know. But in reality, when coding with it, if (a == dontKnow) makes no difference to if (a == null) right after obtaining the value. Except maybe a different thought process when directly reading the code.
Sekhat
+1  A: 

My suggestion to your team is: come up with some example programs that need to be written in your language, and see how they would look if you left out null, versus if you included it.

Daniel Earwicker
+1  A: 

Use a null object pattern!

If your language is object oriented, let it have an UndefinedValue class of which only one singleton instance exists. Then use this instance wherever null is used. This has the advantage that your null will respond to messages such as #toString and #equals. You will never run into a null pointer exception as in Java. (Of course, this requires that your language is dynamically typed).
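
Roughly what that could look like, sketched in Python rather than in a new language. The class name `UndefinedValue` comes from the paragraph above; `UNDEFINED`, `lookup` and the sample table are assumptions made up for the example, with `__str__` and `__eq__` standing in for #toString and #equals.

```python
class UndefinedValue:
    """Hypothetical singleton standing in for null; it still answers messages."""

    _instance = None

    def __new__(cls):
        # Always hand back the same instance, so there is exactly one "undefined".
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __str__(self) -> str:
        return "undefined"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, UndefinedValue)

    def __hash__(self) -> int:
        return hash(UndefinedValue)


UNDEFINED = UndefinedValue()


def lookup(table: dict, key: str):
    # Return the singleton instead of a bare null when the key is missing.
    return table.get(key, UNDEFINED)


ages = {"alice": 30}
print(str(lookup(ages, "bob")))          # undefined
print(lookup(ages, "bob") == UNDEFINED)  # True
print(UndefinedValue() is UNDEFINED)     # True
```

Whether silently answering "undefined" is better than crashing is the same argument made elsewhere in this thread; the sketch only shows the mechanics.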

Adrian
A: 

If I get a variable from a function I want to quickly be able to see whether it is null or not.

In C/C++ you should always test pointers for null, all the time. The only exception is if you have actually read the code of all the libraries you use, and the libraries they depend on, and so forth, and none of them can return null.

In Haskell you check whether the value the function returns is of type Maybe; if it is, you check for the value Nothing.

In C# you check for types marked with a question mark, the so-called Nullable types.

I would think twice about using C's definition of null. It does not add flexibility but rather makes it harder to create robust applications.

Daniel W