I'm making this a community wiki in order to better understand the semantic differences between these errors and their runtime or compiled consequences. Also, I've coded in Java far too long and I want to learn pointers better in C++ -- so I need other people to do it.

Edit2: I am refactoring this question. The distinction I am trying to draw is that in managed code, these errors are all uniformly dealt with via an Exception. However, C++ is not so simple -- and I want to understand whether in each case you are likely to get an error, a segfault, recoverable behavior, or worse, a silent error that propagates. Please see my new concrete examples (and yes, I know that the answer is always "exactly as it is coded"; I'm a programmer after all. I want to know the interesting details of what you often run into.)

Edit3: In the following, by "class" I instead mean an instance of a class. Thanks

Error 1: Pointer value is NULL, aka pointer == 0

  • Managed code: Throws NullPointerException at runtime
  • C++: ?
  • Example: Well, duh, you have a pointer to a class but it is initialized to 0. What happens when you send it off to a function? I.e., C++ does not leave any indication of class; it is just a concatenation of public "placeholders".

Error 2: Pointer points to a former class in memory whose value is NULL or == 0

  • Managed code: Not allowed through memory model. All referenced objects remain in memory. No exceptional cases?
  • C++: ?
  • Example: You had a pointer to a class and the class got deleted. You then pass the pointer as an argument to a function. Obviously, the problem that occurs depends on how the function deals with the pointed-to class. My question is: Is there failsafe handling for this in the STL? A good proprietary library? The average open source code?

Error 3: Pointer points to a class that is not of the correct class or subclass

  • Managed code: Throws ClassCastException.
  • C++: [correct if wrong] Compiler tries to fight this by not allowing bad casts. However, if this is to occur at runtime, I presume undefined behavior. Are there cases of similar class objects where this would not always blow up?
  • Example: Your pointer is reassigned incorrectly to have its value equal to another class entirely. I assume that the function you pass this referenced class to will just blindly grab for the offset of any instance variables it references. Thus, it interprets the raw binary wrongly. Is there no way to prevent this in C++? And/or... is there any case where this ability is exploited for good?

Error 4: Pointer points into middle of class (misaligned) or uninitialized garbage

  • Managed code: Not allowed by memory model.
  • C++: Equivalent to case 3?
  • Example: Frequently you actually do use this legally. For example you can access the array of an STL vector directly - this is pointing into the middle of a class. However, it seems just as easy to "miss"? Is there a common pitfall where you might have this happen against your will, like if a library is loaded that is different than the one you linked to (and is there a mechanism to prevent that?)

Thanks in advance to all contributors.

+1  A: 

Most of these cause unpredictable behavior. To quote Steve McConnell's Code Complete 2nd Edition "Using pointers is inherently complicated, and using them correctly requires that you have an excellent understanding of your compiler's memory-management scheme".

Lucas McCoy
That statement is inaccurate. It would be better phrased as "...using them incorrectly without blowing up requires you to have an excellent understanding of your compiler's memory-management scheme". Using pointers correctly means following the standard and not guessing about compiler implementation.
Tom
A: 

#1 should throw a segfault.

#2, #3, and #4 may work, depending on what the method tries to do. Remember, in C++, class code is stored only once (and separately from instance data, which is what object pointers reference) so it can be possible to call class methods on random chunks of memory. For example, the following prints "-1" (tested with g++ 4):

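#include <cstdlib>   // malloc, free
#include <cstring>   // memset
#include <iostream>

class Foo
{
public:
    int x;
    void foo()
    {
        std::cout << x << std::endl;
    }
};

int main(void)
{
    void* mem = malloc(1024);
    if (mem == NULL)
        return 1;
    memset(mem, 0xff, 1024);  // fill the raw block with 0xff bytes
    Foo* myFoo = (Foo*)mem;   // treat the raw bytes as a Foo; no constructor has run
    myFoo->foo();             // reads x straight out of those bytes
    free(mem);
    return 0;
}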
Andrew Medico
This isn't valid code. #2 doesn't make sense, but #3 and #4 are undefined behavior, so may do anything, including printing -1 most of the time.
KeithB
Oops, formatting fixed.
Andrew Medico
The fixed code exhibits undefined behavior. If Foo has virtual functions, for example, this would probably crash.
KeithB
A: 

Under Windows, Error 1 will cause a structured (win32) exception to be raised for an access violation, as you are trying to access a virtual memory page you don't have read access to. Unix-derived OSes have a similar mechanism, albeit with different terminology.

This is well defined (if usually undesirable!) behaviour, and can be trapped by a structured exception handler. Typically a managed runtime will rely on the underlying OS raising this exception, and then handling it and converting it to a managed exception. This is much more efficient than checking every pointer access for null before following it.

Rob Walker
+1  A: 

Well. In C++, dereferencing the pointer in any but case 2 will yield undefined behavior, so you don't know what happens. For most operating systems, however, dereferencing the null pointer will cause a segmentation fault.

Just using the pointer in comparisons is fine for a null pointer, but is not precisely defined (unspecified) for any case other than that and case 2.

Case 2 is perfectly defined. You can have your pointer point to an int that has a value of 0. I don't understand why such a thing would be illegal even in C#. Probably I have misunderstood your case 2.

For case 3, you have to differentiate whether the pointer already points to that wrong object, or whether you are still trying to make it point to that. The C++ dynamic_cast will check the type of the object you point to, and if it's neither the type you are casting to nor derived from it, it will give you a null pointer. But there are other casts that do not do that check, and will leave you with an invalid pointer.

Johannes Schaub - litb
The questioner is asking about "when you dereference the pointer and call a function/method on it" - not creating the pointers.
Andrew Medico
Andrew, indeed :) but he wondered about creating the pointer to: "Compiler tries to fight this by not allowing bad casts."
Johannes Schaub - litb
+1  A: 
  1. Pointer value is NULL, aka pointer == 0: Undefined behavior. The compiler is allowed to do anything it wants, including something different each time. Under most Unix-based systems, this will cause a segmentation fault.
  2. Accessing a pointer that has been deleted: This is undefined behavior. In some cases, depending on the exact patterns of memory allocation and use, you may be able to use the deleted pointer as if it wasn't deleted, if the memory hasn't been reused for something else. This can lead to very hard to track down bugs. If you delete the pointer a second time, you will probably corrupt the memory allocation system, causing completely unrelated news/deletes to crash.
  3. Pointer points to a class that is not of the correct class or subclass: C++ doesn't do runtime type checking. It will try to interpret the memory location as the type of the pointer. If an object of the correct type has not been created at that location, it is undefined behavior and anything can happen (including appearing to work correctly).
  4. Pointer points into middle of class (misaligned) or uninitialized garbage: Same as above, undefined behavior.

In summary, you cannot depend on any of these doing anything worthwhile. It is important to design your code so that they do not happen. The compiler helps where it can, so be very careful about trying to trick it (e.g., casts). The compiler will get its revenge, eventually.

KeithB
Point 2, undefined pointer?
WolfmanDragon
#2 makes plenty of sense. Imagine a pointer to an array of pointers, all of which are NULL.
Andrew Medico
A: 

This would be my revised version of your errors:

Error 1: null pointer/reference

  • Managed code: Throws NullReferenceException if it is a reference, or AccessViolationException if it is a pointer (yes! pointers exist in managed code!)
  • Native code: On Windows, this causes an "Access Violation" (often called an AV). On Unix this would be called a "seg fault". On Windows, this can in theory be caught using exception handling.

Error 2: Pointer to an object that has been freed

  • Managed code: Generally undefined, but very likely an AccessViolationException. (Note that this refers to actual pointer usage, not managed references, which will always be valid)
  • Native code: Generally undefined, but likely an Access Violation.

Error 3:

  • Managed code: Throws exception
  • Native code: Depending on the type of cast, will either be a compiler error if a static cast, or an undefined result if a reinterpret cast.

Error 4:

  • Managed code: Undefined
  • Native code: Undefined
A: 

I will just add this tidbit of information. Pointers will do anything you tell them to do, including overwriting the kernel if the program has access to said kernel.

Take point 3 for example, this is the technique used in many attacks on the kernel. Find out where the kernel resides and use pointers to change the information. By no means am I suggesting that someone try this, I do not condone the use of Rootkits or any other malware.

WolfmanDragon
A: 

If you really want to learn about pointers because you want to understand your computer better, concentrate on C or Assembly. In fact, there are some awesome tiny C compilers written in C; pull them apart and put them back together.

C++ degrades to C (I mean it can compile C files), but there is a LOT more to deal with in C++, whereas with C you can just consider the basics of pointers.

I also highly recommend you compile a C program and trace (single-step debug) through it in assembly language. If you really want to understand the underlying system, understanding stack frames and what happens during a call is pretty critical.

Other ways to learn this stuff:

  • Go audit a class in compiler construction.
  • Build something interesting with a PIC controller--a robot or a calculator.
Bill K