If you are someone who programs in C or C++, without the managed-language benefits of memory management, type checking or buffer overrun protection, using pointer arithmetic, how do you make sure that your programs are safe? Do you use a lot of unit tests, or are you just a cautious coder? Do you have other methods?
All of the above. I use:
- A lot of caution
- Smart Pointers as much as possible
- Data structures which have been tested, a lot of STL
- Unit tests all the time
- Memory validation tools like MemValidator and AppVerifier
- Pray every night it doesn't crash on customer site.
Actually, I am just exaggerating. It's not too bad, and it's actually not that hard to keep control of resources if you structure your code properly.
Interesting note: I have a large application which uses DCOM and has both managed and unmanaged modules. The unmanaged modules are generally harder to debug during development, but perform very well at the customer site because of the many tests run on them. The managed modules sometimes suffer from bad code: because the garbage collector is so forgiving, programmers get lazy about checking resource usage.
Just as relevant - how do you ensure your files and sockets are closed, your locks released, yada yada. Memory is not the only resource, and with GC, you inherently lose reliable/timely destruction.
Neither GC nor non-GC is automatically superior. Each has benefits, each has its price, and a good programmer should be able to cope with both.
I said as much in an answer to this question.
C++ has all the features you mention.
There is memory management. You can use smart pointers for very precise control. Or there are a couple of garbage collectors available, though they are not part of the standard (but in most situations smart pointers are more than adequate).
C++ is a strongly typed language. Just like C#.
We do use buffers. You can opt to use the bounds-checked version of an interface. But if you know that there is not a problem, then you are free to use the unchecked version.
Compare the method at() (checked) to operator[] (unchecked).
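A minimal sketch of the difference (the helper function is mine, not part of the standard library): at() reports a bad index by throwing, while operator[] leaves bounds checking entirely to the caller.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Returns true if v.at(i) throws std::out_of_range for this index.
bool at_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);   // checked access: throws on a bad index
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With `std::vector<int> v{1, 2, 3}`, `v[1]` is the unchecked form: reading `v[10]` would be undefined behavior, while `v.at(10)` throws a catchable exception.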
Yes, we use unit testing. Just as you should in C#.
Yes, we are cautious coders. Just as you should be in C#. The only difference is that the pitfalls are different in the two languages.
I use lots and lots of asserts, and build both a "debug" version and a "release" version. My debug version runs much much slower than my release version, with all the checks it does.
I run frequently under Valgrind, and my code has zero memory leaks. Zero. It is a lot easier to keep a program leak-free than it is to take a buggy program and fix all the leaks.
Also, my code compiles with no warnings, despite the fact that I have the compiler set for extra warnings. Sometimes the warnings are silly, but sometimes they point right at a bug, and I fix it without any need to find it in the debugger.
I'm writing pure C (I can't use C++ on this project), but I'm doing C in a very consistent way. I have object-oriented classes, with constructors and destructors; I have to call them by hand, but the consistency helps. And if I forget to call a destructor, Valgrind hits me over the head until I fix it.
In addition to the constructor and destructor, I write a self-check function that looks over the object and decides whether it is sane or not; for example, if a file handle is null but associated file data is not zeroed out, that indicates some kind of error (either the handle got clobbered, or the file wasn't opened but those fields in the object have trash in them). Also, most of my objects have a "signature" field that must be set to a specific value (specific to each different object). Functions that use objects typically assert that the objects are sane.
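A sketch of the signature-plus-sanity-check idea. The post's code is pure C; this fragment keeps the same C-style shape but is written to compile as C++, and the field names and signature value are illustrative, not taken from the post.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical object layout: each object type gets its own signature value.
constexpr std::uint32_t FILE_OBJ_SIG = 0xF11E0B7Au;

struct FileObj {
    std::uint32_t signature;   // must always hold FILE_OBJ_SIG
    std::FILE*    handle;      // null when no file is open
    long          size;        // meaningful only while a file is open
};

// Sanity check: a null handle with leftover file data indicates either a
// clobbered handle or an object whose fields still contain trash.
bool file_obj_is_sane(const FileObj* f) {
    if (f == nullptr) return false;
    if (f->signature != FILE_OBJ_SIG) return false;
    if (f->handle == nullptr && f->size != 0) return false;
    return true;
}
```

Functions that operate on a `FileObj` would then begin with `assert(file_obj_is_sane(obj));`, so a corrupted or uninitialized object trips an assert at first use.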
Any time I malloc() some memory, my function fills the memory with 0xDC values. A structure that isn't fully initialized becomes obvious: counts are way too big, pointers are invalid (0xDCDCDCDC), and when I look at the structure in the debugger it's obvious that it's uninitialized. This is much better than zero-filling memory when calling malloc().
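A minimal sketch of such a poisoning allocator (the wrapper name is hypothetical; only the 0xDC fill value comes from the text above):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Allocation wrapper that poisons fresh memory with 0xDC, so an
// uninitialized structure is obvious: counts are huge and pointers
// read as 0xDCDCDCDC in the debugger.
void* debug_malloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr)
        std::memset(p, 0xDC, size);
    return p;
}
```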
Any time I free memory, I erase the pointer. That way, if I have a stupid bug where the code tries to use a pointer after its memory has been freed, I instantly get a null-pointer exception, which points me right at the bug. My destructor functions don't take a pointer to an object, they take a pointer to a pointer, and clobber the pointer after destructing the object. Also, destructors wipe their objects before freeing them, so if some chunk of code has a copy of a pointer and tries to use an object, the sanity check assert fires instantly.
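A sketch of a destructor in that style: it takes a pointer to the caller's pointer, wipes the object so stale copies fail sanity checks, frees it, and clobbers the caller's pointer. The `Widget` type, function name, and 0xDD wipe value are all illustrative.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

struct Widget { int refcount; char name[32]; };  // illustrative object

// Takes Widget** rather than Widget*: after destruction the caller's
// pointer is null, so any later use faults immediately at the bug.
void widget_destroy(Widget** pw) {
    if (pw == nullptr || *pw == nullptr) return;
    std::memset(*pw, 0xDD, sizeof(Widget));  // wipe before freeing
    std::free(*pw);
    *pw = nullptr;                           // erase the caller's pointer
}
```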
Valgrind will tell me if any code writes off the end of a buffer. If I didn't have that, I would have put "canary" values after the ends of the buffers, and had the sanity check test them. These canary values, like the signature values, would be debug-build-only, so the release version would not have memory bloat.
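A sketch of what such a canary could look like: a known value stored after the payload and verified by the sanity check. The value and layout are illustrative; as the post says, in a real build this field would exist only in the debug configuration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t CANARY = 0xCAFEBABEu;  // arbitrary known pattern

struct GuardedBuf {
    char          data[64];
    std::uint32_t canary;    // must still equal CANARY at check time
};

void guarded_init(GuardedBuf* b) {
    std::memset(b->data, 0, sizeof(b->data));
    b->canary = CANARY;
}

// An overrun that scribbles past data[] lands on the canary, so this
// check catches it even without Valgrind.
bool guarded_is_sane(const GuardedBuf* b) {
    return b != nullptr && b->canary == CANARY;
}
```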
I have a collection of unit tests, and when I make any major changes to the code, it is very comforting to run the unit tests and have some confidence I didn't horribly break things. Of course I run the unit tests on the debug version as well as the release version, so all my asserts have their chance to find problems.
Putting all this structure into place was a bit of extra effort, but it pays off every day. And I feel quite happy when an assert fires and points me right at a bug, instead of having to run the bug down in the debugger. In the long run, it's just less work to keep things clean all the time.
Finally, I have to say that I actually like Hungarian notation. I worked at Microsoft a few years back, and like Joel I learned Apps Hungarian and not the broken variant. It really does make wrong code look wrong.
Andrew's answer is a good one, but I'd also add discipline to the list. I find that after enough practice with C++, you get a pretty good feel for what's safe and what's begging for the velociraptors to come eat you. You tend to develop a coding style that feels comfortable when following the safe practices and leaves you with the heebie-jeebies should you try to, say, cast a smart pointer back to a raw pointer and pass it to something else.
I like to think of it like a power tool in a shop. It's safe enough once you've learned to use it correctly and as long as you make sure to always follow all the safety rules. It's when you think you can forgo the safety goggles that you get hurt.
I have done both C++ and C# and I don't see all the hype about managed code.
Oh right, there is a garbage collector for memory; that's helpful... unless you refrain from using plain old pointers in C++, of course. If you only use smart pointers, then you don't have that many problems.
But then I would like to know... does your garbage collector protect you from:
- keeping database connections open?
- keeping locks on files?
- ...
There is much more to resource management than memory management. The good thing in C++ is that you rapidly learn what resource management and RAII mean, so that it becomes a reflex:
- if I want a pointer, I want an auto_ptr, a shared_ptr or a weak_ptr
- if I want a DB connection, I want an object 'Connection'
- if I open a file, I want an object 'File'
- ...
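The file case above can be sketched as a small RAII wrapper. The class shown here is hypothetical, not the post's actual 'File' object, but it illustrates the reflex: the resource is acquired in the constructor and released in the destructor on every exit path, including exceptions.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a C FILE* handle.
class File {
public:
    explicit File(const std::string& path, const char* mode = "r")
        : f_(std::fopen(path.c_str(), mode)) {
        if (f_ == nullptr)
            throw std::runtime_error("cannot open " + path);
    }
    ~File() { if (f_) std::fclose(f_); }  // closed on every exit path
    File(const File&) = delete;           // no accidental double-close
    File& operator=(const File&) = delete;
    std::FILE* get() const { return f_; }
private:
    std::FILE* f_;
};
```

A 'Connection' object would follow the same shape: open in the constructor, close in the destructor, and forbid copying (or define it carefully).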
As for buffer overruns, well, it's not like we are using char* and size_t everywhere. We do have some things called 'string' and 'iostream', and of course the already mentioned vector::at method, which free us from those constraints.
Tested libraries (STL, Boost) are good; use them and get on to more functional problems.
Besides a lot of the good tips given here, my most important tool is DRY -- Don't Repeat Yourself. I don't spread error-prone code (e.g. for handling memory allocations with malloc() and free()) all over my codebase. I have exactly one single location in my code where malloc and free are called. It is in the wrapper functions MemoryAlloc and MemoryFree.
That is where all the argument checking and initial error handling lives that would otherwise be repeated as boilerplate around every call to malloc. Additionally, it means any change needs to touch only one location, from simple debugging checks, like counting the successful calls to malloc and free and verifying at program termination that both numbers are equal, up to all kinds of extended security checks.
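A sketch of what that single choke point might look like, reusing the MemoryAlloc/MemoryFree names from above; the balance counter and its details are my assumption, not the author's actual implementation.

```cpp
#include <cassert>
#include <cstdlib>

// Every allocation in the codebase goes through this one pair, so
// checks and instrumentation need to be written exactly once.
static long g_alloc_count = 0;   // successful allocs minus frees

void* MemoryAlloc(std::size_t size) {
    assert(size > 0);            // argument checking lives here, once
    void* p = std::malloc(size);
    if (p != nullptr)
        ++g_alloc_count;
    return p;
}

void MemoryFree(void* p) {
    if (p == nullptr) return;    // freeing null is a harmless no-op
    std::free(p);
    --g_alloc_count;
}

// At program termination this must be zero, or something leaked.
long MemoryBalance() { return g_alloc_count; }
```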
Sometimes, when I read a question here like "I always have to ensure that strncpy terminates the string, is there an alternative?"
strncpy(dst, src, n);
dst[n-1] = '\0';
followed by days of discussion, I always wonder if the art of extracting repeated functionality into functions is a lost art of higher programming that is no longer taught in programming lectures.
char *my_strncpy(char *dst, const char *src, size_t n)
{
assert((dst != NULL) && (src != NULL) && (n > 0));
strncpy(dst, src, n);
dst[n-1] = '\0';
return dst;
}
The primary problem, code duplication, is solved -- now let's think about whether strncpy really is the right tool for the job. Performance? Premature optimization! And there is one single location to start from once it proves to be the bottleneck.
I have been using C++ for 10 years. I have used C, Perl, Lisp, Delphi, Visual Basic 6, C#, Java and various other languages which I can't remember off the top of my head.
The answer to your question is simple: you have to know what you're doing, more than in C#/Java. That "more than" is what spawns such rants as Jeff Atwood's regarding "Java Schools".
Most of your questions, in a sense, are nonsensical. The 'problems' you bring up are simply facts of how hardware really works. I'd like to challenge you to write a CPU & RAM in VHDL/Verilog and see how things really work, even greatly simplified. You'll start to appreciate that the C#/Java way is an abstraction papering over the hardware.
An easier challenge would be to program an elementary operating system for an embedded system from initial power-on; it'll show you what you need to know as well.
(I've also written C# and Java)
We write in C for embedded systems. Besides using some of the techniques common to any programming language or environment, we also employ: