I am a big fan of GCC, but recently I noticed an odd anomaly. Using __gnu_cxx::__normal_iterator (i.e., the most common iterator type used in libstdc++, GCC's implementation of the C++ standard library), it is possible to refer to an arbitrary memory location and even change its value without causing an exception! Is this expected behavior? If so, isn't it a security loophole?

Here's an example:

#include <iostream>
#include <string>
using namespace std;

int main() {
        basic_string<char> str("Hello world!");
        basic_string<char>::iterator iter = str.end();

        // Undefined behavior: moves the iterator far outside the string's storage
        iter += str.capacity() + 99999;
        *iter = 'x';

        cout << "Value: " << *iter << endl;
}
+1  A: 

C++ generally has a philosophy of not making you pay for what you don't use. It is up to you to validate that you're using iterators properly. For a random-access iterator, you can always test it:

if (iter < str.begin() || iter >= str.end())
    throw std::out_of_range("iterator out of range");  // needs <stdexcept>
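
One caveat: merely forming an iterator more than one past the end is already undefined behavior, so the test is most reliable when it runs before the iterator is advanced. A minimal sketch of that idea follows; `checked_advance` and `checked_deref` are hypothetical helper names, not part of the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: advance `it` by `n` only if the result stays inside
// [s.begin(), s.end()]; otherwise throw instead of forming a bad iterator.
// Assumes `it` is already a valid iterator into `s`.
std::string::iterator checked_advance(std::string& s,
                                      std::string::iterator it,
                                      std::size_t n) {
    // Number of steps that can be taken before reaching end().
    const std::size_t room = static_cast<std::size_t>(s.end() - it);
    if (n > room)
        throw std::out_of_range("checked_advance: step leaves the string");
    return it + static_cast<std::string::difference_type>(n);
}

// Hypothetical helper: dereference only if `it` refers to a real element.
char& checked_deref(std::string& s, std::string::iterator it) {
    if (it < s.begin() || it >= s.end())
        throw std::out_of_range("checked_deref: iterator not dereferenceable");
    return *it;
}
```

With these, the step from the question, `checked_advance(str, str.end(), str.capacity() + 99999)`, throws instead of silently producing a wild iterator.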
Mark Ransom
+5  A: 

Dereferencing an iterator beyond the end of the container from which it was obtained is undefined behavior, and appearing to work without any error is just one of the possible outcomes.

Note that this is a question of compromise. It is nice to have iterators checked for validity during development, but that adds extra operations to the code. In MSVS, iterators are checked by default (they verify that they are valid and fail hard when they are used in a wrong way), but that also has an impact on runtime performance.

The solution that Dinkumware (the STL inside VS) provides (checked by default, can be unchecked through compiler options) is in fact a good choice: the user chooses between slow, safe iterators and fast, unsafe ones. But from the point of view of the language, both are valid.
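
For comparison with the GCC setup in the question: libstdc++ has an opt-in debug mode, enabled by defining `_GLIBCXX_DEBUG` when compiling, which adds checked iterators for containers such as `std::vector`; how far those checks extend to `std::string` depends on the library version. Independently of any debug mode, the standard `at()` member is always bounds-checked. A small sketch, where `char_at_or` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: return s[pos], or `fallback` when pos is out of range.
// std::string::at() throws std::out_of_range for pos >= size().
char char_at_or(const std::string& s, std::size_t pos, char fallback) {
    try {
        return s.at(pos);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

This pays for the check only at the call sites that ask for it, which is the same trade-off as above made per access rather than per build.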

David Rodríguez - dribeas
@David: Thanks for the explanation; it was precise and exactly what I was looking for. I'm currently in the process of improving my codebase. I know for sure that any string allocated by `g++` is always terminated with `EOF`, `WEOF` or at the very least `CharType()` (i.e., the last unit of allocated memory is assigned that value). So do you think it is a good idea to use debug assertions on that condition? Do you have any other suggestions?
themoondothshine
Are you referring to string literals or `std::string` instances? String literals such as `"asdf"` always end in `(char)0`. With `std::string` there is no guarantee whatsoever about the contents of its internal memory at any given point. Note, as an example, that in g++ 4.4 an empty string has no allocated memory, so it surely is not EOF/WEOF/0 terminated.
David Rodríguez - dribeas
I was referring to `std::string`. Yes, you're right about g++ 4.4 empty strings: they hold a pointer to a (static) shared empty rep. But suppose I use an iterator on a non-empty std::string: can I assume that it would always be terminated with EOF/WEOF/0? g++ 4.4 follows up string allocations with `traits_type::assign(this->_M_refdata()[__n], _S_terminal);` doesn't it?
themoondothshine
@themoondothshine, you may be guaranteed that condition for today's version of gcc, but what about tomorrow's? It's *never* a good idea to rely on implementation details like that.
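
The documented interface does offer one thing to lean on instead of the internals: `c_str()` returns a null-terminated array, and since C++11 `s[s.size()]` is specified to refer to a `CharT()` value. Nothing is promised about memory beyond that position. A sketch, with `terminator_of` as a hypothetical helper name:

```cpp
#include <string>

// Hypothetical helper: read the character one past the last element through
// the documented interface rather than through libstdc++ internals.
char terminator_of(const std::string& s) {
    return s.c_str()[s.size()];
}
```

A debug assertion on `terminator_of(s) == '\0'` would therefore check a documented property, whereas one on `_S_terminal` or the rep layout would check an implementation detail.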
Mark Ransom
A: 

You got lucky. Or unlucky. Using your exact example, I segfaulted.

$ ./a.exe
  11754 [main] a 4992 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
Segmentation fault (core dumped)

Undefined behavior can mean different things on different compilers, platforms, or days. Perhaps when you ran it, the address created by all that adding ended up in some other valid memory region, just by chance. Maybe you moved from one mapped region into another, for example.

Dan
Hmmm... Interesting. I was almost convinced that it wouldn't throw a segfault: each time I ran it, it worked!
themoondothshine
"throw a segfault" -- just want to clarify something. A segfault isn't an exception that is "thrown" because some software logic detected a bounds overrun. Might work that way in Java/C# but not in C++. What's causing the segfault is that the memory manager detects an access violation (reading/writing memory that is not allowed). So when you ran the program, either it happened to create a pointer to a valid memory location for your program that wasn't protected by the MMU, or the MMU isn't doing any checking at all (common in embedded systems that run everything in kernel space).
Dan
+1  A: 

No, this is not a problem. Keep in mind that typical iterator usage is:

for ( type::const_iterator it = obj.begin(); it != obj.end(); ++it ){
    // Refer to element using (*it)
}

Proper iterator usage requires one to check against the end() iterator. With random access iterators such as the one you are using, you can also compare iterators against end() with < and >. C and C++ don't typically do bounds checking as Java does, and it is your responsibility to ensure that you stay in bounds.
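
As a concrete sketch of both patterns for `std::string` (the function names `count_char` and `try_set` are made up for this illustration):

```cpp
#include <cstddef>
#include <string>

// Count occurrences of `c`, stopping at end() as in the loop above.
std::size_t count_char(const std::string& s, char c) {
    std::size_t n = 0;
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        if (*it == c) ++n;
    }
    return n;
}

// Random-access variant: verify the offset against size() before forming
// the iterator, and report failure instead of writing out of bounds.
bool try_set(std::string& s, std::size_t offset, char value) {
    if (offset >= s.size()) return false;
    *(s.begin() + static_cast<std::string::difference_type>(offset)) = value;
    return true;
}
```

Here `try_set(str, str.capacity() + 99999, 'x')` simply returns false, because the offset is rejected before any iterator arithmetic takes place.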

Michael Aaron Safyan
+1. This is as much of a problem as the fact that C++ doesn't bounds-check pointer access. That is, it's a big problem, but it's *our* problem as programmers, not a problem with the implementation. The implementation is allowed to point and laugh when we get it wrong.
Steve Jessop