tags:

views:

827

answers:

4

I am stumped by the behaviour of the following in my Win32 (ANSI) function: (Multi-Byte Character Set NOT UNICODE)

void sOut( HWND hwnd, string sText ) // Add new text to EDIT Control
{ 
 int len;
 string sBuf, sDisplay;

  len = GetWindowTextLength( GetDlgItem(hwnd, IDC_EDIT_RESULTS) ); 
  if(len > 0)
  {
   // HERE:
   sBuf.resize(len+1, 0); // Create a string big enough for the data
   GetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sBuf.data(), len+1 );
  } // MessageBox(hwnd, (LPSTR)sBuf.c_str(), "Debug", MB_OK);

  sDisplay = sBuf + sText;
  sDisplay = sDisplay + "\n\0"; // terminate the string
  SetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sDisplay.c_str() );
}

This should append text to the control with each call.

Instead, all string concatenation fails after the call to GetDlgItemText(), I am assuing because of the typecast?

I have used three string variables to make it really obvious. If sBuf is affected then sDisplay should not be affected.

(also why is len 1 char less than the length in the buffer?)

GetDlgItemText() corretly returns the content of the EDIT control, and SetDlgItemText() will correctly set any text in sDisplay, but the concatenation in between is just not happening.

Is this a "hidden feature" of the string class?

Added:

Yes it looks like the problem is a terminating NUL in the middle. Now I understand why the len +1. The function ensures the last char is a NUL.

Using sBuf.resize(len); will chop it off and all is good.

Thanks for your help

Added:

Charles,

Leaving aside the quirky return length of this particular function, and talking about using a string as a buffer:

The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself.

Thats precisely whats needed isn't it?

Further, it requires that the program must not alter any of the values of that array.

As I understand it that is going to change along with the guarantee that all bytes are contiguous. I forget where I read a long article on this, but MS already implements this it asserted.

What I don't like about using a vector, is that the bytes are copied twice before I can return them. Once into the vector and again into the string. I also need to instantiate a vector object and a string object. That is a lot of overhead. If there were some string friendly of working with vectors (or CStrings) without resorting to old C functions or sopying characters one by one, I would use them. The string is very syntax friendly in that way.

+4  A: 

The data() function on a std::string returns a const char*. You are not allowed to right into the buffer returned by it, it may be a duplicated buffer.

What you could do instead is to used a std::vector<char> as a temporary buffer.

E.g. (untested)

std::vector<char> sBuf( len + 1 );
GetDlgItemText( /* ... */, &sBuf[0], len + 1 );

std::string newText( &sBuf[0] );
newText += sText;

Also, the string you pass to SetDlgItemText should be \0 terminated so you should used c_str() not data() for this.

SetDlgItemText( /* ... */, newText.c_str() );

Edit:

OK, I've just checked the contract for GetWindowTextLength and GetDlgItemText. Check my edits above. Both will include the space for a null terminator so you need to chop it off the end of your string otherwise concatenation of the two strings will include a null terminator in the middle of the string and the SetDlgItemText call will only use the first part of the string.

There is a further complication in that GetWindowTextLength isn't guaranteed to be accurate, it only guarantees to return a number big enough for a program to create a buffer for storing the result. It is extremely unlikely that this will actually affect a dialog box item owned by the calling code but in other situations the actual text may be shorter than the returned length. For this reason you should search for the first \0 in the returned text in any case.

I've opted to just use the std::string constructor that takes a const char* so that it finds the first \0 correctly.

The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself. Further, it requires that the program must not alter any of the values of that array. This means that the return value of data() may or may not be a copy of the string's internal representation and even if it isn't a copy you still aren't allowed to write to it.

Charles Bailey
tried that. no difference
Mike Trader
You are using _MBCS and not _UNICODE, aren't you?
Charles Bailey
Properties -> General -> Character Set -> 'Use Multi-Byte Character Set'
Mike Trader
Allthought the string standard does not *guarantee* contiguos memory allocation, that is about to be changed and as far as I know it is safe to do so in practice. (I do not want to #include vector for a simple win32 API function call.)
Mike Trader
But you're using standard string? If you don't want to use vector, you'll have to allocate a buffer manually. Anyway, you say you've tried `vector<char>` and it doesn't work so I'm stumped for the moment.
Charles Bailey
no I did not use vector char, I tried using c_str() instead of data(). The former makes a copy of the string (undesirable) and appends a NUL, the latter should work because it has a NUL appended anyway. Im stumped too.
Mike Trader
You said "tried that. no difference" :s
Charles Bailey
I tried your suggestion of using c_str() instead of data(). This question is about getting to the bottom of the string class behaviour. Using vector defeats the purpose.
Mike Trader
For what you are doing `string` is not suitable because no part of its interface gives you access to a writeable buffer that will set its internal representation. This is what you need for use with `GetDlgItemText`.
Charles Bailey
Passing the pointer to the first char with .data() does that doesn't it?
Mike Trader
Not necessarily, please read my edited answer.
Charles Bailey
I think thats outdate information - see above. I really appreciate your input tho :)
Mike Trader
That is not outdated information. `.data()` returns a `const char*` and that means that you cannot change the memory pointed. Then of course you can force it with `const_cast<>` or `(LPSTR)`, as C style casts do cast away constant-ness. But that does not mean it is correct, just that you can force the compiler. `data()` provides a read-only interface into either the buffer or a copy (most implementations will return the buffer itself) without null-termination guarantee (some implementations may keep the buffer null terminated at all times some may not)
David Rodríguez - dribeas
I may be wrong, but isn't this class a flavor of vector/stringbuilder in that it requests memory beyond what is needed and manages the assignment overloading etc? So even tho the framers may not explicitly want this behavior, the pointer is indeed a pointer to the underlying memory which is valid? The article I read (which I wish I could find now) implied that this interface spec was about to change and the MS had already implemented the concept so that this could be done. As for the casting, I need to read more to understand why this is giving you all the heeby jeebys
Mike Trader
C++ is a strongly typed language, and casting breaks part of that typing away. It will allow you to do things that you should not be doing in the first place.
David Rodríguez - dribeas
I have added an example of `const_cast` breaking an application in a subtle way here. Note that it is not an answer, but I cannot force it to go to the bottom of the list in any way (unless someone offers to downvote it, that is :) : http://stackoverflow.com/questions/1617464/casting-string-type-with-getdlgitemtext-disables-concatenation-in-c/1617916#1617916
David Rodríguez - dribeas
A: 

In this context if I were you I would probably use CString instead.

string sDisplay
CString sBuf;

GetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sBuf.GetBuffer(len+1), len+1 );

sBuf.Release();

sDisplay = sBuf.GetBuffer() + sText;

IMHO when you are dealing with Win32 using CString is more convenient than using std::string (wstring).

Anders K.
true, but thats another #include, and not a small one. I am allowed to pick one string class and I chose string. This is waht i have to work with. It has many benefits over char arrays esp its readable syntax for concatenation for example. Unless there is a compelling reason not to use them, I would really like to get them to work with ANSI WIN32 API function calls.
Mike Trader
I can understand your viewpoint, I thought the same myself before but in Windows you try to grab the help you can get. CString provides a number of benefits that you don't get from using string e.g. unicode/ansi transparency, conversion routines plus some others, you can also use it in ATL programs. When you are using string/wstring or some home custom tstring you are adhering to the standard STL but to what end?
Anders K.
A: 

I am far away from the win32 api and their string nightmare, but there is something in the code that you can check. Standard C++ strings do not need to be null terminated and nulls can happen anywhere within the string. I won't comment on the fact that you are casting away constantness with your C-style cast, which is a problem on its own, but rather on the strange effect you are

When you initially create the string you allocate extra space for the null (and initialize all elements to '\0') and then you copy the elements. At that point your string is len+1 in size and the last element is a null. After that you append some other string, and what you get is a string that will still have a null character at position len. When you retrieve the data with either data() (does not guarantee null termination!) or c_str() the returned buffer will still have the null character at len position. If that is passed to a function that stops on null (takes a C style string), then even if the string is complete, the function will just process the first len characters and forget about the rest.

#include <string>
#include <cstdio>
#include <iostream>
int main()
{
   const char hi[] = "Hello, ";
   const char all[] = "world!";
   std::string result;
   result.resize( sizeof(hi), 0 );
   // simulate GetDlgItemText call
   std::copy( hi, hi+sizeof(hi), const_cast<char*>(result.data()) ); // this is what your C-style cast is probably doing
   // append
   result.append( all );

   std::cout << "size: " << result.size() // 14
      << ", contents" << result // "Hello, \0world!" - dump to a file and edit with a binary editor
      << std::endl;
   std::printf( "%s\n", result.c_str() ); // "Hello, "
}

As you can see, printf expects a C-style string and will stop when the first null character is found, so that it can seem as if the append operation never took place. On the other hand, c++ streams do work properly with std::string and will dump the whole content, checking that the strings were actually appended.

A patch to your append operation disappearing would be removing the '\0' from the initial string (reserve only len space in the string). But that is not really a good solution, you should never use const_cast (there are really few places where it can be required and this is not one of them), the fact that you don't see it is even worse: using C style casts is making your code look nicer than it is.

You have commented on another answer that you do not want to add std::vector (which would provide with a correct solution as &v[0] is a proper mutable pointer into the buffer), of course, not adding the extra space for the '\0'. Consider that this is part of an implementation file, and the fact that you use or not std::vector will not extend beyond this single compilation unit. Since you are already using some STL features, you are not adding any extra requirement to your system. So to me that would be the way to go. The solution provided by Charles Bailey should work provided that you remove the extra null character.

David Rodríguez - dribeas
'reserve only len space in the string' - That didn't work in this case as the function ensuresa a terminating NUL no matter what len is specified. 'Since you are already using some STL features, you are not adding any extra' Ah yes. just verified that. THankyou. 'you should never use const_cast' I am still new to this language. Can you explain this a little more pls?
Mike Trader
You should really find a book or tutorial on C++ and read in detail about the different cast operators: `static_cast<T>(v)`, `dynamic_cast<T>(v)`, `reinterpret_cast<T>(v)`, `const_cast<T>(v)` and old-style C casts `(T)v`. Each of the former defines a particular way of performing a type conversion, while the c-style cast will perform one or more of the other casts. `const_cast<T>(v)` is a cast that is meant to remove `const` or `volatile` properties of a variable and should be avoided in most cases as it can end up in subtle errors.
David Rodríguez - dribeas
... in your particular problem, and hypothetical implementation of STL could be using ropes to hold memory, and when you call the `data()` member it could generate a buffer specifically for you that would be ignored. At that point, to avoid confusion, the interface determines that you should not modify the contents by returning a pointer to constant characters. By forbidding you to modify the buffer it is helping you avoid the error of writting into it and expecting the real string to be modified accordingly. By `const_cast`-ing you remove that limitation and you actually write...
David Rodríguez - dribeas
...into the buffer. With that hypothetical implementation your changes would be ignored in the actual string, as that buffer was just meant to be a temporary for your use. None of the implementations I know of uses ropes, and in all cases the pointer returned actually points to the raw data buffer, and the variable in which you perform the cast is in the stack... with all planets aligned I don't think your code would crash with any compiler, but the fact is that you are not coding against the standard but against your own luck.
David Rodríguez - dribeas
In other cases, the constant element could be stored by the operating system in a read-only section of memory (data segment of your program) and trying to write to it after `const_cast`-ing could kill your application right away, or the const element could be a reference into an internal member of an object and modifying it could break the class invariants an produce subtler harder to debug errors...
David Rodríguez - dribeas
A: 

This is NOT an answer. I have added it here as an answer only so that I can use formatting in a long going discussion about const_cast.

This is an example where using const_cast can break a running application:

#include <iostream>
#include <map>
typedef std::map<int,int> map_type;
void dump( map_type const & m ); // implemented somewhere else for concision
int main() {
   map_type m;
   m[1] = 10;
   m[2] = 20;
   m[3] = 30;
   map_type::iterator it = m.find(2);
   const_cast<int&>(it->first) = 10;
   // At this point the order invariant of the container is broken:
   dump(); // (1,10),(10,20),(3,30) !!! unordered by key!!!!
   // This happens with g++-4.0.1 in MacOSX 10.5
   if ( m.find(3) == m.end() ) std::cout << "key 3 not found!!!" << std::endl;
}

That is the danger of using const_cast. You can get away in some situations, but in others it will bite back, and probably hard. Try to debug in thousands of lines where the element with key 3 was removed from the container. And good luck in your search, for it was never removed.

David Rodríguez - dribeas