views:

228

answers:

2

My co-worker is filling a System::String object with double-byte characters from an unmanaged library by the following method:

RFC_PARAMETER aux;
Object* target;
RFC_UNICODE_TYPE_ELEMENT* elm;
elm = &(m_coreObject->m_pStructMeta->m_typeElements[index]);
aux.name = NULL;
aux.nlen = 0;
aux.type = elm->type;
aux.leng = elm->c2_length;
aux.addr = m_coreObject->m_rfcWa + elm->c2_offset;

GlobalFunctions::CreateObjectForRFCField(target,aux,elm->decimals);
GlobalFunctions::ReadRFCField(target,aux,elm->decimals);

Where GlobalFunctions::CreateObjectForRFCField creates a System::String object filled with spaces (for padding) to what the unmanaged library states the max length should be:

static void CreateObjectForRFCField(Object*& object, RFC_PARAMETER& par, unsigned dec)
{
    switch (par.type)
    {
        case TYPC:
            object = new String(' ',par.leng / sizeof(_TCHAR));
            break;
        // unimportant afterwards.
    }
}

And GlobalFunctions::ReadRFCField() copies the data from the library into the created String object and preserves the space padding:

static void ReadRFCField(String* target, RFC_PARAMETER& par)
{
    int lngt;
    _TCHAR* srce;
    switch (par.type)
    {
        case TYPC:
        case TYPDATE:
        case TYPTIME:
        case TYPNUM:
            lngt = par.leng / sizeof(_TCHAR);
            srce = (_TCHAR*)par.addr;
            break;

        case RFCTYPE_STRING:
            lngt = (*(_TCHAR**)par.addr != NULL) ? (int)_tcslen(*(_TCHAR**)par.addr) : 0;
            srce = *(_TCHAR**)par.addr;
            break;

        default:
            throw new DotNet_Incomp_RFCType2;
    }

    if (lngt > target->Length) lngt = target->Length;

    GCHandle gh = GCHandle::Alloc(target,GCHandleType::Pinned);
    wchar_t* buff = reinterpret_cast<wchar_t*>(gh.AddrOfPinnedObject().ToPointer());
    _wcsnset(buff,' ',target->Length);
    _snwprintf(buff,lngt,_T2WFSP,srce);
    gh.Free();
}

Now, on occasion, we see access violations getting thrown in the _snwprintf call. My question really is: Is it appropriate to create a string padded to a length (ideally to pre-allocate the internal buffer), and then to modify the String using GCHandle::Alloc and the mess above.

And yes, I know that System::String objects are supposed to be immutable - I'm looking for a definitive "This is WRONG and here is why".

Thanks, Eli.

+2  A: 

I'm amazed this ever appears to work. If I understand it, you pin a String object, get the address of it, and then cast it to a buffer of characters. It's not a buffer of characters. CLR objects start with an 8-byte header (in 32-bit, anyway). You're probably trashing the internal data used by the CLR in garbage collection.

Why not allocate a native buffer (std::vector<wchar_t> would be great) to pass to the native API, and then safely construct a CLR String from that buffer?

Update:

Okay, here's a reference: http://www.drdobbs.com/cpp/184403869

Turns out that the pinning API being used has special knowledge of the layout of String, and knows how to find and return the raw internal character buffer. Yeesh!

But to quote that article:

One last important point: in some of these examples, I’m showing how pin_cast can be used to access the private data buffer of managed Strings and Arrays, possibly in a non-const manner. Given these are sealed types, and that the entire implementation is unknown, it would be bad to assume you could safely modify the contents of these buffers, even if the memory is pinned.

Interestingly, the API documentation doesn't mention the special behaviour for strings.

Daniel Earwicker
@Earwicker: Yeesh indeed. Good to know this, though.
Moron
@Earwicker - don't let my co-worker know that. Personally, I wish he had initialized the String using the constructor (he could have easily arranged things that way).
Eli
A: 

Actually, the issue was not with the .NET string as an output buffer, but with the input buffer instead.

The sprintf("%s") class functions (including wsprintf and so on) will perform a strlen-type operation on any parameters on the string - EVEN if it is snwprintf - the "n" part only limits the amount WRITTEN to the string, NOT the characters read from the input buffer.

Turns out the input buffer was never guaranteed to be null terminated. Often times we get lucky because if the data returned was small, it would hit SOMETHING null before reaching bad memory.

However, if the data in there is big enough, it would go to the end of the memory page. When the strlen keeps going, it walks off the page, and Access Violation city!

Luckily I found this when testing something else with a native mode debugger attached and ready with all the debug symbols from MS for the C runtime!

We switched the snwprintf to a wcsncpy() instead (the snwprintf was a legacy operation when we had to do ANSI->Unicode conversion - why he didn't do MultiByteToWideChar() instead, I'll never know.

thanks for the advice at any rate.

Eli