My company's main application is mostly written in C++ (with some Delphi code and components). We are upgrading from RAD Studio 2007 to 2010 for the next release, starting in about a week. What do I need to know to ensure this upgrade goes smoothly?

Points I have thought of so far are:

  • Unicode. This one looks really complicated. Our app contains a horrible mix of std::string-s and AnsiString-s with casts to and from them. I have lots of questions about this, such as "is wstring capable of holding everything a UnicodeString can, and should we just do a search/replace", or "should we avoid all C++ string types altogether and use UnicodeString", "can we change all event handlers to use String though the existing .HPPs event handler method prototypes were compiler-translated to AnsiString", right down to basics such as "should we prefix all strings with L, or is the compiler smart enough with Unicode enabled to use Unicode strings", etc. Any insight on this would be really appreciated.

    We also need backwards compatibility. Our app uses its own binary tuple format that currently stores strings as an array of bytes. I need to upgrade this to read old files and, presumably, write new Unicode strings as well. How do I handle Unicode strings embedded in a binary format? Is there any generic way where I can point a UnicodeString at an array of bytes, that may be originally written as either ANSI bytes or Unicode, and it will figure out what they are?

  • Third-party components. We use SpTBX mainly, and it appears to be compatible.

  • Project upgrades. The standard advice in the Codegear forums seems to be to manually recreate all project files when upgrading. This is an awful lot of work (7 projects (mostly libs) in our main app, plus half a dozen DLLs, a lot of files.) Is there any way to automate this?

  • How's the linker look? We traditionally have a lot of trouble with the linker randomly crashing or running out of resources, though it got a lot better in 2007. This is one reason our main application is split into several libs - the linker cannot (hopefully, "could not, but now can"?) handle it otherwise.

  • I know there's a new type library editor and format (it stores the IDL, ie text, and generates the TLB dynamically?) How well does this handle upgrading existing COM projects with a TLB? We have Delphi code and TLB that are built into the C++ application.

  • Is there anything else I should be considering or be aware of?

I have found:

A: 

Is the cost of upgrading in line with the benefits?

Why not start a gradual upgrade, in which new components are developed on the new platform? Integrate the new components with the old version via interop helpers.

This approach was suggested to VB6 developers who were thinking about upgrading to VB.NET.

zproxy
Thanks for the comment, zproxy. The cost/benefit thing is why we skipped 2009 and are only upgrading now. We will definitely only upgrade in sections (we have a lot of secondary utilities etc we will upgrade slowly) - unfortunately the main application probably needs to be upgraded in one go.
David M
+1  A: 

You do not say what the data strings in your binary tuple format are for: is it necessary for them to store Unicode? When I transitioned from D2007 to D2009 I was able to keep some parts of the system ANSI-string only.

If storing Unicode is required, then you need to check whether your existing data is compatible with a format such as UTF-8. If the range of values stored in existing data files presents a problem, then I would make your next upgrade do a one-time conversion of any old data files: read in the old AnsiString data and write it back as UTF-8, either to a different file name or extension, or with appropriately modified file header data. I have been versioning data files for a long time, just to allow this sort of processing change.
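
A minimal sketch of that one-time conversion, in portable C++ rather than anything C++Builder-specific. It assumes the legacy strings are Latin-1 (ISO 8859-1); data written under a different ANSI code page would need a different mapping. The version constants and the names `latin1ToUtf8` and `upgradePayload` are hypothetical, not part of any existing format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical version numbers stored in the file header.
const std::uint8_t kVersionAnsi = 1;  // strings are single-byte ANSI (assumed Latin-1)
const std::uint8_t kVersionUtf8 = 2;  // strings are UTF-8

// Convert Latin-1 bytes to UTF-8. Latin-1 maps each byte to the code point
// of the same value, so only bytes >= 0x80 expand to two UTF-8 bytes.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Upgrade a string payload in place if it came from an old-format file.
// Returns the version the payload has after the call.
std::uint8_t upgradePayload(std::uint8_t fileVersion, std::string& payload) {
    assert(fileVersion == kVersionAnsi || fileVersion == kVersionUtf8);
    if (fileVersion == kVersionAnsi) {
        payload = latin1ToUtf8(payload);
    }
    return kVersionUtf8;
}
```

Because the header version decides whether conversion happens, files that are already in the new format pass through unchanged.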

I am only just starting a BCB2010 project, so cannot comment on your other questions, but I certainly had difficulty upgrading a Delphi project from D2007 to D2009 - though I was able to fix this by editing the project file, which is just XML.

Good luck with the conversion ;-)

IanH
It's not strictly necessary to store Unicode, but it would be useful - our app is used internationally and if we convert it fully to Unicode that will be handy in future. So I would like to convert the entire application to Unicode rather than leaving some bits ANSI. Your suggestion about modifying file header data is what we'll probably do, I think. Thanks for the comments!
David M
+4  A: 

Project upgrades. The standard advice in the Codegear forums seems to be to manually recreate all project files when upgrading. This is an awful lot of work (7 projects (mostly libs) in our main app, plus half a dozen DLLs, a lot of files.) Is there any way to automate this?

There is: just use the IDE's project importer :)
Seriously, I would just try importing the projects, and then go investigate if it doesn't seem to work.

How's the linker look? We traditionally have a lot of trouble with the linker randomly crashing or running out of resources, though it got a lot better in 2007. This is one reason our main application is split into several libs - the linker cannot (hopefully, "could not, but now can"?) handle it otherwise.

I've had almost no trouble with ILINK anymore since C++Builder 2009. I've occasionally read that others experienced out-of-memory errors, but someone in the newsgroups has discovered a workaround:

https://forums.embarcadero.com/thread.jspa?messageID=140012&tstart=0#140012

Also, as you can read here, the compiler got a new option (-Cx) to control the maximal amount of memory it allocates.

I know there's a new type library editor and format (it stores the IDL, ie text, and generates the TLB dynamically?) How well does this handle upgrading existing COM projects with a TLB?

Should work without a hitch.

I have lots of questions about this, such as "is wstring capable of holding everything a UnicodeString can, and should we just do a search/replace"

Yes. On Windows platforms wchar_t is usually 16 bits wide, so std::wstring can hold the same UTF-16 data that UnicodeString stores.
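
One consequence worth keeping in mind: in UTF-16, characters outside the Basic Multilingual Plane occupy two 16-bit code units (a surrogate pair), so a string's length counts code units, not characters. The sketch below illustrates the encoding in portable C++. It uses `char16_t` instead of `wchar_t` so it behaves the same on platforms where `wchar_t` is 32 bits; the function name `toUtf16` is made up for this example.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
std::u16string toUtf16(char32_t cp) {
    // Valid code points only: in range and not a lone surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: high surrogate followed by low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

For example, `toUtf16(0x41)` yields one code unit, while `toUtf16(0x1F600)` yields the pair 0xD83D, 0xDE00.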

or "should we avoid all C++ string types altogether and use UnicodeString"

Depends on how portable your code needs to be. In any case, whenever you just need a string type, use "String", not "UnicodeString".

"can we change all event handlers to use String though the existing .HPPs were compiler-translated to AnsiString"

First, you should NEVER re-use .hpp files generated by older versions of DCC! For event handlers that use the String type in Delphi, you must use UnicodeString. As above, simply use "String", and your code will work for both the ANSI and Unicode versions of C++Builder.

right down to basics such as "should we prefix all strings with L, or is the compiler smart enough with Unicode enabled to use Unicode strings"

The compiler doesn't convert your string literals (that would conflict with the language standard), but both AnsiString and UnicodeString have constructor overloads for char* and wchar_t* string literals, so the following will work:

AnsiString as = L"foo";
UnicodeString us = "bar";

What will not work this way, though, is the whole bunch of printf()/scanf() functions; AnsiString::sprintf() takes const char*, UnicodeString::sprintf() takes const wchar_t*.
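
The same narrow/wide split exists in the standard library, which is a reasonable way to picture the problem. This is a sketch in standard C++ only, not the AnsiString/UnicodeString members themselves. It assumes the standard-conforming `swprintf` signature that takes a buffer size; some older Windows runtimes ship a variant without that argument. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>

// Narrow formatting: the format string must be const char*.
std::string formatNarrow(int value) {
    char buf[32];
    int n = std::snprintf(buf, sizeof buf, "value=%d", value);
    assert(n >= 0 && n < static_cast<int>(sizeof buf));
    return buf;
}

// Wide formatting: the format string must be const wchar_t*.
std::wstring formatWide(int value) {
    wchar_t buf[32];
    int n = std::swprintf(buf, sizeof buf / sizeof buf[0], L"value=%d", value);
    assert(n >= 0);
    return buf;
}
```

Passing a narrow literal to the wide function (or the reverse) is a compile error, because no implicit conversion exists between `const char*` and `const wchar_t*`.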

If you are using sprintf() a lot, you may find my CbdeFormat library useful; just read my article on the subject.

Moritz Beutel
That article looks very interesting - thank you! Re HPP files: what I meant, and didn't say clearly, was that the auto-generated method signatures for events use AnsiString. I remember hearing of a compiler flag for Delphi/C++ compatibility that related to this somehow, that is turned on by default. If all AnsiStrings are converted to String, can this be disabled? (The closest to this I can find is http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/devcommon/unicodeinide_xml.html, it doesn't mention flags though.)
David M
I see what you mean; that's in *.h files though :) No, there's no way to have the IDE use String instead of AnsiString/UnicodeString :( The cause is the C++ language, which doesn't support strong typedefs.
Moritz Beutel
+2  A: 

Unicode. This one looks really complicated. Our app contains a horrible mix of std::string-s and AnsiString-s with casts to and from them. I have lots of questions about this, such as "is wstring capable of holding everything a UnicodeString can, and should we just do a search/replace"

std::wstring stores wchar_t characters, just like System::UnicodeString does.

should we avoid all C++ string types altogether and use UnicodeString

That is up to you to decide. char* strings are still supported. You are not forced to migrate everything to Unicode.

can we change all event handlers to use String though the existing .HPPs were compiler-translated to AnsiString

No, you cannot change auto-managed event handlers to use the System::String alias. All IDE versions will complain about that. You will have to manually update your event handler declarations and implementations to use UnicodeString parameters instead of AnsiString parameters when appropriate. That also means you cannot share DFMs and Unit .h files across multiple IDE versions, either (which you should not be doing anyway).

should we prefix all strings with L, or is the compiler smart enough with Unicode enabled to use Unicode strings

No. If you declare a string constant or character constant without an L prefix, the data will still be interpreted as Ansi. That has not changed. You can, however, pass Ansi data to System::UnicodeString (but not to std::wstring), and it will convert to Unicode automatically. But you have to be careful because it will use the OS's default Ansi codepage to interpret the data. As long as your Ansi data uses only ASCII characters, you will be OK. Otherwise, if you are using non-ASCII characters, you are better off putting the data into a System::AnsiStringT or System::RawByteString (both were introduced in CB2009) that has been assigned the correct codepage, and then assigning that to your System::UnicodeString variable. The associated codepage will be used instead of the OS default codepage for the conversion.
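
To see why the code page matters, here is a small illustrative sketch in portable C++. It is not how UnicodeString performs the conversion internally; the `CodePage` enum and `decodeByte` function are invented for the example, and only the byte ranges needed for the illustration are handled.

```cpp
#include <cassert>

enum class CodePage { Latin1, Windows1251 };

// Decode one byte to a Unicode code point under the given code page.
char32_t decodeByte(unsigned char b, CodePage cp) {
    if (b < 0x80) {
        return b;  // ASCII is identical in both code pages
    }
    if (cp == CodePage::Latin1) {
        return b;  // Latin-1: byte value equals the code point
    }
    // Windows-1251: bytes 0xC0..0xFF map to Cyrillic U+0410..U+044F.
    // Other high bytes are outside the scope of this sketch.
    assert(b >= 0xC0);
    return 0x0410 + (b - 0xC0);
}
```

The byte 0xE4 decodes to U+00E4 (ä) as Latin-1 but to U+0434 (д) as Windows-1251, whereas an ASCII byte such as 0x41 decodes to the same code point under both.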

We also need backwards compatibility. Our app uses its own binary tuple format that currently stores strings as an array of bytes. I need to upgrade this to read old files and, presumably, write new Unicode strings as well. How do I handle Unicode strings embedded in a binary format?

If your tuple is expecting 8-bit characters, then you will have to make sure that any struct declarations and such are using char and not wchar_t characters. If you need to store Unicode strings, but need to maintain the 8-bit compatibility, then you should encode your Unicode strings to UTF-8 first (you can use the System::UTF8String string type to help you - starting in CB2009, it is a true UTF-8 string now). As long as you do not use non-ASCII characters, then your old apps will not know the difference, as ASCII characters are encoded as-is in UTF-8. If you want to store raw Unicode data, however, then your tuple would need a flag somewhere (if it does not already have one) indicating whether the string data is stored as Ansi or Unicode, and your apps would have to look for that flag.
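
The claim that ASCII survives UTF-8 encoding unchanged can be checked with a small encoder. This is a portable sketch, not the System::UTF8String implementation; `encodeUtf8` is a hypothetical name, and the input is taken as code points (`std::u32string`) to keep surrogate handling out of the picture.

```cpp
#include <cassert>
#include <string>

// Encode a sequence of Unicode code points as UTF-8 bytes.
std::string encodeUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            // ASCII: stored as the same single byte.
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Encoding `U"abc"` gives the three bytes `abc`, exactly what an old reader expects, while U+20AC (the euro sign) becomes the three bytes E2 82 AC.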

Is there any generic way where I can point a UnicodeString at an array of bytes, that may be originally written as either ANSI bytes or Unicode, and it will figure out what they are?

No. You have to know the actual encoding of the bytes beforehand. If you pass a memory address to System::AnsiString or std::string, it is going to assume Ansi characters. If you pass the same memory address to System::UnicodeString or std::wstring, it is going to assume Unicode characters instead.
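
A sketch of what reading with an explicit encoding flag might look like, in portable C++. The `Encoding` tag values and the `decodeField` function are hypothetical, the Ansi branch assumes Latin-1 purely for illustration, and the Unicode branch assumes little-endian UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-field tag stored alongside the string bytes.
enum class Encoding : std::uint8_t { Ansi = 0, Utf16LE = 1 };

// Decode raw bytes into UTF-16 code units according to the explicit tag.
std::u16string decodeField(Encoding tag, const std::vector<std::uint8_t>& bytes) {
    std::u16string out;
    if (tag == Encoding::Ansi) {
        // Assumed Latin-1: each byte becomes one code unit of the same value.
        for (std::uint8_t b : bytes) {
            out.push_back(static_cast<char16_t>(b));
        }
    } else {
        // UTF-16LE: each pair of bytes forms one code unit, low byte first.
        assert(bytes.size() % 2 == 0);
        for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
            out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
        }
    }
    return out;
}
```

The same four bytes 41 00 42 00 decode to the two-character string "AB" when tagged as UTF-16LE, but to four code units (including two NULs) when tagged as Ansi, which is why the bytes alone cannot tell you what they are.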

Third-party components. We use SpTBX mainly, and it appears to be compatible.

Just like with all prior versions (except for the migration from 2006 to 2007), any third-party components you have will need to be re-compiled for 2010, either manually (if you have the source code for them) or by their respective vendors.

Project upgrades. The standard advice in the Codegear forums seems to be to manually recreate all project files when upgrading.

Yes. That still applies.

I know there's a new type library editor and format (it stores the IDL, ie text, and generates the TLB dynamically?)

.TLB files are not used at all anymore. The new system operates on .ridl (Reduced IDL) files now. During compiling, the .ridl produces the correct TypeLibrary information in the executable's binary resources directly. No .tlb files are generated.

How well does this handle upgrading existing COM projects with a TLB? We have Delphi code and TLB that are built into the C++ application.

I do not remember whether CB2010 (or CB2009, for that matter) can consume pre-existing .tlb files directly. I don't think they can. You can, however, run the .tlb file through tlibimp.exe and it will export a .ridl file. Or you can copy the IDL text from the TLB editor in a past version and paste it into a new .ridl file manually. Either way, you can then add that .ridl file to your CB2010 project.

2007 and 2010 co-existing. I'm not sure I trust this answer since I have had issues with 2006 and 2007 on the same machine before.

That is why I use virtual machines when installing multiple IDE versions on the same physical machine.

Remy Lebeau - TeamB
Thanks for the very useful reply, Remy! Re wstring/String - does this mean code that currently converts string/AnsiString via c_str() on each can roughly be converted to wstring/String, again using c_str(), without data loss in the conversion? I would like to upgrade the entire application to support Unicode, because it's sold internationally and I'd like to be able to have support for non-ANSI characters throughout the app as a nice bonus for the RAD Studio upgrade.
David M
Also, re event handlers: we aren't sharing DFMs etc across versions - what I'm interested in is whether the existing compiler-generated method prototypes that use AnsiString can be manually changed to String. I think your reply means yes, they can? I've also heard of, but can't find documentation for now, a compiler switch for AnsiString event handler compatibility. If this exists and my memory isn't playing tricks, do you know what it is, and can it be disabled if all events are manually changed to String? I added a link to Codegear docs in the original question, but they don't mention it.
David M
Yes, you can use the c_str() methods to pass data between System::UnicodeString and std::wstring, just like you can between System::AnsiString and std::string. System::UnicodeString::c_str() and std::wstring::c_str() both return wchar_t* pointers.
Remy Lebeau - TeamB
You have to update auto-generated event handlers to use UnicodeString directly, not the String alias. There is no compiler switch for AnsiString compatibility. There is a switch (also available in the Project Options) for "String checks". If you have an AnsiString event handler that is called by Delphi code using UnicodeString instead, "String Checks" helps the Delphi code silently convert the Ansi data to Unicode data behind the scenes. This can cause compatibility and performance issues if you are not careful. Best to use UnicodeString in events, and turn off "String checks".
Remy Lebeau - TeamB
Thanks Remy! Very helpful answers.
David M