views:

544

answers:

9

Currently, our database uses Win1252 as the only character encoding. We will have to support Unicode in the database tables soon, which means we have to perform this migration for four databases and around 80 Delphi applications which run in-house in a 24/7 environment. Are there recommendations for database migrations to UTF-8 (or UNICODE_FSS) for Delphi applications? Some questions listed below. Many thanks in advance for your answers!

  • are there tools which help with the migration of the existing databases (sizes between 250 MB and 2 GB, no Blob fields), by dumping the data, recreating the database with UNICODE_FSS or UTF-8, and loading the data back?
  • are there known problems with Delphi 2009, dbExpress and Interbase 7.5 related to Unicode character sets?
  • would you recommend to upgrade the databases to Interbase 2009 first? (This upgrade is planned but does not have a high priority)
  • can we simply migrate the database and Delphi will handle the Unicode character sets automatically, or will we have to change all character field types in every Datamodule (dfm and source code) too?
  • which strategy would you recommend to work on the migration in parallel with the normal development and maintenance of the existing application? The application runs in-house so development and database administration is done internally.
+1  A: 

Both Database Workbench and IBExpert can do the data migration for you.

I'll get back to you on the other questions when I'm at the Entwickler Tage.

--jeroen

Jeroen Pluimers
+3  A: 

Our tool "Database Workbench" includes a DataPump tool, we have a free version that supports InterBase (does not include the DataPump) and the next major version (4.0) supports Unicode everywhere.

To convert your databases, dump all DDL -without- "character set" specifiers for Domains and other (VAR)CHAR based data, then re-create your database with UTF8 as the default character set.

In Delphi 2009, Unicode data will only be available in TWideStringFields, so yes, all your persistent fields need re-creation.

Also, if you use FieldByName(...).AsString, this will need to be .AsWideString as .AsString always returns an ANSI string.

Both dbExpress and IBX -should- support Unicode, I haven't tried either.

Martijn Tonies
next major version (4.0) supports Unicode everywhere : great :)
Hugues Van Landeghem
A: 

Problem: CHAR fields no longer work and have to be replaced with VARCHAR.

Symptom: SELECT queries on a column which now uses UTF8 and is imported from WIN1252 with ASCII values no longer returns any value. Maybe this is a bug which I should report in QC.

Solution: replace all occurences of CHAR( in the database metadata DDL script with VARCHAR(

mjustin
+1  A: 

Problem: UPDATE on a empty string field no longer finds a record. If a UTF8 character field is empty, the DataSetProvider generates a wrong SELECT for the update action.

Symptom: Message 'record not found or edited by another user'

Solution: upgrade to Delphi 2010 Update 4 or use the workaround described in QC

mjustin
A: 

Problem: persistent string fields require a Size property which is the logical size of the field multiplied by four (see also: http://stackoverflow.com/questions/2500517)

Symptom: Access violations

Solution: delete the persistent field and add it again to update the Size property. (side effect: the DisplayWidth will also increase size, leading to problems with UI)

mjustin
A: 

Problem: UDF (user defined functions) with string parameters can break because of sizes limits.

Symptom:

Dynamic SQL Error.
SQL error code = -204.
Data type unknown.
Implementation limit exceeded.
COLUMN DSQL internal.

for this UDF:

DECLARE EXTERNAL FUNCTION STRLEN
    CSTRING(32767)
    RETURNS INTEGER BY VALUE
    ENTRY_POINT 'IB_UDF_strlen' MODULE_NAME 'ib_udf';

Solution: fix UDF parameters in the declaration.

mjustin
A: 

Problem: dbExpress uses WideString as data type internally, so all existing .AsString calls for reading / setting field and parameter will no longer work

Symptom: special characters will not be stored / read correctly

Solution: replace all occurences of .AsString with .AsWideString but be careful to not change where the AsString method is not called on a field or parameter.

mjustin
A: 

Problem: dbExpress needs TStringField objects for WIN1252 fields. For UTF8 database fields, dbExpress needs TWideStringField objects.

Symptom: error message 'expected: WideString found: string'

Solution: replace all occurences of TStringField with TWideStringField. This requires that all form files (dfm) are text, not binary. The modified forms and datamodules will not be backwards compatible.

mjustin
A: 

Problem: exporting metadata and table data for a WIN1252 database will create a CP1252 encoded file, but for the import, a UTF8 file is required (tested with IBExpert)

Symptom: errors in the script import to InterBase

Solution: use iconv to convert the script file to UTF8

mjustin