Unicode Woes! Ms-Access 97 migration to Ms-Access 2007

The problem breaks down into two steps:

Problem Step 1. The Access 97 db contains XML strings that are encoded in UTF-8.

The problem boils down to this: the Access 97 db contains XML strings that are encoded in UTF-8. So I created a patch tool that converts the XML strings separately from UTF-8 to Unicode. To convert a UTF-8 string to Unicode, I used the function MultiByteToWideChar(CP_UTF8, 0, PChar(OriginalName), -1, @newName, Size); (where newName is declared as "newName : Array[0..2048] of WideChar;").
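As a rough cross-check of what such a call is expected to produce, here is a minimal Python sketch of a UTF-8 to UTF-16 conversion. It is not the Delphi patch tool itself; the helper name `utf8_to_wide` and the sample text are assumptions made for illustration. It also reports the number of UTF-16 code units in the result, which is the unit in which the output buffer size of MultiByteToWideChar is counted.

```python
from typing import Tuple


def utf8_to_wide(raw: bytes) -> Tuple[str, int]:
    """Decode UTF-8 bytes and report how many UTF-16 code units the result needs."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    units = len(text.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
    return text, units


# Hypothetical sample: Greek text held as raw UTF-8 bytes.
raw = "Ευγ.".encode("utf-8")
text, units = utf8_to_wide(raw)
print(len(raw), units, text)
```

The point of the sketch is that the input must be the raw UTF-8 bytes. If the bytes have already passed through an ANSI code page conversion before this step, the result can differ from the original text.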

This function works well in most cases; I have checked it with Spanish and Arabic characters. But with Greek and Chinese characters it chokes.

For some Greek characters like "Î•Ï…Î³. ÎšÎ±ÏÎ±Î²Î¹Î¬" (as stored in Access 97), the resulting string contains null characters in between, and when it is stored to a WideString the characters get clipped.
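One possible explanation for a string like this, offered as an assumption rather than a diagnosis: the UTF-8 bytes were read as single-byte Windows-1252 text somewhere along the way. The Python sketch below reproduces the first word under that assumption and shows that the round trip is reversible as long as no byte was dropped. The function names are made up for illustration.

```python
def utf8_seen_as_cp1252(text: str) -> str:
    """Show how UTF-8 bytes look when misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_cp1252_mojibake(garbled: str) -> str:
    """Undo the misreading: back to the original bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


garbled = utf8_seen_as_cp1252("Ευγ")
print(garbled)
print(repair_cp1252_mojibake(garbled))
```

Some UTF-8 bytes have no Windows-1252 mapping. For example, the letter ρ is the byte pair CF 81, and Python's cp1252 codec refuses to decode 0x81. If such a byte was silently dropped while the data was stored or displayed, the repair step above cannot restore that character.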

For some Chinese characters like "?Â¢Â»?Âµ?" (as stored in Access 97), the result is totally absurd, like "?¢»?µ?".

Problem Step 2. Access 97 db text strings: the application GUI takes Unicode input and saves it in Access 97.

First I checked with Arabic and Spanish characters, and it seemed that no explicit character encoding is required. But again the problem comes with Greek and Chinese characters.

I tried the same function mentioned above for the text conversion (is that correct?), and the result was again disappointing. The Spanish characters, which are fine without conversion, are either lost or converted to regular ASCII letters.
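A likely reason, stated here as an assumption: these text fields are stored in an ANSI code page rather than UTF-8, so decoding them as UTF-8 damages the accented letters. A minimal Python sketch with a made-up Spanish sample:

```python
# Hypothetical field content, stored in the Windows-1252 ANSI code page.
stored = "España".encode("cp1252")

# Treating the ANSI bytes as UTF-8 damages the accented letter.
wrong = stored.decode("utf-8", errors="replace")

# Decoding with the code page the text was saved in keeps it intact.
right = stored.decode("cp1252")

print(wrong)
print(right)
```

Under this assumption, the UTF-8 conversion should be applied only to fields that really hold UTF-8 bytes, such as the XML strings from step 1.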

The Greek and Chinese characters show similar behaviour to that mentioned in step 1.

Please guide me. Am I taking the right approach? Is there some other way around this? Right now I am confused and full of questions :)

Actually, the application does use code pages, i.e. as soon as the user selects a specific language, the respective code page is used to encode the text. The problem is that it is stored in Access 97. I am not sure whether this encoding info is saved or lost when the text is stored.

Nains 2010-07-06 14:08:41

I was referring to the codepage used in the database - unless you mean that the application stores strings using different encodings in the same field. What codepage are you using for the Greek characters?

Panagiotis Kanavos 2010-07-06 14:39:11

Well, the application uses Windows code page 1253 to interpret the Greek characters to and from Access 97. And you are suggesting that I look at which code page the database is using. OK, I got your point and am looking into this further. Thanks.

Nains 2010-07-07 03:41:13
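To illustrate why the code page matters for the Greek fields, here is a small Python sketch. The helper `decode_field` and the sample word are hypothetical. The same stored bytes give Greek text when decoded as Windows-1253 and unrelated Latin letters when decoded as Windows-1252.

```python
def decode_field(raw: bytes, codepage: str) -> str:
    """Decode a stored single-byte text field with the given code page."""
    return raw.decode(codepage)


# Hypothetical field content written by the application using code page 1253.
raw = "Καραβιά".encode("cp1253")

print(decode_field(raw, "cp1253"))  # decoded with the code page it was written in
print(decode_field(raw, "cp1252"))  # same bytes, wrong code page
```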

@Panagiotis Kanavos: Let's say the database uses the UTF-8 code page and I am storing large CJK (Chinese, Japanese, Korean) character strings from the application. The result will be a wrong encoding, such as this Chinese string: "???Â½Â¹Â«?Â·-?Ã®Â¾Â®Â¶?". Now my question: is there any way to retrieve these characters successfully?

Nains 2010-07-07 04:40:11
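On whether such strings can be recovered: the literal "?" characters in the sample may mean that some characters were already replaced when the text was converted to a code page that cannot represent them. That is an assumption about this data, not something the thread confirms. The Python sketch below shows why that kind of replacement cannot be undone.

```python
# Hypothetical CJK input forced into a single-byte Western code page.
original = "中文"
stored = original.encode("cp1252", errors="replace")  # unencodable chars become b"?"

recovered = stored.decode("cp1252")

print(stored)
print(recovered)
```

Once a character has been stored as a plain "?", the original bytes are gone. Only the parts of the string that still hold the original byte sequence can be decoded again.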

Finally I got the point :)

Nains 2010-07-15 16:23:53
