We have an application that used the C++ zApp framework for UI (forms, fonts, everything). We have slowly converted it to use the .net framework and recently found that Greek characters are no longer displaying correctly.

In one version of the application I have a C# .net form and a C++ zApp form that both display the same data. The project is compiled with MS Visual Studio 2005 and uses .net 2.0. In the .net form the Greek is not displayed correctly. I can copy the text from the .net form, paste it into the zApp form and it will display correctly in the zApp form. This tells me that the data is being loaded okay and all the correct information is there in the string.

I tried making changes to the font used in the .net code. The zApp code creates the font for the control displaying the Greek from a LOGFONT struct, so I took the exact values zApp was using, created a LOGFONT with those values and set the .net form's font from that structure (this.Font = Font.FromLogFont((object)lFont);), using the same facename, charset, etc. Every field of the LOGFONT structure gets set, yet the Greek was still displayed wrong. I can tell that the font I created is being used: if I set underline, the text is underlined, and if I look at the properties of the control's font (this.Font) after setting it from the LOGFONT, they are what I'd expect. I did initially have issues with a font that wasn't a TrueType font, but after I switched the zApp font to a TrueType font (Microsoft Sans Serif), zApp still displayed the Greek correctly, so that is the font I used for these tests.
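Roughly, the .net side of that looks like this (a trimmed sketch; the field values shown are placeholders, the real ones are copied from the zApp side):

using System.Drawing;
using System.Runtime.InteropServices;

// Managed mirror of the Win32 LOGFONT structure, in the form Font.FromLogFont expects.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
public class LOGFONT
{
    public int lfHeight;
    public int lfWidth;
    public int lfEscapement;
    public int lfOrientation;
    public int lfWeight;
    public byte lfItalic;
    public byte lfUnderline;
    public byte lfStrikeOut;
    public byte lfCharSet;
    public byte lfOutPrecision;
    public byte lfClipPrecision;
    public byte lfQuality;
    public byte lfPitchAndFamily;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
    public string lfFaceName;
}

// ... in the form, fill the struct with the values taken from zApp:
LOGFONT lFont = new LOGFONT();
lFont.lfHeight = -13;                      // placeholder value
lFont.lfCharSet = 161;                     // GREEK_CHARSET
lFont.lfFaceName = "Microsoft Sans Serif";
this.Font = Font.FromLogFont((object)lFont);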

Also, if I type Greek characters from the keyboard, they display correctly in both the .net form and the zApp form. However, characters entered in the .net form and saved to the database then show as garbage in the zApp forms, and they differ from the data saved by the zApp form. Again, if I copy the text that looks like garbage from the .net form and paste it into the zApp form, it displays just fine (no loss of data).

Does anyone have any ideas?

+1  A: 

I created a small test app in C#, and made a button with some Greek text: ελληνικά. As soon as I set the text in the button, Visual Studio asked me if I wanted to switch to Unicode, I said 'yes'. After that, the Greek text showed on my button.

I suspect that there's a setting either in Visual Studio or some property of your application configuration that needs to be set correctly.

Edit:

The further information in your answer leads me to believe the text from the Oracle database might be UTF-8. If it is, the high-order bits of each byte indicate whether more bytes follow in the same character, so not all characters are the same byte length! Your solution might not work. I suggest trying to load it using

Encoding.UTF8.GetString(bytes)  // where "bytes" is the raw byte data read from the database
Charlie Salts
I'm pretty certain my project is already using Unicode. That's why if I take the text from the .net form and paste it into the zApp form it is fine. There's no loss of data from the characters. But just to test out your suggestion I added a new form with a button to the project and did not get any prompt to switch to Unicode. It displayed the same way - if I type Greek from my keyboard it displays okay, but the Greek from the database and zApp form does not.
AAyres
The characters I'm working with in the Oracle database should always be a single byte in length, based on what I have read about WE8ISO8859P1.
AAyres
It looks like it needs to be converted to UTF-16 for display in .net. So our Oracle data is single-byte, the solution in my answer converts it to UTF-16 to display correctly in .net, and then converts it back to single-byte character data for storage in Oracle and display in unmanaged C++. Does that sound correct to you?
AAyres
If it works, then go for it. You might consult a Unicode expert to be sure.
Charlie Salts
+1  A: 

I figured out how to get the text to display correctly in the .net form. It actually had nothing to do with the font; it was about converting the data for .net. I changed code that was basically like this:

string Name = reader.GetString(column);

to

string Name = System.Text.Encoding.Default.GetString(reader.GetOracleString(column).GetNonUnicodeBytes());
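Spelled out step by step, that one-liner does the following (same ODAC calls; the variable names here are just for illustration):

// Pull the column as an ODAC OracleString and get its raw single-byte
// representation, then decode those bytes using the system ANSI code page
// (whatever code page the machine's locale selects for Encoding.Default).
Oracle.DataAccess.Types.OracleString oraValue = reader.GetOracleString(column);
byte[] singleByteData = oraValue.GetNonUnicodeBytes();
string Name = System.Text.Encoding.Default.GetString(singleByteData);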

I will still have to verify that this does not cause problems for any of the other languages clients use that have been working fine, but so far it looks good with Greek and English.

Now I need to reverse that process when adding the OracleCommand parameter for saving. The original code went something like this:

cmd.Parameters.Add(new OracleParameter(":name", Name));

which saves garbage, even though the value of the string "Name" looks fine. The unmanaged C++ code that works just puts together a SQL statement in a character array (the Greek text is always handled in a char array too) and executes it with a call to an OCI function (Oracle's API). The .net code uses ODAC (Oracle Data Access Components) for database access.

UPDATE:

I have solved the second part of my problem (saving) and learned more about what is happening.

The data coming in to .net from Oracle looks like this in memory when I put it into a .net string data type without doing any conversion:

00 0a 33 79 07 00 00 00 06 00 00 00 d4 00 e1 00 ec 00 e5 00 df 00 ef 00 00 00 00 00 00 00 00 00 00 00 00 ..3y........Τ.α.μ.ε.ί.ο............

This string displays incorrectly in .net as:
Ôáìåßï

The memory contents of the .net string after the conversion (conversion code shown above):
00 0a 33 79 07 00 00 00 06 00 00 00 a4 03 b1 03 bc 03 b5 03 af 03 bf 03 00 00 00 00 00 00 00 00 00 00 00 ..3y........¤.±.Ό.µ.―.Ώ............

You can see that for each character the high byte has gone from 00 to 03 and the high nibble of the low byte has dropped by 3 (e.g. 00d4 becomes 03a4).
The string now displays correctly in .net as:
Ταμείο

As the dumps above show, it seems that .net represents characters differently than unmanaged C++ and Oracle do: the .net string holds UTF-16 code units, while the C++ code and the Oracle column hold single-byte characters. I did some tests and found that the breaking point is 160 (hex value a0). For character values of 0 to 159 (00 to 9f) there is no difference; as soon as a value of 160 or higher is used, the two representations diverge.
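To illustrate the breaking point (a small standalone sketch; it assumes the ANSI code page involved is Windows-1253, the Greek code page, since Encoding.Default depends on the machine's locale):

using System;
using System.Text;

class CodePageProbe
{
    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1
        Encoding greek = Encoding.GetEncoding(1253);   // Windows-1253 (Greek)

        // Bytes below a0 decode to the same code point in both encodings;
        // bytes from a0 up diverge, matching the breaking point described above.
        foreach (byte b in new byte[] { 0x41, 0x7a, 0xd4, 0xe1 })
        {
            char c1 = latin1.GetChars(new byte[] { b })[0];
            char c2 = greek.GetChars(new byte[] { b })[0];
            Console.WriteLine("0x{0:x2} -> U+{1:X4} (Latin-1) vs U+{2:X4} (Greek)",
                b, (int)c1, (int)c2);
        }
        // 41 and 7a come out the same in both; d4 -> U+00D4 vs U+03A4 (Τ) and
        // e1 -> U+00E1 vs U+03B1 (α), the same mapping seen in the dumps above.
    }
}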

My solution will only work for character values between 0 and 255 because I'm dropping the high byte of the character in my conversions. This should work for our application though since we have never supported multibyte character sets anyway.

The simplified version of what I'm doing to convert the string back to a format for saving to Oracle is:

//"name" represents a .net string data type containing the data to save  

char[] textChars = new char[4000]; //4000 is the max varchar2 column size in Oracle  
byte[] textBytes;  
int index = 0;  
textBytes = (System.Text.Encoding.Default.GetBytes((name).ToCharArray()));  
foreach (byte textByte in textBytes)  
{  
    textChars[index++] = (char)textByte;  
}  
string textString = new string(textChars, 0, index);  
cmd.Parameters.Add(new OracleParameter(":name", (object)(textString)));
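A shorter way to express the same conversion might be this (an untested sketch; it relies on the same assumption that every character value fits in one byte, since ISO-8859-1 maps bytes 00-ff straight onto the chars 0000-00ff):

// Encode with the ANSI code page, then reinterpret the bytes as chars via
// ISO-8859-1, which is exactly the byte-to-char cast done by the loop above.
string textString = System.Text.Encoding.GetEncoding(28591)
    .GetString(System.Text.Encoding.Default.GetBytes(name));
cmd.Parameters.Add(new OracleParameter(":name", textString));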

This whole thing is such a hack - if anyone has a better way, please share it. It seems like there ought to be some simple way of handling this entire problem.

AAyres
Are you sure the Oracle text isn't UTF-8? I know that high-order bits are used to define whether there are more bytes in the given character. So in UTF-8, not all characters are the same length - your solution would break for accented Latin characters.
Charlie Salts
The text is stored in a VARCHAR2 so it would be using the WE8ISO8859P1 character set. (AL16UTF16 is the NLS_NCHAR_CHARACTERSET setting.)
AAyres