Hi folks,
I have an issue with encoding. I want to put data from a UTF-8 encoded file into an MSSQL 2008 database. MSSQL only features UCS-2 encoding, so I decided to explicitly convert the retrieved data.
// connect to page file
_fsPage = new FileStream(mySettings.filePage, FileMode.Open, FileAccess.Read);
_streamPage = new StreamReader(_fsPage, System.Text.Encoding.UTF8);
Here's the conversion routine for the data:
private string ConvertTitle(string title)
{
    string utf8_String = Regex.Replace(Regex.Replace(title, @"\\.", _myEvaluator), @"(?<=[^\\])_", " ");
    byte[] utf8_bytes = System.Text.Encoding.UTF8.GetBytes(utf8_String);
    byte[] ucs2_bytes = System.Text.Encoding.Convert(System.Text.Encoding.UTF8, System.Text.Encoding.Unicode, utf8_bytes);
    string ucs2_String = System.Text.Encoding.Unicode.GetString(ucs2_bytes);
    return ucs2_String;
}
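As a sanity check on the routine above, here is a minimal sketch of the same byte-level conversion without the regex step. .NET strings are already UTF-16 in memory, so encoding to UTF-8, converting to UTF-16 and decoding again should return an identical string. The class and method names (RoundTripCheck, Utf8ToUtf16RoundTrip) are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Text;

public static class RoundTripCheck
{
    // Same conversion chain as ConvertTitle, minus the regex replacements.
    public static string Utf8ToUtf16RoundTrip(string input)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(input);
        byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
        return Encoding.Unicode.GetString(utf16Bytes);
    }

    public static void Main()
    {
        // Escapes avoid any dependence on the source file's own encoding.
        string sample = "\u0144 \u00C9 \u00E9"; // ń É é
        string result = Utf8ToUtf16RoundTrip(sample);

        // Ordinal comparison: checks the UTF-16 code units one by one.
        Console.WriteLine(string.Equals(sample, result, StringComparison.Ordinal)); // prints True
    }
}
```

If this prints True for the critical titles as well, the conversion changes nothing and the character is probably lost somewhere between the returned string and the table.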
When I step through the code for the problematic titles, the variable watch shows the correct characters in both the UTF-8 and the UCS-2 string. In the database, however, the result is partially wrong: some special characters are saved correctly, others are not.
Wrong: ń is stored as n. Right: É and é, for example, are inserted correctly.
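One pattern in the symptom can be checked in code: É and é exist in Latin-1, ń (U+0144) does not. The sketch below tests whether a string survives a strict Latin-1 encoding. It is only a diagnostic under the assumption, not confirmed anywhere in this post, that the value passes through a non-Unicode (single-byte) code page on its way into the table, for example via a varchar column or a non-Unicode parameter. Latin-1 stands in here for whatever code page the database actually uses, and the names Latin1Check and FitsLatin1 are hypothetical.

```csharp
using System;
using System.Text;

public static class Latin1Check
{
    // Returns true if every character of the text can be encoded in ISO-8859-1.
    // The exception fallback makes the encoder throw instead of substituting
    // a replacement character for anything it cannot represent.
    public static bool FitsLatin1(string text)
    {
        Encoding strictLatin1 = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictLatin1.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FitsLatin1("\u00C9\u00E9")); // É and é: prints True
        Console.WriteLine(FitsLatin1("\u0144"));       // ń: prints False
    }
}
```

If the characters that get damaged are exactly those for which this returns False, that would point at a single-byte conversion on the database side rather than at ConvertTitle.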
Any idea where the problem might be and how to solve it?
Thanks in advance, Frank