How could I convert a Greek string to Unicode with VB.NET without knowing the source encoding?

A: 

Without knowing the encoding, you can't do anything very reliable. But if you know for sure the text is Greek, you can try the supported Greek code pages:

  • windows-737 = OEM - Greek 437G
  • windows-869 = OEM - Modern Greek
  • windows-875 = IBM EBCDIC - Modern Greek
  • windows-1253 = Windows - Greek
  • windows-10006 = MAC - Greek I
  • windows-20423 = IBM EBCDIC - Greek
  • windows-28597 = ISO 8859-7 Greek

The most likely one is 1253 (not 1250 as above). But you can try all of them, one at a time, then check whether the resulting characters are all in the Greek block (and maybe Latin, if you want to accept that).
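A minimal sketch of the "try them all" step, assuming the source text is available as a raw Byte array. The module name, the function name DecodeWithGreekCodePages, and the sample bytes are made up for illustration. The code page numbers are the ones from the list above, with the most likely candidates first.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module GreekDecodeSketch
    ' Code page numbers taken from the list above, likeliest first.
    Private ReadOnly GreekCodePages As Integer() = {1253, 28597, 737, 869, 10006, 875, 20423}

    ' Decodes the same bytes with every candidate code page.
    ' Returns a map of code page number -> decoded text.
    Function DecodeWithGreekCodePages(data As Byte()) As Dictionary(Of Integer, String)
        Dim results As New Dictionary(Of Integer, String)()
        For Each cp As Integer In GreekCodePages
            Try
                results(cp) = Encoding.GetEncoding(cp).GetString(data)
            Catch ex As ArgumentException
                ' This runtime does not know the code page; skip it.
            Catch ex As NotSupportedException
                ' The platform does not support the code page; skip it.
            End Try
        Next
        Return results
    End Function

    Sub Main()
        ' Assumption: on .NET Core / .NET 5+ the legacy code pages must be
        ' registered first, e.g.
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        ' Hypothetical sample: the word "Αθήνα" encoded as windows-1253.
        Dim sample As Byte() = {&HC1, &HE8, &HDE, &HED, &HE1}

        For Each pair As KeyValuePair(Of Integer, String) In DecodeWithGreekCodePages(sample)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```

Code pages the runtime cannot provide are skipped rather than treated as errors, so the result may contain fewer entries than the list. Each decoded string is still only a candidate and has to be validated separately.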

For validation you can use a regular expression with \p (http://msdn.microsoft.com/en-us/library/az24scfc.aspx#character_classes) and the desired Unicode named blocks (http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks).

You can try [\p{IsBasicLatin}\p{IsGreek}]* (and maybe add IsGreekExtended, although you will not get that from any of the listed code pages).

If you get something else (let's say Cyrillic) you know you got the wrong code page.
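The validation check could be sketched like this. The module and function names are hypothetical, and anchoring the pattern with ^ and $ is my assumption, so that the whole string has to match and not just a part of it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreekValidationSketch
    ' The whole string may contain only Basic Latin and Greek block characters.
    Private ReadOnly GreekOrLatin As New Regex("^[\p{IsBasicLatin}\p{IsGreek}]*$")

    ' True when the decoded candidate contains nothing outside the two blocks.
    Function LooksGreek(candidate As String) As Boolean
        Return GreekOrLatin.IsMatch(candidate)
    End Function

    Sub Main()
        Console.WriteLine(LooksGreek("Αθήνα 2004"))  ' Greek plus Basic Latin: True
        Console.WriteLine(LooksGreek("Бихоб"))        ' Cyrillic: False
    End Sub
End Module
```

A decoded string that fails the check most likely came from the wrong code page. Note that Greek text with punctuation outside these two blocks (for example « » quotation marks) would also be rejected, so the character class may need widening for real data.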

Sorry, but without knowing the code page all you can do is guess. And there is only so much you can do to improve that guess.

Mihai Nita