Hi all,

I have a table field in MS Access 2003 which contains HTML encoded strings like this:

Ανταγωνισμός παγκοσμίου επιπέδου στην κατάρτι&#963

How can I decode this into a normal string using MS Access?

Thanks in advance.

A: 

Paste it into a file and save it as an HTML file, then open it in a browser.

I got some weird font like Greek or Arabic when I tried it. I'm sure it'll make more sense to you than it does to me:

Ανταγωνισμός παγκοσμίου επιπέδου στην κατάρτισ

Evernoob
You are right, this is Greek, and it should be my final output. I tried the code provided by David below, and it does what I asked: it decodes HTML-encoded text. However, I have now realized that I need something else. I need VB code that will decode these Greek characters into proper text.
Dejan
+2  A: 

There's VB code for this available on the web that runs unchanged in Access. I've been using that code in a production Access app for several years now and have never had any problems with it.

David-W-Fenton
David, thanks a lot, this is what I need! Now I can write VB code to read the fields from the table and store the decoded values back.
Dejan
I tried this code, and it works perfectly, but as I wrote above in a comment, I just realized that I need a slightly different decoding: not the special characters starting with "%", but those starting with "&#".
Dejan
A: 

Here is what I have so far. Using the VB code provided here (BTW, I could open that page only in IE7, not in FF 3.5 or Chrome 2), I wrote the following function:

Private Function UnicodeDecode(StringToDecode As String) As String
  Dim TempAns As String
  Dim CurChr As Integer
  CurChr = 1
  Do Until CurChr - 1 = Len(StringToDecode)
    Select Case Mid(StringToDecode, CurChr, 2)
    Case "&#"
      TempAns = TempAns & Chr(Mid(StringToDecode, CurChr + 2, 3))
      CurChr = CurChr + 5
    Case Else
      TempAns = TempAns & Mid(StringToDecode, CurChr, 1)
    End Select
    CurChr = CurChr + 1
  Loop
  UnicodeDecode = TempAns
End Function

Now, this works when the decimal value of the character is 255 or less. If I try to execute, for example:

Chr(338)

it fails with "Invalid procedure call or argument". I suppose MS Access supports only the ISOlat1 standard by default, according to this reference. However, I need to convert Unicode characters with decimal values from 913 upward, which belong to ISOgrk3.

Does anybody know how I can achieve that?

Thanks again.

Dejan
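The failure described above can be reproduced in isolation. The following is only a minimal sketch: the function name ChrErrorNumber is made up for illustration, and it relies on Chr() accepting only codes 0 to 255 while ChrW() accepts the full 16-bit range.

```vba
' Hypothetical helper (not from the thread): returns the runtime error number
' raised by Chr() for the given code, or 0 if Chr() accepts it.
Public Function ChrErrorNumber(ByVal Code As Long) As Long
    Dim Ignored As String
    On Error Resume Next
    Ignored = Chr(Code)          ' raises error 5 for codes outside 0-255
    ChrErrorNumber = Err.Number
    Err.Clear
    On Error GoTo 0
End Function

Public Sub ShowChrLimits()
    ' Error 5 is "Invalid procedure call or argument".
    Debug.Print "Chr(963) error number:"; ChrErrorNumber(963)
    ' ChrW() takes the same code without complaint.
    Debug.Print "AscW(ChrW(963)) ="; AscW(ChrW(963))
End Sub
```

Run ShowChrLimits from the Immediate window. The window itself may display Greek characters as "?", so the sketch prints character codes rather than the characters.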
A: 

Here's an article that suggests a number of directions you might go in:

Using Unicode in Visual Basic 6 (Access's VBA is a superset of VB6)

Then you'll probably want to muck around with these Access/VBA functions:

  • StrConv()
  • AscB()
  • ChrB()

That doesn't resolve all of it, but that should give you a starting point.

Note for StrConv() the two constants for the 2nd argument, vbUnicode and vbFromUnicode, and the last, optional argument is the character set, which for Greek is given in the URL cited above as 161 (there doesn't seem to be a named constant for this -- the dbLangGreek constant returns ";LANGID=0x0408;CP=1253;COUNTRY=0").
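As a rough illustration of what those byte-level functions do, here is a small sketch. It assumes that VBA keeps strings in memory as two bytes per character, low byte first; the names CharFromCodeBytes and ShowByteFunctions are invented for this example and do not come from the linked article.

```vba
' Hypothetical helper: builds one character from the two bytes of its code.
Public Function CharFromCodeBytes(ByVal Code As Long) As String
    ' Low byte first, because the in-memory layout is little-endian.
    CharFromCodeBytes = ChrB(Code And &HFF) & ChrB((Code \ 256) And &HFF)
End Function

Public Sub ShowByteFunctions()
    Dim s As String
    s = CharFromCodeBytes(963)      ' 963 is the code from "&#963"

    Debug.Print AscW(s)             ' full 16-bit code of the rebuilt character
    Debug.Print AscB(s)             ' first (low) byte only

    ' vbFromUnicode converts to one byte per character in the system code page,
    ' so the byte length of a plain ASCII string is halved.
    Debug.Print LenB("AB"); LenB(StrConv("AB", vbFromUnicode))
End Sub
```

For plain decoding of numeric codes, ChrW() does the same job as CharFromCodeBytes in one call; the byte functions are mostly useful for seeing how the string is laid out.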

It occurs to me that as long as you're limited to Greek for this, you might need to just set up an array that maps the characters to their corresponding numeric encoding. But I think it would be better to use a solution that handles more than one encoding.

Last of all, you might try going to this page on Michael Kaplan's old website, Trigeminal.com:

The Localized Website of Trigeminal Software, Inc.

...and scroll down to the end, "Miscellaneous I18n resources on this site." Much of that information is out of date for .NET and other programming, but it's still going to apply to VB6/Access VBA.

David-W-Fenton
Thanks again for the answer, that is a lot of resources, I'll check them out.
Dejan
A: 

Thanks so much. I had to change some of the code to handle input with 4-digit codes (as in Persian). The function should also be Public if you want to call it from a macro in MS Access. Use ChrW() instead of Chr():

Public Function UnicodeDecode(StringToDecode As String) As String
  Dim TempAns As String
  Dim CurChr As Integer
  CurChr = 1
  Do Until CurChr - 1 = Len(StringToDecode)
    Select Case Mid(StringToDecode, CurChr, 2)
    Case "&#"
      TempAns = TempAns & ChrW(Mid(StringToDecode, CurChr + 2, 4))
      CurChr = CurChr + 6
    Case Else
      TempAns = TempAns & Mid(StringToDecode, CurChr, 1)
    End Select
    CurChr = CurChr + 1
  Loop
  UnicodeDecode = TempAns
End Function
Soroush
I couldn't get your code to work, so I revised it, changing your Do Until line to "Do Until CurChr > Len(StringToDecode)" and your ChrW(Mid(StringToDecode, CurChr + 2, 4)) to "ChrW(Mid(StringToDecode, CurChr + 2, 3))". The former is because the loop wasn't terminating and was ending up with an overflow (you might consider changing CurChr to Long, since an Integer counter limits the length of the string you can decode to the maximum value for a VBA integer, which is only 32K-odd characters). The latter change is because you were passing the ";" along with the number.
David-W-Fenton
Also, it seems to me you ought not to assume the encoded characters are going to be 3 digits, and should instead pull out the value between "&#" and ";", since Unicode characters can have numbers up to 64K-odd. Also, Unicode characters can be encoded as decimal or hex, so for full compatibility you'd need to account for hex values as well. For pulling out the values, you might try using Split() with the ";" character as your delimiter and then process the resulting array.
David-W-Fenton
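Putting the suggestions from these comments together, a decoder might look roughly like the sketch below. It is not code from anyone in this thread: the names DecodeNumericEntities and AllCharsIn are made up, it uses InStr() to find the ";" rather than Split(), and it assumes that only codes from 1 to 65535 should be decoded (anything else is copied through unchanged).

```vba
' Hypothetical helper: True only if Text is non-empty and every character
' of Text appears in Allowed.
Private Function AllCharsIn(ByVal Text As String, ByVal Allowed As String) As Boolean
    Dim i As Long
    If Len(Text) = 0 Then Exit Function
    For i = 1 To Len(Text)
        If InStr(1, Allowed, Mid$(Text, i, 1), vbBinaryCompare) = 0 Then Exit Function
    Next i
    AllCharsIn = True
End Function

' Hypothetical name. Decodes &#NNN; (decimal) and &#xHHH; (hex) references.
Public Function DecodeNumericEntities(ByVal Source As String) As String
    Dim Result As String
    Dim Pos As Long      ' Long rather than Integer, so long strings do not overflow
    Dim EndPos As Long
    Dim Token As String
    Dim Code As Long

    Pos = 1
    Do While Pos <= Len(Source)
        Code = 0         ' 0 means "no valid reference starts at this position"
        If Mid$(Source, Pos, 2) = "&#" Then
            EndPos = InStr(Pos + 2, Source, ";")
            If EndPos > Pos + 2 Then
                ' Everything between "&#" and ";"
                Token = Mid$(Source, Pos + 2, EndPos - Pos - 2)
                If LCase$(Left$(Token, 1)) = "x" Then
                    Token = Mid$(Token, 2)
                    If Len(Token) <= 4 And AllCharsIn(Token, "0123456789ABCDEFabcdef") Then
                        ' The trailing & makes Val read the hex value as a Long
                        Code = CLng(Val("&H" & Token & "&"))
                    End If
                ElseIf Len(Token) <= 5 And AllCharsIn(Token, "0123456789") Then
                    Code = CLng(Token)
                End If
            End If
        End If

        If Code > 0 And Code <= 65535 Then
            Result = Result & ChrW(Code)
            Pos = EndPos + 1                        ' skip past the ";"
        Else
            Result = Result & Mid$(Source, Pos, 1)  ' copy the character as-is
            Pos = Pos + 1
        End If
    Loop
    DecodeNumericEntities = Result
End Function
```

In the Immediate window, ? AscW(DecodeNumericEntities("&#963;")) prints 963. A reference without its closing ";", or with non-numeric content, is left in the output exactly as it was.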