I have a UTF-8 encoded string that I am getting from reading a PDF, and I am trying to strip out some characters that represent spaces but are not encoded as the standard 0x20 space. My problem is that each of these characters is represented by 3 bytes of UTF-8, and I can't figure out how to get that into a string or character so I can do a replace. The two UTF-8 byte sequences I am trying to replace are 0xE28087 and 0xE28088.
I have tried Chr and ChrW, but they only accept integer arguments up to 65,535, so they reject a value like 0xE28087.
I also tried using System.Text.Encoding.UTF8.GetChars() with the byte representation of my characters, but the result seems to be 4 chars instead of just one, i.e. it is interpreting my 3-byte character as separate one-byte characters:
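Dim ResultChars() As Char
Dim bytes() As Byte
Dim SpaceChar As Int32
' Parse the hex value of the 3-byte sequence into an integer
SpaceChar = Integer.Parse("E28087", Globalization.NumberStyles.HexNumber)
' Convert the integer to bytes and decode those bytes as UTF-8
bytes = BitConverter.GetBytes(SpaceChar)
ResultChars = System.Text.Encoding.UTF8.GetChars(bytes)
' This prints 4 chars rather than the single char I expected
For Each ResultChar In ResultChars
    Debug.WriteLine(ResultChar)
Next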
What I am trying to do in pseudocode is simply: ConvertedText = ConvertedText.Replace(StringOrCharofThisUnicodeCharacter("0xE28087"), " ")
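For reference, here is a minimal sketch of what that pseudocode might look like once the characters are available as strings. It relies on the fact that the byte sequences E2 80 87 and E2 80 88 are the UTF-8 encodings of U+2007 (FIGURE SPACE) and U+2008 (PUNCTUATION SPACE), and it decodes them with Encoding.UTF8.GetString. The module name, the variable names, and the sample text are assumptions made for illustration only; they do not come from the PDF-reading code.

```vbnet
Imports System
Imports System.Text

Module ReplaceSpacesSketch
    Sub Main()
        ' Hypothetical sample text standing in for the string read from the PDF
        Dim ConvertedText As String = "a" & ChrW(&H2007) & "b" & ChrW(&H2008) & "c"

        ' Decode each raw 3-byte UTF-8 sequence into a one-character string
        Dim figureSpace As String = Encoding.UTF8.GetString(New Byte() {&HE2, &H80, &H87})
        Dim punctuationSpace As String = Encoding.UTF8.GetString(New Byte() {&HE2, &H80, &H88})

        ' Replace both special spaces with the ordinary 0x20 space
        ConvertedText = ConvertedText.Replace(figureSpace, " ").Replace(punctuationSpace, " ")

        Console.WriteLine(ConvertedText)
    End Sub
End Module
```

If this reading is right, ChrW(&H2007) and ChrW(&H2008) should yield the same two characters, since both code points are below 65,535.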