So, I have a bunch of strings like this: {\b\cf12 よろてそ } . I'm thinking I could iterate over each character and replace any non-ASCII character (Edit: anything where AscW(char) > 127 or < 0) with an RTF Unicode escape code (\u###). However, I'm not sure how to do so programmatically. Any suggestions?

Clarification:

I have a string like {\b\cf12 よろてそ } and I want a string like {\b\cf12 [STUFF]}, where [STUFF] will display as よろてそ when I view the rtf text.

+2  A: 

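You can simply use the AscW() function to get the correct value:

sRTF = "\u" & CStr(AscW(char)) & "?"

Note that, unlike most other Unicode escapes, RTF uses the decimal value of a signed 16-bit integer (2 bytes) for each character. AscW() returns exactly that kind of Integer, which makes the conversion in VB6 really quite easy. The trailing "?" is the fallback character that readers without Unicode support display in place of the escape.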

Edit

As MarkJ points out in a comment, you would only do this for characters outside the 0-127 range, but you would also need to give some characters inside that range special handling: backslash and curly braces are part of RTF syntax and have to be written as \\, \{ and \}.
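Putting the points above together, here is a minimal sketch of the per-character loop. RtfEscape is a hypothetical helper name, not something from the original answer, and it assumes the input is plain text rather than a string that already contains RTF markup. Line breaks are passed through untouched, so anything like \par handling is left out.

```vb
' Hypothetical helper: escapes plain text so it can be dropped into an RTF group.
Public Function RtfEscape(ByVal s As String) As String
    Dim i As Long
    Dim ch As String
    Dim code As Integer
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        code = AscW(ch)          ' signed 16-bit: &H8000-&HFFFF come back negative
        Select Case code
            Case 92, 123, 125    ' \  {  }  are RTF syntax, so prefix a backslash
                result = result & "\" & ch
            Case 0 To 127        ' remaining ASCII passes through unchanged
                result = result & ch
            Case Else            ' > 127 or < 0: decimal escape plus "?" fallback
                result = result & "\u" & CStr(code) & "?"
        End Select
    Next i

    RtfEscape = result
End Function
```

For the string in the question you would escape only the text part and keep the markup as it is, something like "{\b\cf12 " & RtfEscape(text) & " }". With the four characters よろてそ the escaped part should come out as \u12424?\u12429?\u12390?\u12381?.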

AnthonyWJones
You could do this for all char values above 127. Chars of 127 and below are the same in all code pages and can probably be left alone.
MarkJ
@MarkJ: Agreed, I should probably have pointed that out, the question uses 256 which is wrong.
AnthonyWJones
Numbers below 0 also need to be converted.
Brian
@Brian: yep that too, adjust answer yet again :)
AnthonyWJones
Unicode code points are all >= 0. If you are treating them as integers, they will appear to be < 0 because VB6 doesn't have a 16-bit unsigned data type. Also, be sure to account for surrogate pairs.
rpetrich
@rpetrich: we understand that the code points do not have a sign, which is why I missed it in an edit of my answer. Do you think that surrogate pairs really need any special handling in this case? Would they not be encoded into an RTF as surrogate pairs anyway?
AnthonyWJones
I can't be certain as I'm not an RTF guru, but it seems like the standard way is to encode surrogate pairs as two separate characters. Example: U+1D44E (surrogates D835 and DC4E) would become \u-10187?\u-9138? (with ? as the fallback character for both halves)
rpetrich
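To make the surrogate pair arithmetic concrete, here is a small sketch. RtfEscapeCodePoint is a hypothetical name, and it assumes you start from a code point above U+FFFF held in a Long. If you are looping over a VB6 String with Mid$ and AscW you already get the two halves one after the other, since VB6 strings are UTF-16, so no extra handling should be needed there.

```vb
' Hypothetical helper: RTF escape for a code point in the range U+10000..U+10FFFF.
Public Function RtfEscapeCodePoint(ByVal cp As Long) As String
    Dim hi As Long
    Dim lo As Long

    cp = cp - &H10000                 ' 20-bit offset into the supplementary planes
    hi = &HD800& + (cp \ &H400&)      ' high surrogate: top 10 bits
    lo = &HDC00& + (cp And &H3FF&)    ' low surrogate: bottom 10 bits

    ' Both halves are above 32767, so subtract 65536 to get RTF's signed form
    RtfEscapeCodePoint = "\u" & CStr(hi - 65536) & "?" & _
                         "\u" & CStr(lo - 65536) & "?"
End Function
```

For U+1D44E the halves are D835 and DC4E, so RtfEscapeCodePoint(&H1D44E) gives \u-10187?\u-9138?. Note the trailing & on the hex literals: without it VB6 reads &HD800 as a negative Integer.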
A: 

Another, more roundabout way would be to add MSScript.OCX to the project and call VBScript's Escape function through it. For example:

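Sub Main()
    Dim s As String
    ' The four characters from the question, built from their code points
    s = ChrW$(&H3088) & ChrW$(&H308D) & ChrW$(&H3066) & ChrW$(&H305D)
    Debug.Print MyEscape(s)
End Sub

' Runs VBScript's Escape on s via a late-bound Script Control
Function MyEscape(ByVal s As String) As String
    Dim scr As Object
    Set scr = CreateObject("MSScriptControl.ScriptControl")
    scr.Language = "VBScript"
    scr.Reset
    MyEscape = scr.Eval("Escape(" & dq(s) & ")")
End Function

' Wraps s in double quotes, doubling any embedded quotes so the
' expression passed to Eval stays a valid VBScript string literal
Function dq(ByVal s As String) As String
    dq = Chr$(34) & Replace(s, Chr$(34), Chr$(34) & Chr$(34)) & Chr$(34)
End Function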

The Main routine passes in the original Japanese characters and the debug output says:

%u3088%u308D%u3066%u305D
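Note that this output is in %uXXXX hexadecimal form, while RTF wants \uN with a signed decimal number, so one more step is needed. A rough sketch of that step follows. EscapedToRtf is a hypothetical name, and it assumes the input is well-formed Escape output, meaning every % is followed by either u and four hex digits or by two hex digits. Malformed input would raise a type mismatch error.

```vb
' Hypothetical helper: turns VBScript Escape output into RTF-escaped text.
Public Function EscapedToRtf(ByVal s As String) As String
    Dim i As Long
    Dim n As Long
    Dim result As String

    i = 1
    Do While i <= Len(s)
        If Mid$(s, i, 2) = "%u" Then
            ' %uXXXX: read the two hex byte pairs separately so the value stays positive
            n = CLng("&H" & Mid$(s, i + 2, 2)) * 256 + CLng("&H" & Mid$(s, i + 4, 2))
            i = i + 6
        ElseIf Mid$(s, i, 1) = "%" Then
            ' %XX: a single byte-sized code
            n = CLng("&H" & Mid$(s, i + 1, 2))
            i = i + 3
        Else
            ' Anything else was left unescaped by Escape
            n = AscW(Mid$(s, i, 1))
            i = i + 1
        End If

        Select Case n
            Case 92, 123, 125    ' \  {  }  need a backslash in RTF
                result = result & "\" & ChrW$(n)
            Case 0 To 127        ' remaining ASCII is written as-is
                result = result & ChrW$(n)
            Case Else            ' RTF expects a signed 16-bit decimal value
                If n > 32767 Then n = n - 65536
                result = result & "\u" & CStr(n) & "?"
        End Select
    Loop

    EscapedToRtf = result
End Function
```

Fed the output above, this should give \u12424?\u12429?\u12390?\u12381?. Given the extra round trip, calling AscW directly on each character is probably the simpler route.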

HTH

boost
You should be aware that MS Script Control is not supported on Vista.
MarkJ
By "not supported", do you mean "doesn't work" or "if it breaks, or breaks the O/S, no one's going to help me"?
boost