I need to extract the initial character from a Korean word in MS Excel and MS Access. When I use Left("한글",1) it returns the first syllable, i.e. 한; what I need is the initial character, i.e. ㅎ. Is there a function to do this, or at least an idiom?

If you know how to get the Unicode value from the string, I could work it out from there, but I'm sure I'd be reinventing the wheel (yet again).

+3  A: 

Disclaimer: I know little about Access or VBA, but what you're facing is a generic Unicode problem; it's not specific to those tools. I retagged your question to add tags related to this issue.

Access is doing the right thing by returning 한: it is indeed the first character of that two-character string. What you want here is the canonical decomposition of this hangul syllable into its constituent jamos, as produced by Normalization Form D (NFD, for “decomposed”). The NFD form of 한 is ᄒ ‌ᅡ ‌ᆫ, and its first character is the one you want.
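To make the decomposition concrete, here is a minimal sketch in Python rather than VBA, since Python's standard unicodedata module exposes NFD directly. The helper name first_jamo is made up for this illustration; it is not an Access or Excel function.

```python
import unicodedata


def first_jamo(text: str) -> str:
    """Return the first code point of the NFD form of text ('' if text is empty).

    first_jamo is a hypothetical helper name used only for this sketch.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed[:1]


word = "한글"
# List every code point of the decomposed string, then just the first one.
print([f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", word)])
print(f"U+{ord(first_jamo(word)):04X}")
```

For 한글 the decomposed string has six code points, and the first is U+1112, the conjoining initial ᄒ.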

Note also that, as per your example, you seem to want a function that returns the standalone letter (ㅎ, U+314E, a Hangul compatibility jamo) for the conjoining jamo (ᄒ, U+1112). These really are two different code points because they represent different semantic units (a letter that stands on its own, or a part of a hangul syllable). Normalization does not map the latter to the former, but you could write a small function to that effect, as the number of jamos is limited to a few dozen (the real work is done in the first function, NFD).
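Such a small mapping function could look like the sketch below, again in Python for illustration. It covers only the 19 initial consonants (U+1100 to U+1112), and the names CHOSEONG_BASE, COMPAT_INITIALS and to_compat_initial are invented for this example.

```python
# Assumption: only the 19 modern initial consonants U+1100..U+1112 are handled.
CHOSEONG_BASE = 0x1100

# Compatibility jamo code points, listed in the same order as the initials.
COMPAT_INITIALS = (
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
    0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
    0x314A, 0x314B, 0x314C, 0x314D, 0x314E,
)


def to_compat_initial(jamo: str) -> str:
    """Map one conjoining initial jamo to its standalone compatibility form."""
    index = ord(jamo) - CHOSEONG_BASE
    if not 0 <= index < len(COMPAT_INITIALS):
        raise ValueError(f"not a modern initial jamo: U+{ord(jamo):04X}")
    return chr(COMPAT_INITIALS[index])


print(to_compat_initial("\u1112"))  # conjoining HIEUH in, compatibility HIEUH out
```

The table is not contiguous because the compatibility block interleaves consonant clusters that never occur as initials, which is why a lookup is used instead of a fixed offset.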

Arthur Reutenauer
Hi Arthur, yes, I am looking for the initial jamo (i.e. the initial character), not the initial Hangul syllable. The mapping is not difficult; I just don't know how to correctly get the Unicode value from the string in VBA. The index of the initial character can be found using (UnicodeValue - 44032) \ 588, with integer division. Cheers
10ToedSloth
Yes, for hanguls, you can do that algorithmically, too. In fact, that's how it's specified in the standard (http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf section 3.12, starting p. 104).
Arthur Reutenauer
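The arithmetic mentioned in these comments can be sketched as follows, in Python for illustration. The constants follow the Hangul syllable decomposition in the Unicode standard (base 44032 = 0xAC00, 588 = 21 vowels × 28 finals); the function name decompose_indices is made up for this example.

```python
S_BASE = 0xAC00               # 44032, code point of the first Hangul syllable
V_COUNT = 21                  # number of medial vowels
T_COUNT = 28                  # number of finals, counting "no final"
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per initial consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose_indices(syllable: str) -> tuple[int, int, int]:
    """Return (initial, medial, final) indices for one precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index = s_index // N_COUNT               # integer division, not "/"
    v_index = (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT                # 0 means no final consonant
    return l_index, v_index, t_index


print(decompose_indices("한"))  # (18, 0, 4)
print(decompose_indices("글"))  # (0, 18, 8)
```

Adding the initial index to 0x1100 gives the conjoining jamo, so index 18 is U+1112. When porting to VBA, note that AscW returns a signed 16-bit Integer, so Hangul syllables come back negative and need 65536 added before the subtraction.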
+1 Thanks again, for the Unicode reference.
10ToedSloth
+1  A: 

I think what you are looking for is a Byte array: Dim aByte() As Byte followed by aByte = "한글" should give you two bytes for each character in the string, which together hold that character's Unicode value.

Charles Williams
Thanks, it works. I was confused by the byte ordering at first. Doh!
10ToedSloth
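On the byte ordering: VBA strings are stored as UTF-16 with the low byte first, so each pair of bytes has to be combined as low + 256 × high. The sketch below mimics that layout in Python by encoding to UTF-16LE; the function name utf16_code_units is invented for this illustration.

```python
def utf16_code_units(text: str) -> list[int]:
    """Rebuild 16-bit code units from a little-endian byte sequence."""
    raw = text.encode("utf-16-le")  # same byte layout as a VBA Byte array
    # Bytes come in (low, high) pairs; shift the high byte up by 8 bits.
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]


raw_bytes = list("한글".encode("utf-16-le"))
print([f"{b:02X}" for b in raw_bytes])                       # ['5C', 'D5', '00', 'AE']
print([f"U+{u:04X}" for u in utf16_code_units("한글")])      # ['U+D55C', 'U+AE00']
```

Reading the pairs in the wrong order would give 0x5CD5 instead of 0xD55C for 한, which is an easy mistake to make when inspecting the array by eye.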
A: 

I assume you got what you needed, but it seems rather convoluted. I don't know anything about this, but I recently looked into handling Unicode and at all the string Byte functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(). There's also StrConv(), which accepts a vbUnicode conversion constant. These are all functions that I'd think would be used in any double-byte context, but then, I don't work in that environment, so I might be missing something very important.

David-W-Fenton
Yeah, the manual syllable decomposition does seem convoluted; I really expected there to be a more standardised method. Charles's suggestion of just dropping it into a byte array works fine. But as you mentioned, the same could be achieved using functions like AscB(), etc.
10ToedSloth