In my current implementation of a UISearchDisplayController I'm using [NSString compare:] inside the filterContentForSearchText:scope: method to return relevant objects, based on their name property, to the results UITableView as you start typing.

So far this works great in English and Korean, but what I'd like to be able to do is search within the character clusters that NSString defines. This is only applicable to a handful of languages, of which Korean is one.

In English, compare: returns new results after every letter you enter, but in Korean the results are generated only once you complete a recognized grapheme cluster. I would like to be able to search through my Korean objects' name property via the individual elements that make up a syllable.

Can anyone shed any light on how to approach this? I'm sure it has something to do with searching through UTF16 characters manually, or by utilising a lower level class.

Cheers!

Here is a specific example that's just not working:

NSString *string1 = @"이";
NSString *string2 = @"ㅣ";
NSRange resultRange = [[string1 decomposedStringWithCanonicalMapping] rangeOfString:[string2 decomposedStringWithCanonicalMapping] options:NSLiteralSearch];

resultRange.location is always NSNotFound, with or without decomposedStringWithCanonicalMapping.

Any ideas?

+1  A: 

If you use compare:options: with NSLiteralSearch, it should compare character by character, that is, the Unicode code points, regardless of the grapheme. The default behavior of compare: is to use no options. You could use -decomposedStringWithCanonicalMapping to get the canonically decomposed form of the input string, but I'm not sure how that would interact with compare:.

Don
Thanks for taking the time to respond Don. You made me play around for a good hour today. I have added a specific example to the question of what I'm attempting to do.
Jessedc
+2  A: 

I'm no expert, but I think you're very unlikely to find a clean solution for what you want. There doesn't seem to be any relationship between a Korean character's Unicode value and the graphemes that it's made up of.

e.g. "이" is \uc774 and "ㅣ" is \u3163. From the perspective of the NSString, they're just two different characters with no specific relationship to each other.

I suspect that you will have to find or create an explicit mapping between characters and their graphemes, and then write your own search function that consults this mapping.

This very long page on Unicode Korean can help you, if it comes to that. It has a table of all the characters which suggests some structured relation between the way characters are numbered and their components.
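The structured relation mentioned above can be sketched in code. The following is a minimal sketch, not a definitive implementation: it assumes the Hangul syllable arithmetic from the Unicode Standard (syllable = 0xAC00 + (initial × 21 + medial) × 28 + final) and maps each component to the compatibility jamo block (U+3131 to U+3163), which is what the example query "ㅣ" (\u3163) uses. The function names `hangul_decompose` and `hangul_contains` are hypothetical, and the code is plain C so that it also compiles inside an Objective-C file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Constants of the Unicode Hangul syllable block (U+AC00..U+D7A3). */
enum {
    S_BASE  = 0xAC00,               /* first precomposed syllable */
    V_COUNT = 21,                   /* medial vowels */
    T_COUNT = 28,                   /* final consonants, index 0 = none */
    N_COUNT = V_COUNT * T_COUNT,    /* 588 syllables per initial */
    S_COUNT = 19 * N_COUNT          /* 11172 syllables in total */
};

/* Compatibility jamo for each initial-consonant index. */
static const uint16_t INITIALS[19] = {
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141, 0x3142,
    0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149, 0x314A, 0x314B,
    0x314C, 0x314D, 0x314E
};

/* Compatibility jamo for each final-consonant index (0 = no final). */
static const uint16_t FINALS[28] = {
    0,      0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137,
    0x3139, 0x313A, 0x313B, 0x313C, 0x313D, 0x313E, 0x313F, 0x3140,
    0x3141, 0x3142, 0x3144, 0x3145, 0x3146, 0x3147, 0x3148, 0x314A,
    0x314B, 0x314C, 0x314D, 0x314E
};

/* Splits one UTF-16 code unit into 2 or 3 compatibility jamo.
   Anything that is not a precomposed syllable is copied unchanged.
   Returns the number of code units written to out. */
size_t hangul_decompose(uint16_t c, uint16_t out[3]) {
    assert(out != NULL);
    if (c < S_BASE || c >= S_BASE + S_COUNT) {
        out[0] = c;
        return 1;
    }
    unsigned s = (unsigned)(c - S_BASE);
    unsigned t = s % T_COUNT;
    out[0] = INITIALS[s / N_COUNT];
    out[1] = (uint16_t)(0x314F + (s % N_COUNT) / T_COUNT); /* medials are contiguous */
    if (t == 0) return 2;
    out[2] = FINALS[t];
    return 3;
}

/* Decomposes a whole UTF-16 buffer; out must hold 3 * len code units. */
static size_t decompose_all(const uint16_t *s, size_t len, uint16_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) n += hangul_decompose(s[i], out + n);
    return n;
}

/* Returns true if the jamo sequence of needle occurs anywhere in the jamo
   sequence of hay. Compound finals such as U+3133 are NOT split further. */
bool hangul_contains(const uint16_t *hay, size_t hay_len,
                     const uint16_t *needle, size_t needle_len) {
    uint16_t *h = malloc((3 * hay_len + 1) * sizeof *h);
    uint16_t *n = malloc((3 * needle_len + 1) * sizeof *n);
    bool found = false;
    if (h != NULL && n != NULL) {
        size_t hn = decompose_all(hay, hay_len, h);
        size_t nn = decompose_all(needle, needle_len, n);
        for (size_t i = 0; !found && i + nn <= hn; i++)
            found = memcmp(h + i, n, nn * sizeof *n) == 0;
    }
    free(h);
    free(n);
    return found;
}
```

With this sketch, "이" (\uc774) decomposes to \u3147 \u3163, so a search for "ㅣ" (\u3163) matches it. From Objective-C, the UTF-16 buffers could be filled with -[NSString getCharacters:range:]. As far as I can tell, the example in the question fails because canonical decomposition turns "이" into conjoining jamo (\u110b \u1175) but leaves the compatibility jamo \u3163 untouched; decomposedStringWithCompatibilityMapping maps \u3163 to \u1175 as well, though it distinguishes initial from final consonants, which this table-based approach deliberately does not.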

lawrence
Lawrence, that information is fantastic. There are iPhone apps in Korean that seem to do what I have been attempting, but I guess I was working from the wrong assumption about how Korean is stored in UTF-8/16. Thank you for your answer, it has definitely shed some light on my query.
Jessedc
Thanks for your help Lawrence, I've accepted your answer; it was a great step in the right direction.
Jessedc