I am reaching the last stage of implementing my rope (a more scalable version of String). Naturally, I want every operation to give the same result as the corresponding String operation whenever possible.

Doing this for ordinal operations is pretty simple, but I am worried about implementing culture-sensitive operations correctly, especially since I know only two languages, and in both of them culture-sensitive operations behave exactly like ordinal ones!

So are there any specific things that I could test to get at least some confidence that I am doing things correctly? I know, for example, about ß being equal to SS when ignoring case in German, and about the dotted and dotless i in Turkish.

+1  A: 

The Turkish test is the best I know :)
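
A minimal sketch of what such a Turkish check might look like in C#. The class and method names are hypothetical, and it assumes culture data for tr-TR is available on the machine (that is, the runtime is not in invariant-globalization mode). A rope's own upper/lower-casing could be compared against these results.

```csharp
using System;
using System.Globalization;

public static class TurkishCaseCheck
{
    // Assumption: tr-TR culture data is installed and usable.
    private static readonly CultureInfo Turkish = CultureInfo.GetCultureInfo("tr-TR");

    // Turkish rules: 'i' upper-cases to dotted capital I (U+0130).
    public static string UpperTurkish(string s) => s.ToUpper(Turkish);

    // Turkish rules: 'I' lower-cases to dotless small i (U+0131).
    public static string LowerTurkish(string s) => s.ToLower(Turkish);

    // Invariant rules: 'i' upper-cases to plain 'I' (U+0049).
    public static string UpperInvariant(string s) => s.ToUpperInvariant();

    // True when Turkish and invariant upper-casing disagree,
    // i.e. the input actually exercises the Turkish rule.
    public static bool ExercisesTurkishRule(string s) =>
        !string.Equals(UpperTurkish(s), UpperInvariant(s), StringComparison.Ordinal);
}
```

Inputs for which `ExercisesTurkishRule` returns true are the interesting ones to feed to the rope's culture-sensitive methods.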

leppie
+2  A: 

Surrogate pairs, if you plan to support them - including invalid combinations (e.g. only one part of one).
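
As a rough illustration, the two surrogate concerns (splitting a valid pair, and meeting a lone half) could be checked with helpers like these. The names are hypothetical; only the `char.IsHighSurrogate`, `char.IsLowSurrogate` and `char.IsSurrogatePair` calls come from the framework.

```csharp
using System;

public static class SurrogateCheck
{
    // True if cutting 'text' just before 'index' does not split a surrogate pair.
    // A rope might use a check like this when choosing node boundaries.
    public static bool IsSafeSplit(string text, int index)
    {
        if (index <= 0 || index >= text.Length) return true;
        return !char.IsSurrogatePair(text[index - 1], text[index]);
    }

    // Returns the index of the first unpaired surrogate, or -1 if there is none.
    public static int IndexOfLoneSurrogate(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c))
            {
                if (i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
                {
                    i++; // skip the second half of a valid pair
                    continue;
                }
                return i; // high surrogate with no low surrogate after it
            }
            if (char.IsLowSurrogate(c)) return i; // low surrogate with no high before it
        }
        return -1;
    }
}
```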

If you're doing encoding and decoding, make sure you retain enough state to cope with being given arbitrary blocks of binary data to decode, which may end halfway through a character, with the remaining half coming in the next block.
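
One way to keep that state, sketched here under the assumption that the data is UTF-8, is to reuse a single `System.Text.Decoder` across blocks, since it buffers incomplete byte sequences between calls. The wrapper class and method are hypothetical.

```csharp
using System.Text;

public static class ChunkedDecoding
{
    // Decodes UTF-8 blocks in order with one shared Decoder, so a character
    // whose bytes are split across two blocks is completed by the next block.
    public static string DecodeBlocks(params byte[][] blocks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();
        foreach (byte[] block in blocks)
        {
            // GetCharCount takes the decoder's buffered bytes into account.
            char[] chars = new char[decoder.GetCharCount(block, 0, block.Length)];
            int written = decoder.GetChars(block, 0, block.Length, chars, 0);
            result.Append(chars, 0, written);
        }
        return result.ToString();
    }
}
```

Decoding each block independently with `Encoding.UTF8.GetString` would not carry the partial character over to the next block.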

Jon Skeet
No encoding/decoding, but surrogates may end up being a real problem.
Alexey Romanov
+1  A: 

You should mimic the implementations of the String methods and let the core library do this for you. It is very hard to take into account every possible aspect of every culture. Instead of reinventing the wheel, use Reflector on the String methods and look at the internal calls. For example, String.Compare uses CultureInfo.CurrentCulture.CompareInfo.Compare to compare two strings in the current culture.
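
For illustration, calling `CompareInfo` directly could look like the sketch below. The wrapper class is hypothetical; the culture is passed in explicitly here rather than read from `CultureInfo.CurrentCulture`, purely to make the behaviour easier to test.

```csharp
using System.Globalization;

public static class CultureCompare
{
    // Culture-sensitive comparison through CompareInfo, the call that
    // String.Compare is described above as delegating to.
    public static int Compare(string a, string b, CultureInfo culture) =>
        culture.CompareInfo.Compare(a, b, CompareOptions.None);

    // Same comparison, ignoring case according to the culture's rules.
    public static int CompareIgnoreCase(string a, string b, CultureInfo culture) =>
        culture.CompareInfo.Compare(a, b, CompareOptions.IgnoreCase);
}
```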

Diadistis
Yes, that's the plan. However, CultureInfo methods take strings. This means I need to convert a part of my rope into a string. The question is, do I have enough information to know which part?
Alexey Romanov
For example, when checking EndsWith(string suffix), is it enough to take the last suffix.Length characters of my rope? Probably not always. Is it enough to take the last suffix.Length + 5 characters? Probably yes.
Alexey Romanov
You don't need to know; just pass the rope's string to the appropriate CultureInfo method: CultureInfo.CurrentCulture.CompareInfo.IsSuffix(rope.ToString(), suffix, CompareOptions.None); // Taken from String.EndsWith
Diadistis
Well, converting the entire rope to a string for each such operation would pretty much kill performance, and with it most of the point of implementing a rope.
Alexey Romanov
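
The middle ground raised in these comments, materializing only a tail window of the rope before calling `CompareInfo.IsSuffix`, might be sketched as follows. Everything here is an assumption for illustration: the rope is stood in for by a plain list of string chunks, the names are hypothetical, and the default margin of 5 is just the guess from the comment above. It is not shown to be sufficient; culture-sensitive matching may skip ignorable characters, so a match can span more characters than `suffix.Length`, and the window may also start in the middle of a surrogate pair or combining sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RopeSuffixSketch
{
    // Hypothetical stand-in for a rope: an ordered list of string chunks.
    // Collects the last 'count' chars by walking chunks from the end,
    // without concatenating the whole rope.
    public static string Tail(IReadOnlyList<string> chunks, int count)
    {
        var parts = new Stack<string>();
        int remaining = count;
        for (int i = chunks.Count - 1; i >= 0 && remaining > 0; i--)
        {
            string chunk = chunks[i];
            int take = Math.Min(remaining, chunk.Length);
            parts.Push(chunk.Substring(chunk.Length - take));
            remaining -= take;
        }
        // The stack enumerates most recently pushed first, which restores
        // the original left-to-right chunk order.
        return string.Concat(parts);
    }

    // Culture-sensitive EndsWith over a tail window only.
    // 'margin' is a heuristic, not a proven bound.
    public static bool EndsWith(IReadOnlyList<string> chunks, string suffix,
                                CultureInfo culture, int margin = 5)
    {
        string window = Tail(chunks, suffix.Length + margin);
        return culture.CompareInfo.IsSuffix(window, suffix, CompareOptions.None);
    }
}
```

Comparing this against `rope.ToString()` followed by `IsSuffix` on the whole string, for inputs containing combining marks, surrogates and ignorable characters, would be one way to find out where the window heuristic breaks.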