I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However, I'm running into problems when comparing a space (code point 32) to a non-breaking space (code point 160). To the user they look the same, so they expect a match. But when the app does the compare, there is no match.

What is the best way to go about this? Am I going to have to go to all parts of the code that do a string compare and manually normalize non-breaking spaces to spaces? Does .NET offer anything to help with that? (I've tried all the compare options, but none seem to help.)

It has been suggested that I normalize the strings upon receipt and then let the string compare method simply compare the normalized strings. I'm not sure that would be straightforward, because what is a normalized string in the first place? What do I normalize it to? Sure, for now I can convert non-breaking spaces to regular spaces. But what else can show up? Can there potentially be very many of these rules? Might they even conflict (in one case I want to apply a rule and in another I don't)?

Thanks

+6  A: 

If it were me, I would 'normalize' the strings as I 'pulled them in'; probably with a string.Replace(). Then you won't need to change your comparisons anywhere else.
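A minimal sketch of that idea, assuming the only rule needed so far is mapping the non-breaking space (U+00A0) to a regular space (U+0020). The `InputNormalizer` class and `NormalizeSpaces` method are hypothetical names for illustration, not anything built into .NET:

```csharp
using System;

public static class InputNormalizer
{
    // Hypothetical helper: replaces every non-breaking space (U+00A0)
    // with a regular space (U+0020). Call it once, where the string enters the app.
    public static string NormalizeSpaces(string input)
    {
        if (input == null) return null;
        return input.Replace('\u00A0', ' ');
    }
}

public static class Program
{
    public static void Main()
    {
        string fromUser = "hello\u00A0world"; // contains a non-breaking space
        string fromFile = "hello world";      // contains a regular space

        // The raw strings differ, so an ordinal comparison fails.
        Console.WriteLine(fromUser == fromFile); // False

        // After normalizing on intake, the existing comparison code works unchanged.
        Console.WriteLine(InputNormalizer.NormalizeSpaces(fromUser) == fromFile); // True
    }
}
```

Because the replacement happens at the point where the string is received, none of the comparison call sites need to change.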

Edit: Mark, that's a tough one. It's really up to you, or your clients, as to what is a 'normalized' string. I've been in a similar situation where the customer demanded that strings like:

I have 4 apples.
I have four apples.

were actually equal. You may need separate normalizers for different situations. Either way, I would still do the normalization upon retrieval of the original strings.
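One way separate normalizers might look is sketched below. The two rule sets are assumptions chosen only to illustrate the idea, and the `Normalizers` class and its method names are hypothetical. It does not attempt the "4" versus "four" case, which would need its own rule:

```csharp
using System;
using System.Text;

public static class Normalizers
{
    // Rule set 1: map every character that char.IsWhiteSpace reports as
    // whitespace (including U+00A0, tabs, and line breaks) to U+0020.
    public static string WhitespaceOnly(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(char.IsWhiteSpace(c) ? ' ' : c);
        }
        return sb.ToString();
    }

    // Rule set 2: looser matching that also ignores case and
    // leading/trailing whitespace.
    public static string Loose(string input)
    {
        return WhitespaceOnly(input).Trim().ToUpperInvariant();
    }
}

public static class Program
{
    public static void Main()
    {
        // Each source picks the rule set that suits it.
        Console.WriteLine(Normalizers.WhitespaceOnly("I have\u00A04 apples.") == "I have 4 apples."); // True
        Console.WriteLine(Normalizers.Loose(" Four\u00A0Apples ") == "FOUR APPLES");                   // True
    }
}
```

Keeping each rule set in its own method lets conflicting rules coexist: a source that must preserve case uses the first, while a source that should match loosely uses the second.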

John Kraft
I would do the same.
Koistya Navin
Yep, I would normalize the strings to what you care about with your own function that calls string.Replace and then does the compare.
NoahD
Guys, how do I post a follow-up question or a clarification question to this proposed answer? Do I do it here? This only allows 255 characters.
Mark
@Mark: Edit your question to include follow-ups or clarifications.
Banang
+1  A: 

I went through lots of pain to find this simple answer. The code below uses a regular expression to replace non-breaking spaces with normal spaces.

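// Requires: using System.Text.RegularExpressions;
// \u00A0 in the literal below is the non-breaking space character.
string cellText = "String with\u00A0non\u00A0breaking\u00A0spaces.";
cellText = Regex.Replace(cellText, @"\u00A0", " ");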
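If other look-alike spaces turn up later, the same regular-expression approach can be widened with a character class. This is only a sketch: the two extra code points (U+2007 figure space and U+202F narrow no-break space) are my assumption about what might appear, and `SpaceCleaner` is a hypothetical name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceCleaner
{
    // Matches the no-break space (U+00A0), figure space (U+2007),
    // and narrow no-break space (U+202F). Extend the class as needed.
    private static readonly Regex NoBreakSpaces = new Regex(@"[\u00A0\u2007\u202F]");

    public static string Clean(string input)
    {
        return NoBreakSpaces.Replace(input, " ");
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SpaceCleaner.Clean("10\u202Fkm") == "10 km");               // True
        Console.WriteLine(SpaceCleaner.Clean("non\u00A0breaking") == "non breaking"); // True
    }
}
```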

Hope this helps, Dan