I have an Excel spreadsheet with lab data which looks like this:

µg/L (ppb)

I want to test for the presence of the Greek letter "µ" and if found I need to do something special.

Normally, I would write something like this:

if ( cell.StartsWith(matchSequence) ) { 
//.. <-- universal symbol for "magic" :)
}

I know there is an Encoding API in the Framework, but should I use it for just this one edge-case or just copy the Greek micro symbol from the character map?

How would I test for the presence of this Unicode character? The character map seems like a "cheap" fix that will bite me later (I work for a multinational company).

I want to do something that is maintainable and not just some crazy math-voodoo conversion that only works for this edge case.

I guess I'm asking for best practice advice here.

Thanks!

+11  A: 

You need to work out which Unicode character you're interested in; then you can represent it in code with an escape sequence.

For example, µ is U+00B5, so you just need:

if (text.Contains("\u00b5"))

You can find out the Unicode value from charmap or from the Unicode code charts.

Jon Skeet
I was going to use the tag "jon-skeet" but I thought that might be cheap :)
Chris
@jon: works great! Thanks, you're a Star!
Chris
@jon: does case matter?
Chris
Of the U? Yes - `\U` is used for Unicode characters not in the basic multilingual plane, i.e. over U+FFFF.
Jon Skeet
@Jon: sorry, I meant does the case of the "b" in the sequence matter? In your example it's in uppercase; in the method call it's in lowercase.
Chris
+6  A: 

The Unicode code point for the micro sign µ is U+00B5, which is different from the "Greek letter mu" μ at U+03BC. So you can use "\u00b5" to find it, and possibly look for "\u03bc" as well - they look the same, so whoever created the spreadsheet could have used the wrong one!
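
A minimal sketch of checking for both look-alikes at the start of a cell. The class and method names (MicroCheck, StartsWithMicro) are made up for illustration and are not from the question:

```csharp
using System;

static class MicroCheck
{
    // Hypothetical names, chosen for this sketch only.
    const char MicroSign = '\u00b5'; // U+00B5 MICRO SIGN
    const char GreekMu = '\u03bc';   // U+03BC GREEK SMALL LETTER MU

    // True when the cell text begins with either look-alike character.
    public static bool StartsWithMicro(string cell)
    {
        if (string.IsNullOrEmpty(cell)) return false;
        return cell[0] == MicroSign || cell[0] == GreekMu;
    }
}
```

With this, MicroCheck.StartsWithMicro("\u00b5g/L (ppb)") and MicroCheck.StartsWithMicro("\u03bcg/L (ppb)") are both true, while a cell such as "mg/L" is not matched.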

Vinay Sajip
+1 Good point taken, I'll have a look
Chris
A: 

C# code files are usually encoded in UTF-8, which the compiler understands. All strings and string literals in C# (and other .NET languages) are stored as UTF-16. So you can safely copy the micro character from the character map, as long as the file's encoding is preserved. You can also write its integer value as a char, e.g. (char)0x00B5.
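
As a rough illustration, the spellings below all denote the same char. The class and constant names are hypothetical; the hex digits in a \u escape are not case-sensitive, so '\u00b5' and '\u00B5' are identical:

```csharp
using System;

static class MicroLiterals
{
    // Hypothetical names; each constant is the single UTF-16 code unit U+00B5.
    public const char Pasted = 'µ';            // pasted from charmap; relies on the file encoding surviving
    public const char EscapedLower = '\u00b5'; // escape sequence, lowercase hex digits
    public const char EscapedUpper = '\u00B5'; // same escape, uppercase hex digits
    public const char FromInt = (char)0x00B5;  // integer value cast to char
}
```

The escaped forms survive any re-encoding of the source file, which is why they tend to be the safer choice in a shared code base.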

codymanix
A: 

You can create a string from the numeric equivalent shown to you in the Character Map (displayed as U+0050 for 'P'), then simply check whether your value contains it:

   string value = "P is for Pascal";  // example input
   if (value.Contains(Char.ConvertFromUtf32(0x0050)))
   {
       // the character U+0050 ('P') was found
   }
csharptest.net