views:

1218

answers:

6

How do you determine whether a character is in the range A-Z or is a digit 0-9? We are getting some corrupted data such as "I_999Š=ÄÖÆaðøñòòñ".

I thought I could use Char.IsLetterOrDigit("Š") to identify the corrupted data in "I_999Š", but unexpectedly it returns true. I need to trap this. Any thoughts?

+5  A: 

Well, there are two quick options. The first is to use a regular expression; the second is to use the Asc() function to determine whether the ASCII value is in the range of the allowable characters. I would personally use Asc() for this.
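A minimal sketch of the range-check option, assuming the allowed set is exactly A-Z and 0-9. The module and function names are hypothetical. It uses AscW rather than Asc as a deliberate swap: Asc returns the code in the system's current ANSI code page, while AscW returns the character's UTF-16 code unit, which does not depend on machine settings.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module AsciiRangeCheck
    ' Hypothetical helper: True only for A-Z (65..90) or 0-9 (48..57).
    Function IsUpperAsciiLetterOrDigit(c As Char) As Boolean
        ' AscW gives the UTF-16 code unit; Asc would give the ANSI code page value.
        Dim code As Integer = AscW(c)
        Return (code >= 65 AndAlso code <= 90) OrElse (code >= 48 AndAlso code <= 57)
    End Function

    Sub Main()
        For Each c As Char In "I_999Š"
            If Not IsUpperAsciiLetterOrDigit(c) Then
                Console.WriteLine("Bad character: " & c)
            End If
        Next
    End Sub
End Module
```

For the sample "I_999Š", this flags both the underscore (code 95) and the Š, since neither falls in the two allowed ranges.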

EBGreen
You might want to check that. I may be thinking VB6. I haven't done VB in a while.
EBGreen
Definitely. A-Z is ASCII 65..90, 0-9 is ASCII 48..57
Tomalak
The Asc() function is available in VB.NET as well, doing the same thing.
Tomalak
Given the context (not knowing the incoming character encoding), this answer is simply wrong.
Rob Williams
A: 

A Char / letter is a "small" integer. Check the ASCII table and you will find that the values for A-Z are 65 to 90 and the values for a-z are 97 to 122 (the codes 91 to 96 in between are punctuation), while the digits (0-9) are 48 to 57. Check here:

http://enteos2.area.trieste.it/russo/IntroInfo2001-2002/CorsoRetiGomezel/ASCII-EBIC_files/ascii_table.jpg

If you want to convert a digit character to an int, you can always take the char and subtract 48, which is the code for the char "0". For example, "1" is 49, and subtracting 48 from that gives you 1.
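The subtraction can be sketched as follows. The helper name DigitValue is hypothetical, and the range guard is an added assumption so that non-digit input fails loudly instead of producing a meaningless number.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module DigitFromChar
    ' Hypothetical helper: maps "0"c.."9"c to 0..9 by subtracting the code of "0"c (48).
    Function DigitValue(c As Char) As Integer
        If c < "0"c OrElse c > "9"c Then
            Throw New ArgumentException("Not an ASCII digit: " & c)
        End If
        Return AscW(c) - AscW("0"c)
    End Function

    Sub Main()
        Console.WriteLine(DigitValue("1"c)) ' 49 - 48 = 1
    End Sub
End Module
```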

Good Luck

Filip Ekberg
A: 
Imports System.Text.RegularExpressions

' Each match is one character outside A-Z and 0-9.
For Each m As Match In Regex.Matches("I_999Š=ÄÖÆaðøñòòñ", "[^A-Z0-9]")
    ' Found a bad character: m.Value
Next

or

For Each c As Char In "I_999Š=ÄÖÆaðøñòòñ"
    ' Parentheses make the AndAlso/OrElse grouping explicit.
    If Not ((c >= "A"c AndAlso c <= "Z"c) OrElse (c >= "0"c AndAlso c <= "9"c)) Then
        ' Found a bad character
    End If
Next


EDIT:

Is there something wrong with this answer that warrants the two anonymous downvotes? Speak up, and I'll fix it. I notice that I left out a "Then" (fixed now), but I intended this as pseudocode.

P Daddy
Aw, you can't take downvotes so personally! One wasn't from me, btw.
Chris Farmer
+1  A: 

You could use a regular expression to filter out the bad characters (use Regex.IsMatch instead if you only need to detect them):

str = Regex.Replace(str, "[^A-Za-z0-9]","", RegexOptions.None);
Yuliy
A: 

Should just be:

if (!Regex.IsMatch(input, "[^A-Za-z0-9]"))
{
    // no character outside A-Z, a-z, 0-9 was found
}
weiran
+6  A: 

I can't help but notice that everyone seems to be missing the real issue: your data "corruption" appears to be an obvious character encoding problem. Therefore, no matter what you do with the data, you will be (mis)treating the symptom and ignoring the root cause.

To be specific, you appear to be attempting to interpret the received binary BYTES as ASCII text, when those BYTES were almost certainly intended to represent text encoded as something other than ASCII.

You should find out what character encoding applies to the string of text that you received. Then you should read that data while applying the appropriate character encoding transformations.

You should read Joel Spolsky's article that emphasizes that "There Ain't No Such Thing As Plain Text."
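As a rough illustration of applying the right encoding when reading, the sketch below decodes the same two bytes in two ways. The actual encoding of the incoming data is unknown here, so UTF-8 and ISO-8859-1 are chosen purely as an assumption for the example, and the byte values are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module DecodeExample
    Sub Main()
        ' Example bytes only: the UTF-8 encoding of "Ä" is the two bytes C3 84.
        Dim raw As Byte() = {&HC3, &H84}

        ' Decoding with the encoding the sender used recovers the original text.
        Dim asUtf8 As String = Encoding.UTF8.GetString(raw)

        ' Decoding the same bytes as ISO-8859-1 yields two unrelated characters.
        Dim asLatin1 As String = Encoding.GetEncoding("iso-8859-1").GetString(raw)

        Console.WriteLine(asUtf8.Length)   ' 1
        Console.WriteLine(asLatin1.Length) ' 2
    End Sub
End Module
```

The point of the comparison is that the bytes are identical in both cases; only the decoding step decides whether the result looks like clean text or like "corruption".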

Rob Williams
This answer can't get enough up votes. Knowing the encoding of your byte stream is essential.
Chris Farmer