I want to ensure that certain input into my webapp contains only characters that can be represented as ASCII (it's for tagging, and I'm trying to save space in the database column by using varchar instead of nvarchar).

Is there a framework feature to do this in .NET, or should I check the character codes one by one to ensure they're in the correct range?

EDIT: I just wrote this extension method but it seems too cumbersome for something so simple.

public static bool IsAscii(this string value)
{
    if (value != null)
        for (int current = 0; current < value.Length; current++)
        {
            if ((int)value[current] > 127)
                return false;
        }

    return true;
}
+1  A: 

You can run the text through a System.Text.ASCIIEncoding which will make sure everything is ASCII (any character that doesn't map to an ASCII character will be converted to a '?').
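
As a rough sketch of that behavior (the class and method names here are hypothetical, not from the answer): the first method round-trips the text through `Encoding.ASCII`, which silently replaces unmappable characters with '?'. The second is an alternative for the case where you would rather reject than transform; it assumes an ASCII encoding obtained with `EncoderExceptionFallback`, which throws instead of substituting.

```csharp
using System;
using System.Text;

public static class AsciiConversionSketch
{
    // Lossy: any character outside ASCII comes back as '?'.
    // Note: throws ArgumentNullException if value is null.
    public static string ToAsciiLossy(string value)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value);
        return Encoding.ASCII.GetString(bytes);
    }

    // Strict: the exception fallback makes GetBytes throw on the first
    // character that has no ASCII representation.
    // Note: also throws ArgumentNullException if value is null.
    public static bool IsAsciiStrict(string value)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetBytes(value);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, `AsciiConversionSketch.ToAsciiLossy("caf\u00e9")` yields `"caf?"`, while `AsciiConversionSketch.IsAsciiStrict("caf\u00e9")` yields `false`.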

Michael Burr
Yeah, I could get away with this for my purposes in a slugify method, since question marks don't belong in slugs anyway. If I actually allowed question marks, it would be less suitable.
BC
I guess it depends on whether you want to transform something to pure ASCII (and it's OK to lose invalid information) or whether you want to validate and reject something that is not pure ASCII.
Michael Burr
@Michael Validate and reject was my thinking for this purpose.
BC
Then 2 comments on your IsAscii() method - it sounds like you want to validate more than just if all characters are ASCII (for example, you say that '?' is an invalid character). And do you really want it to return true for a null reference (depends on your needs, but it's something to think about)?
Michael Burr
A: 

The problem I see with your solution is that you're going to let in ASCII characters that you truly don't want. You are also going to exclude the extended ASCII codes, which might be nice to have if you're supporting other languages and want to store accented characters.
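
One way to address the first point is to whitelist the characters a tag may contain rather than accepting all of ASCII. The sketch below is only an illustration: the allowed set (lowercase letters, digits, hyphen) and the names `TagValidationSketch` and `IsValidTag` are assumptions, not something the question specifies.

```csharp
using System.Linq;

public static class TagValidationSketch
{
    // Hypothetical rule: a tag is non-empty and consists only of
    // 'a'-'z', '0'-'9' and '-'. Adjust the set to your own requirements.
    public static bool IsValidTag(string value)
    {
        return !string.IsNullOrEmpty(value)
            && value.All(c => (c >= 'a' && c <= 'z')
                           || (c >= '0' && c <= '9')
                           || c == '-');
    }
}
```

With this rule, `"c-sharp-3"` is accepted, while `"what?"`, `"Tag"`, and an empty or null string are rejected.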

JoshBerke
+1  A: 

Given that you're using an extension method, you're presumably using C# 3 and .NET 3.5. In that case I'd use:

using System.Linq;

...

public static bool IsPrintableAscii(this string value)
{
    return value != null && value.All(c => c >= 32 && c < 127);
}

This will check that every character is in the range [U+0020-U+007E] which only contains printable characters. Is that what you're after?

As other comments have said, that will exclude everything with accents etc - are you sure that's okay? This also rejects carriage return, linefeed and tab. Again, is that all right?
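
If rejecting tab, carriage return and linefeed turns out not to be all right, a variant along these lines would let them through while still rejecting other control characters. This is a sketch only; the class and method names are hypothetical.

```csharp
using System.Linq;

public static class AsciiTextSketch
{
    // Accepts printable ASCII (U+0020 to U+007E) plus tab, CR and LF.
    // Returns false for null, matching IsPrintableAscii above.
    public static bool IsPrintableAsciiOrWhitespace(this string value)
    {
        return value != null
            && value.All(c => (c >= 32 && c < 127)
                           || c == '\t' || c == '\r' || c == '\n');
    }
}
```

Here `"a\tb\r\n"` passes, while `"caf\u00e9"` and a string containing the bell character `"\u0007"` do not.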

Jon Skeet