Hi,

I have been looking for a way to validate a Base64 string and came across this:

 ^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

I need a little help to make it allow "==" as well as "=".
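
For reference, a quick check suggests the pattern as quoted already accepts both padding forms: the `{2}==` alternative covers two pad characters and the `{3}=` alternative covers one. The sketch below is only an illustration; the class and method names (`Base64RegexDemo`, `LooksLikeBase64`) are made up for this example and are not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class Base64RegexDemo
{
    // The pattern from the question, unchanged.
    private static readonly Regex Base64Pattern = new Regex(
        @"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");

    // Hypothetical helper: true when the whole string fits the pattern.
    public static bool LooksLikeBase64(string value)
    {
        return value != null && Base64Pattern.IsMatch(value);
    }
}
```

With this helper, "QUJD" (no padding), "QUI=" (one pad) and "QQ==" (two pads) all match, while "QQ=" and "Q===" do not. Note that the pattern also matches the empty string, which may or may not be what you want.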

Thanks

+4  A: 

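This should perform well: a cheap character check rejects obviously invalid input first, so the exception path in Convert.FromBase64String is only reached for strings that contain nothing but Base64 characters.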

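// Requires: using System; using System.Collections.Generic; using System.Linq;
// '=' is included so that padding passes the character pre-check.
private static readonly HashSet<char> _base64Characters = new HashSet<char>() { 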
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 
    'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 
    'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/', 
    '='
};

public static bool IsBase64String(string value)
{
    if (string.IsNullOrEmpty(value))
    {
        return false;
    }
    else if (value.Any(c => !_base64Characters.Contains(c)))
    {
        return false;
    }

    try
    {
        Convert.FromBase64String(value);
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
}
ChaosPandion
+1 For most simple solution.
Jesper Palm
I'd prefer the clarity of this version unless you expect a lot of invalid data.
Justin
Might as well use Base64 to check rather than RegEx... makes sense ;)
arbme
A: 

I've updated the above code a bit to meet a few more requirements:

  • check for correct string size (should be multiple of 4)
  • check for pad character count (at most 2 characters, and only at the end of the string)
  • make it work in .NET 2.0 (well, HashSet<T> would have to be implemented yourself or replaced with Dictionary<T, U>)
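
For the .NET 2.0 point, one possible stand-in for HashSet<char> is a Dictionary<char, bool> used purely for its keys. This is only a sketch under that assumption: the names `Base64Alphabet`, `BuildLookup` and `Contains` are hypothetical, and the code sticks to C# 2.0 syntax (no collection initializers, no LINQ).

```csharp
using System.Collections.Generic;

public static class Base64Alphabet
{
    // The 64 data characters; the '=' pad character is deliberately left out.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    private static readonly Dictionary<char, bool> Lookup = BuildLookup();

    // Fills the dictionary once; only the keys matter, the values are unused.
    private static Dictionary<char, bool> BuildLookup()
    {
        Dictionary<char, bool> lookup = new Dictionary<char, bool>(Alphabet.Length);
        foreach (char c in Alphabet)
        {
            lookup[c] = true;
        }
        return lookup;
    }

    // Plays the role of HashSet<char>.Contains.
    public static bool Contains(char c)
    {
        return Lookup.ContainsKey(c);
    }
}
```

A call such as Base64Alphabet.Contains(c) can then be used wherever a HashSet<char> lookup would otherwise appear.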

The code is part of my assertion library, which is why there are two check methods and the param/paramName parameters...

    private const char Base64Padding = '=';

    private static readonly HashSet<char> Base64Characters = new HashSet<char>()
    { 
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 
        'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 
        'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 
        'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
    };

    public static void CheckBase64String(string param, string paramName)
    {
        if (CheckBase64StringSafe(param) == false)
        {
            throw (new ArgumentException(String.Format("Parameter '{0}' is not a valid Base64 string.", paramName)));
        }
    }

    public static bool CheckBase64StringSafe(string param)
    {
        if (param == null)
        {
            // a null string is not a Base64 string
            return false;
        }

        // remove optional CR and LF characters
        param = param.Replace("\r", String.Empty).Replace("\n", String.Empty);

        if (param.Length == 0 ||
            (param.Length % 4) != 0)
        {
            // Base64 string should not be empty
            // Base64 string length should be multiple of 4
            return false;
        }

        // strip trailing pad characters and count how many were removed
        int lengthWithPadding = param.Length;
        int lengthWithoutPadding;

        param = param.TrimEnd(Base64Padding);
        lengthWithoutPadding = param.Length;

        if ((lengthWithPadding - lengthWithoutPadding) > 2)
        {
            // there should be no more than 2 pad characters
            return false;
        }

        foreach (char c in param)
        {
            if (Base64Characters.Contains(c) == false)
            {
                // string contains non-Base64 character
                return false;
            }
        }

        // nothing invalid found
        return true;
    }
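
As a sanity check on the two structural rules enforced above (length a multiple of 4, at most two trailing pad characters), it can help to look at what the framework encoder itself produces. The sketch below is illustrative only; `Base64ShapeDemo`, `Encode` and `CountPadding` are hypothetical names, and the only framework call relied on is Convert.ToBase64String.

```csharp
using System;

public static class Base64ShapeDemo
{
    // Encodes the first byteCount bytes of the ASCII sequence 'A', 'B', 'C', ...
    public static string Encode(int byteCount)
    {
        byte[] data = new byte[byteCount];
        for (int i = 0; i < byteCount; i++)
        {
            data[i] = (byte)('A' + i);
        }
        return Convert.ToBase64String(data);
    }

    // Counts the '=' characters at the end of an encoded string.
    public static int CountPadding(string encoded)
    {
        return encoded.Length - encoded.TrimEnd('=').Length;
    }
}
```

Encode(1), Encode(2) and Encode(3) give "QQ==", "QUI=" and "QUJD": each is four characters long, with two, one and zero pad characters respectively, which is the shape the length and padding checks expect.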

I've not tested the code extensively, so there are no functionality guarantees at all!

Libor