views:

107

answers:

5

I have an array of bytes (any length), and I want to encode this array into string using my own base encoder. In .NET is standard Base64 encoder, but what if I want to encode the array in Base62, Base53 or Base13?

Is it even possible to create such universal base encoder?

I know I could do it the simple way, that is, for each byte reserve fixed number of chars (in case of Base62, that would be 5 chars), and do direct byte->chars encoding, but I would be wasting space, as 5 Base62 chars are able to contain more than 1 byte, but less than 2 bytes.

How should I write such an encoder? Or is there already some class for this?
And please note that I need universal decoder as well, otherwise this is useless to me.

Resources

As the solution is already known (use BigInteger), I would just like to put here some resources relating the BigInteger class, as it is not available in .NET 3.5:

http://stackoverflow.com/questions/176775/big-integers-in-c
http://intx.codeplex.com/
https://svn.apache.org/repos/asf/incubator/heraldry/libraries/csharp/openid/trunk/Mono/Mono.Math/BigInteger.cs
http://www.codeproject.com/KB/cs/BigInteger_Library.aspx
http://www.codeproject.com/KB/cs/biginteger.aspx

A: 

You can get inspiration from C# implementation of Base32 implementation by Michael Giagnocavo.

Yakeen
I've already been through that code and there is a problem: both `Base64` and `Base32` can map directly to some number of bits, 6 in case of `Base64` and 3 in case of `Base32`, but for example `Base62` doesn't map to whole number of bits. So I have no idea how to convert that `Base32` implementation into universal base encoder.
Paja
+1  A: 

If performance is not an issue, use the BigInteger class in the background. You have a constructor for BigInteger that takes byte array, and you can then manually run loops of division and modulus to get the representation in other non-standard bases.

Also take a look at this.

apoorv020
Thanks, I totally forgot about the `BigInteger` class, that could solve the problem! Performance is not an issue as long as encoding 500 bytes of data doesn't take more than 5 seconds.
Paja
Argh, `BigInteger` is in `.NET 4.0`, but I need solution for `.NET 3.5`. :-(
Paja
What about the j# library mentioned in the 2nd link?
apoorv020
I'm not actually into adding more dependencies into my project, so if I go with `BitInteger` solution, I will probably use some code that I can compile into my .exe, such as this [CodeProject implementation](http://www.codeproject.com/KB/cs/biginteger.aspx). Hovewer, +1, as `BigInteger` is actually able to solve this problem. And if nobody else suggests any other solution, I will stick to it and accept your answer. Thanks.
Paja
A: 

Another example to look at is Ascii85, used in Adobe PostScript and PDF documents. In Ascii85, 5 characters are used to encode 4 bytes. You can figure out the efficiency of this coding as (256^4)/(85^5) = 96.8%. This is the fraction of bit combinations that will actually be used.

So, for whatever new base you would want to use to encode your data, you want to look for a power that will get it just above a power of 256 if you're trying to maximize coding efficiency. This might not be easy for every base. Checking base 53 shows that the best you'll probably get is using 7 bytes to encode 5 bytes (93.6% efficiency), unless you feel like using 88 bytes to encode 63 bytes.

Justin
A: 

BASE64 works well, because 64 is a power of 2 (2^6) so each character holds 6 bits of data, and 3 bytes (3 * 8 = 24 bits) can be encoded into 4 characters (4 * 6 = 24). The encoding & decoding can be down merely bit shifting bits.

For bases which do not align with a power of 2 (like your base 62 or Base 53), Then you must treat the message you are trying to encode as one long number and perform divison and modulo operations on it. You'd probably be better off using a Base32 encoding and squandering a bit of bandwidth.

James Curran
So there isn't any other solution than using the `BigInteger` class or something similar to that class?
Paja
+2  A: 

A little late to the party, but...

Because your specification calls for an arbitrary number of bits, you must have an integer type that can work with an arbitrary number of bits. If you can't target .NET 4.0 you'll have to beg, borrow, or steal a BigInteger implementation somewhere (like .NET 4.0 perhaps).

public static class GenericBaseConverter
{
    public static string ConvertToString(byte[] valueAsArray, string digits, int pad)
    {
        if (digits == null)
            throw new ArgumentNullException("digits");
        if (digits.Length < 2)
            throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");

        BigInteger value = new BigInteger(valueAsArray);
        bool isNeg = value < 0;
        value = isNeg ? -value : value;

        StringBuilder sb = new StringBuilder(pad + (isNeg ? 1 : 0));

        do
        {
            BigInteger rem;
            value = BigInteger.DivRem(value, digits.Length, out rem);
            sb.Append(digits[(int)rem]);
        } while (value > 0);

        // pad it
        if (sb.Length < pad)
            sb.Append(digits[0], pad - sb.Length);

        // if the number is negative, add the sign.
        if (isNeg)
            sb.Append('-');

        // reverse it
        for (int i = 0, j = sb.Length - 1; i < j; i++, j--)
        {
            char t = sb[i];
            sb[i] = sb[j];
            sb[j] = t;
        }

        return sb.ToString();

    }

    public static BigInteger ConvertFromString(string s, string digits)
    {
        BigInteger result;

        switch (Parse(s, digits, out result))
        {
            case ParseCode.FormatError:
                throw new FormatException("Input string was not in the correct format.");
            case ParseCode.NullString:
                throw new ArgumentNullException("s");
            case ParseCode.NullDigits:
                throw new ArgumentNullException("digits");
            case ParseCode.InsufficientDigits:
                throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");
            case ParseCode.Overflow:
                throw new OverflowException();
        }

        return result;
    }

    public static bool TryConvertFromString(string s, string digits, out BigInteger result)
    {
        return Parse(s, digits, out result) == ParseCode.Success;
    }

    private enum ParseCode
    {
        Success,
        NullString,
        NullDigits,
        InsufficientDigits,
        Overflow,
        FormatError,
    }

    private static ParseCode Parse(string s, string digits, out BigInteger result)
    {
        result = 0;

        if (s == null)
            return ParseCode.NullString;
        if (digits == null)
            return ParseCode.NullDigits;
        if (digits.Length < 2)
            return ParseCode.InsufficientDigits;

        // skip leading white space
        int i = 0;
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;
        if (i >= s.Length)
            return ParseCode.FormatError;

        // get the sign if it's there.
        BigInteger sign = 1;
        if (s[i] == '+')
            ++i;
        else if (s[i] == '-')
        {
            ++i;
            sign = -1;
        }

        // Make sure there's at least one digit
        if (i >= s.Length)
            return ParseCode.FormatError;


        // Parse the digits.
        while (i < s.Length)
        {
            int n = digits.IndexOf(s[i]);
            if (n < 0)
                return ParseCode.FormatError;
            BigInteger oldResult = result;
            result = unchecked((result * digits.Length) + n);
            if (result < oldResult)
                return ParseCode.Overflow;

            ++i;
        }

        // skip trailing white space
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;

        // and make sure there's nothing else.
        if (i < s.Length)
            return ParseCode.FormatError;

        if (sign < 0)
            result = -result;

        return ParseCode.Success;
    }
}
Tergiver
+1 for code, but **apoorv020** first came with `BigInteger` solution.
Paja
Yea, it took a few minutes to write the code.. I already had most of it as I've written several base conversions recently, I just had to alter it to work generically and with BigInteger.
Tergiver