I would like to know how to reverse the process of the DecodeBinaryBase64 method below so that I can have a matching Encode method. In short, I want C# code that, given the output of this method, returns the same string that the method took as input.

private static string DecodeBinaryBase64(string stringToDecode)
{
    StringBuilder builder = new StringBuilder();
    foreach (var b in Convert.FromBase64String(stringToDecode))
        builder.Append(string.Format("{0:X2}", b));
    return builder.ToString();
}

Here is an example of an encoded string and its decoded counterpart. The result is a SHA1 hash for a file. The method above is only meant to show how the decoding works to get to the right string.

ENCODED

/KUGOuoESMWYuDb+BTMK1LaGe7k=

DECODED

FCA5063AEA0448C598B836FE05330AD4B6867BB9

or

0xFCA5063AEA0448C598B836FE05330AD4B6867BB9

Updated to reflect the correct SHA1 value thanks to Porges, and a fix for the hex bug found by Dean 'codeka' Harding.

Implemented Solution

Here is the implementation I have now. It is from Porges' post, distilled down to two methods.

private static string EncodeFileDigestBase64(string digest)
{
    byte[] result = new byte[digest.Length / 2];

    for (int i = 0; i < digest.Length; i += 2)
        result[i / 2] = byte.Parse(digest.Substring(i, 2), System.Globalization.NumberStyles.HexNumber);

    if (result.Length != 20)
        throw new ArgumentException("Not a valid SHA1 filedigest.");

    return Convert.ToBase64String(result);
}

private static string DecodeFileDigestBase64(string encodedDigest)
{
    byte[] base64bytes = Convert.FromBase64String(encodedDigest);
    return string.Join(string.Empty, base64bytes.Select(x => x.ToString("X2")));
}  
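
For readers on .NET 5 or later, the two methods above could probably be shortened with the built-in `Convert.FromHexString` and `Convert.ToHexString` helpers. This is only a sketch under that assumption; the class name `FileDigestCodec` and the method names are made up for illustration and are not part of the original post.

```csharp
using System;

static class FileDigestCodec
{
    // Hex digest (40 hex characters) -> Base64.
    // Assumes .NET 5 or later, where Convert.FromHexString is available.
    public static string Encode(string hexDigest)
    {
        byte[] bytes = Convert.FromHexString(hexDigest);
        if (bytes.Length != 20)
            throw new ArgumentException("Not a valid SHA1 file digest.", nameof(hexDigest));
        return Convert.ToBase64String(bytes);
    }

    // Base64 -> uppercase hex digest (two characters per byte).
    public static string Decode(string base64Digest)
    {
        return Convert.ToHexString(Convert.FromBase64String(base64Digest));
    }

    static void Main()
    {
        string encoded = "/KUGOuoESMWYuDb+BTMK1LaGe7k=";
        string decoded = Decode(encoded);
        Console.WriteLine(decoded);                    // FCA5063AEA0448C598B836FE05330AD4B6867BB9
        Console.WriteLine(Encode(decoded) == encoded); // True
    }
}
```

`Convert.FromHexString` throws a `FormatException` for odd-length or non-hex input, so the malformed-input case is rejected before the length check runs.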
+3  A: 

I don't believe it's physically possible. The problem is that string.Format("{0:X}", b) will return either 1 or 2 characters depending on whether the input byte is < 16 or not. And you've got no way to know once the string has been joined together.

If you can modify the DecodeBinaryBase64 method so that it always appends two characters for each byte, i.e. by using string.Format("{0:X2}", b), then it becomes possible by just taking the input string two characters at a time.
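
To make the ambiguity concrete, here is a minimal sketch that formats two different byte sequences with "X" and with "X2". The class name `HexPaddingDemo`, the helper `ToHex`, and the sample bytes are chosen purely for illustration.

```csharp
using System;
using System.Linq;

static class HexPaddingDemo
{
    // Formats each byte with the given format string and concatenates the results.
    public static string ToHex(byte[] bytes, string format)
    {
        return string.Concat(bytes.Select(b => b.ToString(format)));
    }

    static void Main()
    {
        byte[] first = { 0x01, 0x23 };
        byte[] second = { 0x12, 0x03 };

        // Unpadded: both sequences collapse to the same text,
        // so the original bytes cannot be recovered.
        Console.WriteLine(ToHex(first, "X"));   // 123
        Console.WriteLine(ToHex(second, "X"));  // 123

        // Padded: every byte takes exactly two characters,
        // so the text can be split back into bytes.
        Console.WriteLine(ToHex(first, "X2"));  // 0123
        Console.WriteLine(ToHex(second, "X2")); // 1203
    }
}
```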

If you made that change to your DecodeBinaryBase64, then you can use the following to convert back again:

private static string DecodeBinaryBase64(string stringToDecode)
{
    StringBuilder builder = new StringBuilder();
    foreach (var b in Convert.FromBase64String(stringToDecode))
        builder.Append(string.Format("{0:X2}", b));
    return "0x" + builder.ToString();
}

private static string EncodeBinaryBase64(string stringToEncode)
{
    var binary = new List<byte>();
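    // Start at 2 to skip the "0x" prefix added by DecodeBinaryBase64.
    // NumberStyles below needs "using System.Globalization;".
    for (int i = 2; i < stringToEncode.Length; i += 2)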
    {
        string s = new string(new [] {stringToEncode[i], stringToEncode[i+1]});
        binary.Add(byte.Parse(s, NumberStyles.HexNumber));
    }
    return Convert.ToBase64String(binary.ToArray());
}

(Error checking and so on is missing, though)

Dean Harding
It is obviously possible, as I am getting the encoded string from Microsoft; this is how they encode their SHA1 hashes. While I don't need to encode, I would really like to know how to do it: first just to know, and second because it seems useful in that it can make a string smaller with base64, which usually makes them bigger.
Rodney Foley
@Creepy Gnome: it's not possible for the reason I listed: `string.Format("{0:X}", b)` will return either one or two characters depending on whether `b` is < 16 or not. For example, take the numbers 1, 32, 16 and 4 and concatenate them together: "132164". How can you possibly decompose that back into the original integers again?
Dean Harding
You are limiting it to the implementation of the Decode method. There are a number of ways to decode; the one shown is only a working method that goes from A to B. It does appear to be more complex to go from B to A, but it is not impossible. Like I said, they are encoding it somehow, and that is what the question is about: how to encode something from B to A so that it can be decoded using a method similar to the one in the question. I will update the question with sample strings that work with the method.
Rodney Foley
@Creepy Gnome: as I said in my answer, if you changed that `string.Format` call to `string.Format("{0:X2}", b)` then it would be possible to convert back again. That looks to be how the example strings in your question were converted, since there's "06" and "04" in there: you would not see that if the `string.Format` call was "{0:X}". I'll update my answer with an example.
Dean Harding
The part that doesn't compile is "string s = stringToEncode[i] + stringToEncode[i+1];". To make it compile without changing the algorithm I had to do this: "string s = new string(new char[] {stringToEncode[i]}) + new string(new char[]{stringToEncode[i+1]});". You cannot cast a char to a string, and when you add chars you get an int, which you also cannot cast to a string.
Rodney Foley
@Dean I believe you've made it harder for yourself by solving the wrong problem. @Creepy Gnome has asked how to encode an SHA-1 hash to a Base-64 encoded string, not how to re-encode the output of their decode method!
Porges
@Creepy Gnome: that is correct, it changes the output, but that's because - as I keep saying - the algorithm in the original question *is not reversible*. Information is lost and you cannot get it back again. Plugging "/KUGOuoESMWYuDb+BTMK1LaGe7k=" into the `DecodeBinaryBase64` method *that you have written* gives "FCA563AEA448C598B836FE533AD4B6867BB9", which is not what you're saying it gives.
Dean Harding
@Dean while you did take the long way around, your code shows what I need and you got to it first, so I am going to give the check to you. However, I would not have found my bug if it wasn't for @Porges simply stating that my SHA1 hash was missing a byte; a bad copy-paste made me use "X" instead of "X2". I agree with Porges that you made it harder than it needed to be. I really do appreciate the solution and your efforts, and I am not upset, just a little frustrated at myself and the situation.
Rodney Foley
@Porges, no, the OP clearly states that they want to reverse a specific process (labeled "decode") and then defined that process. The answer, "it's not reversible, but here's how you could make it reversible and therefore answerable", is as good as it gets.
Isaac Cambron
@Dean .. PS you may want to edit the code for others who may want to use it so that it will compile for them as they will most likely not read all the comments. :) Thanks again.
Rodney Foley
@Dean: "the algorithm in the original question is not reversible" - yes, but C.G. doesn't need to reverse it! :) They need to make something that provides *input* to their function that their function can then *decode*.
Porges
@Isaac: I mean after reading CG's comments on here. See first comment: "It is obviously possible as I am getting the encoded string from Microsoft as the method they are encoding their SHA1 hashes. While I don't need to encode, I would really like to know how to encode it".
Porges
@Porges sort of. Now that the bug I didn't realize I had is fixed ("X" vs "X2"), I have a working decode method. That is what I need to get my project working with these Microsoft XML files. However, out of curiosity I wanted to know how to reverse the process, from any string to bytes to base64. Dean's example code will work with SHA1 only right now; with a little work maybe it can work with any string.
Rodney Foley
@Creepy Gnome: No hard feelings :-) I'm glad we got there in the end! I've updated my answer so that the code compiles at least.
Dean Harding
@Dean,Isaac: See my last addendum to my answer. I assumed that what CG wanted was the right-inverse, because of the first comment "`I would really like to know how to encode it`" -- and the fact that you noted that a left-inverse wasn't workable. :)
Porges
A: 

Well, you're going from Base-64 to an ASCII/UTF-8 string - and then outputting each character as a 2-digit hex value.

I don't know of any way to automatically get that back. You may have to pull out two characters at a time, cast those as a "char", and use string.format() to turn those back into characters, maybe?

I've never seen the need to take hex output like that, and turn it back into a real string before. Hope that helps.

Robert Seder
A: 

So I expanded my answer a bit:

/** Here are the methods in question: **/
string Encode(string input)
{
    return SHA1ToBase64String(StringToBytes(input));
}

string Decode(string input)
{
    return BytesToString(Base64StringToSHA1(input));
}
/****/

string BytesToString(byte[] bytes)
{
    return string.Join("",bytes.Select(x => x.ToString("X2")));
}

byte[] StringToBytes(string input)
{
    var result = new byte[input.Length/2];

    for (var i = 0; i < input.Length; i+=2)
        result[i/2] = byte.Parse(input.Substring(i,2), System.Globalization.NumberStyles.HexNumber);

    return result;
}

string SHA1ToBase64String(byte[] hash)
{
    if (hash.Length != 20)
        throw new Exception("Not an SHA-1 hash.");

    return Convert.ToBase64String(hash);
}

byte[] Base64StringToSHA1(string input)
{
    return Convert.FromBase64String(input);
}

void Main() {

    var encoded = "/KUGOuoESMWYuDb+BTMK1LaGe7k=";

    var decoded = Decode(encoded);
    var reencoded = Encode(decoded);

    Console.WriteLine(encoded == reencoded); //True
    Console.WriteLine(decoded);
    // FCA5063AEA0448C598B836FE05330AD4B6867BB9
}

I guess the confusion in other comments was over whether you want to provide a left-inverse or a right-inverse.

That is, do you want a function "f" that does:

f(Decode(x)) == x // "left inverse"

or:

Decode(f(x)) == x // "right inverse"

I assumed the latter, because you said (1st comment on other answer) that you wanted to be able to replicate Microsoft's encoding. (And what Dean noted - your function wasn't providing reversible output.) :)
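
As a rough check of the two properties, the sketch below tests both directions against the sample digest from the question. The class name `InverseCheck` and the method `F` are hypothetical names used only here; `Decode` mirrors the corrected "X2" version.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class InverseCheck
{
    // Base64 -> hex, two uppercase characters per byte ("X2").
    public static string Decode(string base64)
    {
        return string.Concat(Convert.FromBase64String(base64).Select(b => b.ToString("X2")));
    }

    // Candidate f: hex -> Base64, reading the input two characters at a time.
    public static string F(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = byte.Parse(hex.Substring(i * 2, 2), NumberStyles.HexNumber);
        return Convert.ToBase64String(bytes);
    }

    static void Main()
    {
        string base64 = "/KUGOuoESMWYuDb+BTMK1LaGe7k=";
        string hex = "FCA5063AEA0448C598B836FE05330AD4B6867BB9";

        Console.WriteLine(F(Decode(base64)) == base64); // left inverse: True
        Console.WriteLine(Decode(F(hex)) == hex);       // right inverse: True
    }
}
```

Both checks pass for this sample because the hex text is uppercase with an even number of digits. Lowercase hex would still decode to the same bytes, but `Decode(F(x))` would come back uppercase and no longer equal `x`.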

Either way the above reimplements your version for correct output, so both functions are inverses of each other.

Porges
that prints a very different string...
Isaac Cambron
You are right, I did short the bytes in my sample; it needed to be "FCA563AEA448C598B836FE533AD4B6867BB9", which is what Dean's example shows, but he never explained it like you did. So the X2 does work correctly then; the issue was getting the string to the bytes so that they can be used with Convert.ToBase64String. Thanks for explaining my mistake to me in a way that is understandable and non-confrontational.
Rodney Foley
@CG: that's one character shorter than your original hash... so it's still missing some bytes :)
Porges
@Porges I noticed but I missed the window to edit the comment, I updated the original question with the right SHA1 (I hope ;) )
Rodney Foley
@Isaac: the example before I edited was from CG's original hash which was missing some digits :)
Porges
@Porges this is a nicer implementation using LINQ; I converted to use this one this morning.
Rodney Foley