I'm using Python's base64 module, and I receive a string that may or may not be base64-encoded. I would like to do something like:

if isEncoded(s):
   output = base64.decodestring(s)
else:
   output = s

Any ideas?

+3  A: 

You could just try it, and see what happens:

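import base64
import binascii

def decode_if_necessary(s):
    # Try to decode; fall back to the original string if s is not valid base64.
    try:
        return base64.decodestring(s)
    except binascii.Error:
        return s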

But you have to ask yourself: what if the original message was in fact a syntactically valid base64 string, but not meant to be one? Then "decoding" it will succeed, but the result is not the required output. So I have to ask: is this really what you want?

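Edit: Note that decodestring is deprecated (and was removed in Python 3.9); in current Python, use base64.b64decode or base64.decodebytes instead.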
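On current Python versions the same try-it-and-see idea could be sketched with base64.b64decode. This is only a sketch, assuming the input arrives as bytes; passing validate=True is my choice here, so that characters outside the base64 alphabet raise an error instead of being silently discarded. The ambiguity described above still applies: anything that happens to be valid base64 will be decoded.

```python
import base64
import binascii

def decode_if_necessary(data: bytes) -> bytes:
    """Return the decoded bytes if data is valid base64, else data unchanged."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently skipping them.
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        # Not valid base64 (bad characters or bad padding): use as is.
        return data
```

For example, decode_if_necessary(b"aGVsbG8=") gives b"hello", while decode_if_necessary(b"hello world") is returned unchanged because the space is not a base64 character.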

Stephan202
You're saying that if s isn't encoded, decodestring() raises an exception?
Guy
He's saying that the chances of a string you want to use being validly base64 encoded are slim, and when you call `decodestring` on an invalidly base64 encoded string, `decodestring` raises an exception. This looks to me like a reasonable, simple approach. +1
Dominic Rodger
I actually tried something like that, and when a string that was not encoded did not raise an exception, I got gibberish.
Guy
Then the input you supplied was in fact a valid base64 encoding. This demonstrates the issue at hand.
Stephan202
+10  A: 

In general, it's impossible; if you receive string 'MjMj', for example, how could you possibly know whether it's already decoded and needs to be used as is, or decoded into '23#'?
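To make the ambiguity concrete, here is a minimal check of that example using base64.b64decode (the variable names are just for illustration). The string is perfectly valid base64, and nothing in it says whether it was meant literally.

```python
import base64

s = b"MjMj"

# The string decodes without error...
decoded = base64.b64decode(s, validate=True)
print(decoded)                    # b'23#'

# ...and re-encoding the result gives the original string back,
# so the round trip cannot tell you which form was intended.
print(base64.b64encode(decoded))  # b'MjMj'
```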

Alex Martelli
exactly: you can only test that there aren't forbidden chars and that the length is divisible by four.
giorgian
It's also worth noting that, because it's impossible, attempting to decode "when necessary" can be used as an attack vector for XSS and similar attacks: someone crafts seemingly-encoded data that your system does bad things with after it's decoded.
rmeador
Very nice choice of encoded and decoded strings, there.
Will McCutchen
A: 

You could check whether a string may be base64-encoded. In general, the function below can predict with 75%+ accuracy whether the data is encoded.

import re

def isBase64(s):
    return len(s) % 4 == 0 and bool(re.match(r'^[A-Za-z0-9+/]+={0,2}$', s))
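As a rough illustration of where a check like this helps and where it misleads, the sketch below applies the same length-and-alphabet test to a few sample strings. The name looks_like_base64 and the sample inputs are mine, not part of any library; note that an ordinary word such as "test" passes the check and decodes to meaningless bytes.

```python
import base64
import re

# Same heuristic as above: base64 alphabet plus up to two '=' of padding.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def looks_like_base64(s: str) -> bool:
    # Hypothetical helper: True if s *could* be base64, not that it is.
    return len(s) % 4 == 0 and BASE64_RE.fullmatch(s) is not None

print(looks_like_base64("aGVsbG8="))     # True: encodes "hello"
print(looks_like_base64("hello world"))  # False: wrong length, contains a space
print(looks_like_base64("test"))         # True: a false positive

# The false positive decodes without error, but the result is gibberish.
print(base64.b64decode("test"))
```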
brianegge