A naive attempt fails miserably:

import hashlib

class fred(hashlib.sha256):
    pass

-> TypeError: Error when calling the metaclass bases
       cannot create 'builtin_function_or_method' instances

Well, it turns out that hashlib.sha256 is a factory function, not a class. Trying something a bit more creative doesn't work either:

 import hashlib

 class fred(type(hashlib.sha256())):
     pass

 f = fred()

 -> TypeError: cannot create 'fred' instances
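
A quick check along these lines, offered only as a sketch, illustrates the distinction: hashlib.sha256 is callable but is not a type, while the object it returns does have a type (an internal one that hashlib does not intend to be subclassed). The variable names are just for illustration.

```python
import hashlib

# hashlib.sha256 is a factory function rather than a class, so it
# cannot be listed as a base class.
is_class = isinstance(hashlib.sha256, type)
is_callable = callable(hashlib.sha256)

# The object it returns does have a type, but that type is an
# implementation detail of hashlib.
hash_type = type(hashlib.sha256())

print(is_class, is_callable, hash_type)
```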

Hmmm...

So, how do I do it?

Here is what I want to actually achieve:

class shad_256(sha256):
    """Double SHA - sha256(sha256(data).digest())
Less susceptible to length extension attacks than sha256 alone."""
    def digest(self):
        return sha256(sha256.digest(self)).digest()
    def hexdigest(self):
        return sha256(sha256.digest(self)).hexdigest()

Basically, I want everything to pass through, except that when someone asks for a result I want to insert an extra step of my own. Is there a clever way I can accomplish this with __new__ or metaclass magic of some sort?

I have a solution I'm largely happy with, which I posted as an answer, but I'm really interested to see if anybody can think of anything better: either much less verbose with very little cost in readability, or much faster (particularly when calling update) while still being somewhat readable.

Update: I ran some tests:

# test_sha._timehash takes three parameters, the hash object generator to use,
# the number of updates and the size of the updates.

# Built in hashlib.sha256
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(hashlib.sha256, 20000, 512)'
100 loops, best of 3: 104 msec per loop

# My wrapper based approach (see my answer)
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.wrapper_shad_256, 20000, 512)'
100 loops, best of 3: 108 msec per loop

# Glen Maynard's getattr based approach
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.getattr_shad_256, 20000, 512)'
100 loops, best of 3: 103 msec per loop
+6  A: 

Make a new class, derive from object, create a hashlib.sha256 member var in init, then define methods expected of a hash class and proxy to the same methods of the member variable.

Something like:

import hashlib

class MyThing(object):
    def __init__(self):
        self._hasher = hashlib.sha256()

    def digest(self):
        return self._hasher.digest()

And so on for the other methods.
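
Filled out for the double-hash case from the question, the explicit-delegation idea might look like the following sketch. The class name DoubleSha256 is hypothetical, and the set of methods is an assumption about what callers need.

```python
import hashlib

class DoubleSha256(object):
    """Hypothetical explicit-delegation wrapper: sha256(sha256(data).digest())."""

    def __init__(self, data=b""):
        self._hasher = hashlib.sha256(data)

    def update(self, data):
        # Pass input straight through to the wrapped hash object.
        self._hasher.update(data)

    def digest(self):
        # Hash the inner digest once more before returning it.
        return hashlib.sha256(self._hasher.digest()).digest()

    def hexdigest(self):
        return hashlib.sha256(self._hasher.digest()).hexdigest()

    def copy(self):
        # Copy the inner state so the two objects can diverge afterwards.
        new = DoubleSha256()
        new._hasher = self._hasher.copy()
        return new
```

Every method is spelled out, which is the repetitive typing the comments below complain about, but each one can carry its own docstring.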

Adam Vandenberg
Well, that would work. I was hoping to do something more clever that involved less typing, because, you know, I'd rather type a whole bunch of text in StackOverflow than a bunch of repetitive method declarations. :-)
Omnifarious
You could override __getattr__ or __getattribute__ and proxy all calls down to the self._hasher variable, I suppose that would be a bit more clever.
Adam Vandenberg
@Adam Vandenberg, more clever, but it would actually be much more verbose, because __getattr__ would have to conditionally apply function wrappers.
mikerobi
I posted what I eventually did as an answer, but if you can come up with something better I'm all ears. It's actually similar to what you wrote here.
Omnifarious
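
The __getattr__ idea raised in the comments above can be sketched as follows. This is only an illustration under assumptions: the class name is hypothetical, and it is not necessarily how Glen Maynard's getattr-based answer works. Since __getattr__ is consulted only when normal lookup fails, methods defined on the class take precedence and everything else falls through to the wrapped object.

```python
import hashlib

class GetattrDoubleSha256(object):
    """Hypothetical __getattr__-based proxy around a sha256 object."""

    def __init__(self, *args):
        self._hasher = hashlib.sha256(*args)

    def __getattr__(self, name):
        # Guard against infinite recursion if _hasher is not set yet.
        if name == "_hasher":
            raise AttributeError(name)
        # Anything not defined on this class (update, digest_size,
        # block_size, ...) is forwarded to the wrapped hash object.
        return getattr(self._hasher, name)

    def digest(self):
        return hashlib.sha256(self._hasher.digest()).digest()

    def hexdigest(self):
        return hashlib.sha256(self._hasher.digest()).hexdigest()

    def copy(self):
        # Must be overridden: the forwarded copy() would return a plain
        # sha256 object instead of another wrapper.
        new = GetattrDoubleSha256()
        new._hasher = self._hasher.copy()
        return new
```

The forwarded attributes do not show up in help() or dir() on the wrapper, which is the documentation cost of this shorter form.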
+2  A: 

So, here is the answer I came up with. It is based on Glen's answer, which is the one I awarded the bounty to:

import hashlib

class _double_wrapper(object):
    """This wrapper exists because the various hashes from hashlib are
    factory functions and there is no type that can be derived from.
    So this class simulates deriving from one of these factory
    functions as if it were a class and then implements the 'd'
    version of the hash function which avoids length extension attacks
    by applying H(H(text)) instead of just H(text)."""

    __slots__ = ('_wrappedinstance', '_wrappedfactory', 'update')
    def __init__(self, wrappedfactory, *args):
        self._wrappedfactory = wrappedfactory
        self._assign_instance(wrappedfactory(*args))

    def _assign_instance(self, instance):
        "Assign new wrapped instance and set update method."
        self._wrappedinstance = instance
        self.update = instance.update

    def digest(self):
        "return the current digest value"
        return self._wrappedfactory(self._wrappedinstance.digest()).digest()

    def hexdigest(self):
        "return the current digest as a string of hexadecimal digits"
        return self._wrappedfactory(self._wrappedinstance.digest()).hexdigest()

    def copy(self):
        "return a copy of the current hash object"
        new = self.__class__()
        new._assign_instance(self._wrappedinstance.copy())
        return new

    digest_size = property(lambda self: self._wrappedinstance.digest_size,
                           doc="number of bytes in this hash's output")
    digestsize = digest_size
    block_size = property(lambda self: self._wrappedinstance.block_size,
                          doc="internal block size of hash function")

class shad_256(_double_wrapper):
    """
    Double SHA - sha256(sha256(data))
    Less susceptible to length extension attacks than SHA2_256 alone.

    >>> import binascii
    >>> s = shad_256('hello world')
    >>> s.name
    'shad256'
    >>> int(s.digest_size)
    32
    >>> int(s.block_size)
    64
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> binascii.hexlify(s.digest()) == s.hexdigest()
    True
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    __slots__ = ()
    def __init__(self, *args):
        super(shad_256, self).__init__(hashlib.sha256, *args)
    name = property(lambda self: 'shad256', doc='algorithm name')

This is a little verbose, but results in a class that works very nicely from a documentation perspective and has a relatively clear implementation. With Glen's optimization, update is as fast as it possibly can be.

There is one annoyance, which is that the update function shows up as a data member and doesn't have a docstring. I think that's a readability/efficiency tradeoff that's acceptable.

Omnifarious
This is pretty much write-only, isn't it?
Michael Foukarakis
@Michael Foukarakis - Yes, but it does mean I have very little extra code to write for any other hash function.
Omnifarious
@Michael Foukarakis - I fixed it so it no longer has the 'write-only' property nearly so strongly.
Omnifarious