I've had a look around for the answer to this, but I only seem to be able to find software that does it for you. Does anybody know how to go about doing this in Python?

+9  A: 

I wrote a piece of Python code that verifies the hashes of downloaded files against what's in a .torrent file. Assuming you want to check a download for corruption, you may find this useful.

You need the bencode package to use this. Bencode is the serialization format used in .torrent files. It can marshal lists, dictionaries, strings and numbers somewhat like JSON.
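To give a feel for the format, here is a minimal decoder sketch in Python 3. It is not the bencode package's implementation, and the name `bdecode` is only borrowed for illustration; it assumes the whole input is held in memory as `bytes` and does no validation beyond what is needed to parse.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from bytes; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<item><item>...e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at offset %d" % pos)

# A made-up, tiny metainfo-like dictionary (not a real torrent)
sample = b"d4:infod6:lengthi5e4:name5:a.txtee"
decoded, _ = bdecode(sample)
print(decoded)
```

Note that strings come back as `bytes`, so keys look like `b"info"` rather than `"info"` in this sketch.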

The code takes the hashes contained in the info['pieces'] string:

torrent_file = open(sys.argv[1], "rb")
metainfo = bencode.bdecode(torrent_file.read())
info = metainfo['info']
pieces = StringIO.StringIO(info['pieces'])

That string contains a succession of 20-byte SHA-1 hashes (one for each piece). These hashes are then compared with the hashes of the corresponding pieces of the on-disk file(s).
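As a rough illustration of that layout, the sketch below (Python 3) splits a pieces string into its individual 20-byte hashes. The helper name `split_piece_hashes` and the sample payload are made up for the example; the pieces string is built locally rather than read from a real .torrent file.

```python
import hashlib

HASH_LEN = 20  # size of a SHA-1 digest in bytes

def split_piece_hashes(pieces):
    """Split the concatenated digests into a list of 20-byte hashes."""
    if len(pieces) % HASH_LEN != 0:
        raise ValueError("pieces length is not a multiple of 20")
    return [pieces[i:i + HASH_LEN] for i in range(0, len(pieces), HASH_LEN)]

# Build a fake pieces string from a small payload (hypothetical data)
piece_length = 4
payload = b"abcdefghij"
chunks = [payload[i:i + piece_length] for i in range(0, len(payload), piece_length)]
pieces = b"".join(hashlib.sha1(chunk).digest() for chunk in chunks)

expected = split_piece_hashes(pieces)
print(len(expected), "piece hashes")
```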

The only complicated part of this code is handling multi-file torrents because a single torrent piece can span more than one file (internally BitTorrent treats multi-file downloads as a single contiguous file). I'm using the generator function pieces_generator() to abstract that away.
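The spanning behaviour is easier to see on a toy example. This is a Python 3 sketch of the same idea, run on in-memory streams instead of files on disk; `iter_pieces` is a hypothetical name and the three "files" are made-up data.

```python
import io

def iter_pieces(streams, piece_length):
    """Yield piece_length-sized chunks from streams read back to back."""
    piece = b""
    for stream in streams:
        while True:
            # Top up the current piece with as many bytes as it still needs
            piece += stream.read(piece_length - len(piece))
            if len(piece) < piece_length:
                break  # stream exhausted; the piece continues in the next one
            yield piece
            piece = b""
    if piece:
        yield piece  # final piece, possibly shorter than piece_length

# Three "files" of 5, 2 and 3 bytes, with a piece length of 4
files = [io.BytesIO(b"abcde"), io.BytesIO(b"fg"), io.BytesIO(b"hij")]
pieces = list(iter_pieces(files, 4))
print(pieces)
```

Here the second piece is assembled from the tail of the first stream, all of the second, and the first byte of the third.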

You may want to read the BitTorrent spec to understand this in more detail.

Full code below:

import sys, os, hashlib, StringIO, bencode

def pieces_generator(info):
    """Yield pieces from download file(s)."""
    piece_length = info['piece length']
    if 'files' in info: # yield pieces from a multi-file torrent
        piece = ""
        for file_info in info['files']:
            path = os.sep.join([info['name']] + file_info['path'])
            print path
            sfile = open(path.decode('UTF-8'), "rb")
            while True:
                piece += sfile.read(piece_length-len(piece))
                if len(piece) != piece_length:
                    sfile.close()
                    break
                yield piece
                piece = ""
        if piece != "":
            yield piece
    else: # yield pieces from a single file torrent
        path = info['name']
        print path
        sfile = open(path.decode('UTF-8'), "rb")
        while True:
            piece = sfile.read(piece_length)
            if not piece:
                sfile.close()
                return
            yield piece

def corruption_failure():
    """Display error message and exit"""
    print("download corrupted")
    # sys.exit() works in scripts; the bare exit() helper is meant for the interactive shell
    sys.exit(1)

def main():
    # Open torrent file (the with block closes it once it has been read)
    with open(sys.argv[1], "rb") as torrent_file:
        metainfo = bencode.bdecode(torrent_file.read())
    info = metainfo['info']
    pieces = StringIO.StringIO(info['pieces'])
    # Iterate through pieces
    for piece in pieces_generator(info):
        # Compare piece hash with expected hash
        piece_hash = hashlib.sha1(piece).digest()
        if (piece_hash != pieces.read(20)):
            corruption_failure()
    # Ensure we've consumed every expected hash; leftovers mean data is missing on disk
    if pieces.read():
        corruption_failure()

if __name__ == "__main__":
    main()
Alexandre Jasmin
Don't know if this solved the OP's problem, but it definitely solved mine (once I got past the bencode package's brokenness: http://stackoverflow.com/questions/2693963/importing-bittorrent-bencode-module). Thanks!
Nicholas Knight
I always wanted to have such a tool, and was about to dig into the old official python client to find out how to write one. Thanks!!
netvope
A: 

According to this, you should be able to find the md5sums of files by searching for the part of the data that looks like:

d[...]6:md5sum32:[hash is here][...]e

(SHA is not part of the spec)
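If the torrent does carry an md5sum field, checking a file against it could look roughly like this Python 3 sketch. The function name `md5_matches` is made up, and the example hashes an in-memory stream instead of a real download; it assumes the expected value has already been extracted as a 32-character hex string.

```python
import hashlib
import io

def md5_matches(stream, expected_hex, chunk_size=64 * 1024):
    """Hash a binary stream in chunks and compare with a hex md5sum."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()

# Hypothetical data standing in for a downloaded file and its md5sum entry
data = b"hello world"
expected = hashlib.md5(data).hexdigest()
ok = md5_matches(io.BytesIO(data), expected)
print(ok)
```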

Brendan Long
Just search for SHA on the page you linked and you'll see it's used extensively. Also, to quote: `md5sum: (optional) a 32-character hex[...] This is not used by BitTorrent at all, but it is included by some programs`
Alexandre Jasmin
Ah I see, so something like `d[...]9:info_hash[length]:[SHA hash]e`
Brendan Long
@Brendan I'm afraid not. As I mentioned in the question comments, there's no SHA1 hash for the files themselves, only one for each small piece of the download. Piece hashes are useful because they can be verified early in the download process. As soon as you have one valid piece you can share it with other peers... That said, your md5 solution has the advantage of being simple. It's just not guaranteed to be available in all .torrent files.
Alexandre Jasmin