ansaurus

Question

Answer 1

+1 A:

bencode.py from the original Mainline BitTorrent 5.x client (http://download.bittorrent.com/dl/BitTorrent-5.2.2.tar.gz) would give you pretty much the reference implementation in Python.

It has an import dependency on the BTL package but that's trivially easy to remove. You'd then look at bencode.bdecode(filecontent)['info']['files'].

bobince 2009-01-02 13:00:33

This only gives you the ability to bencode and bdecode strings though, right? It has no knowledge of where the bencoded fileset strings actually start and end, i.e. after the bencoded metadata and before the binary block.

Cheekysoft 2009-01-02 13:11:28

The root and info objects are both dictionaries (mappings). There's no inherent ordering of the file metadata and the binary checksum strings, except that by convention dictionaries are output in key name order. You need not concern yourself with storage order, just suck the whole dictionary in.
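To illustrate the point about storage order, here is a minimal sketch of a bencode decoder for Python 3 bytes. It is not the Mainline bencode.py; the function name bdecode and the sample data are made up for the example. Every string is length-prefixed, so the binary checksum block is read as just another dictionary value, wherever it happens to sit.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from bytes, starting at pos.

    Returns (value, next_pos). Illustrative sketch only, with minimal validation.
    """
    lead = data[pos:pos + 1]
    if lead == b"i":
        # integer: "i" digits "e"
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        # list: "l" values "e"
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        # dictionary: "d" (key value)* "e"
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        # string: length ":" raw bytes (may contain any byte, including "e")
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at offset %d" % pos)


# Two hypothetical info dictionaries with the same entries in different order.
pieces = b"6:pieces4:" + b"\x00\x01\x02\x03"
sorted_order = b"d" + b"6:lengthi10e" + b"4:name5:a.txt" + pieces + b"e"
other_order = b"d" + pieces + b"4:name5:a.txt" + b"6:lengthi10e" + b"e"

info_a, end_a = bdecode(sorted_order)
info_b, end_b = bdecode(other_order)
```

Both calls produce the same dictionary, so the caller never needs to know where the file metadata ends and the checksums begin.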

bobince 2009-01-02 14:32:38

Answer 2

+2 A:

I would use Rasterbar's libtorrent, which is a small and fast C++ library.
To iterate over the files you could use the torrent_info class (begin_files(), end_files()).

There's also a Python interface for libtorrent.

bene 2009-01-02 13:42:51

Answer 3

+7 A:

Effbot has your question answered. Here is the complete code to read the list of files from a .torrent file (Python 2.4+):

import re

def tokenize(text, match=re.compile(r"([idel])|(\d+):|(-?\d+)").match):
    i = 0
    while i < len(text):
        m = match(text, i)
        s = m.group(m.lastindex)
        i = m.end()
        if m.lastindex == 2:
            yield "s"
            yield text[i:i+int(s)]
            i = i + int(s)
        else:
            yield s

def decode_item(next, token):
    if token == "i":
        # integer: "i" value "e"
        data = int(next())
        if next() != "e":
            raise ValueError
    elif token == "s":
        # string: "s" value (virtual tokens)
        data = next()
    elif token == "l" or token == "d":
        # container: "l" (or "d") values "e"
        data = []
        tok = next()
        while tok != "e":
            data.append(decode_item(next, tok))
            tok = next()
        if token == "d":
            data = dict(zip(data[0::2], data[1::2]))
    else:
        raise ValueError
    return data

def decode(text):
    try:
        src = tokenize(text)
        data = decode_item(src.next, src.next())
        for token in src: # look for more tokens
            raise SyntaxError("trailing junk")
    except (AttributeError, ValueError, StopIteration):
        raise SyntaxError("syntax error")
    return data

if __name__ == "__main__":
    data = open("test.torrent", "rb").read()
    torrent = decode(data)
    for file in torrent["info"]["files"]:
        print "%r - %d bytes" % ("/".join(file["path"]), file["length"])

Constantin 2009-02-14 14:09:51
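One caveat with the listing loop above: it assumes a multi-file torrent. In a single-file torrent the info dictionary has no "files" key and carries "name" and "length" directly. A small sketch that covers both layouts might look like this. The helper name list_files is made up, and it assumes string keys as produced by the Python 2 decoder above (a decoder working on bytes under Python 3 would give bytes keys instead).

```python
def list_files(torrent):
    """Return a list of (path, length) pairs from a decoded torrent dictionary.

    Hypothetical helper, assuming string keys in the decoded dictionaries.
    """
    info = torrent["info"]
    if "files" in info:
        # Multi-file layout: each entry has a "path" list and a "length".
        return [("/".join(f["path"]), f["length"]) for f in info["files"]]
    # Single-file layout: name and length live in the info dictionary itself.
    return [(info["name"], info["length"])]


# Hand-built sample dictionaries standing in for decoded torrents.
multi = {"info": {"name": "album", "files": [
    {"path": ["cd1", "track01.mp3"], "length": 4096},
    {"path": ["cover.jpg"], "length": 512},
]}}
single = {"info": {"name": "image.iso", "length": 1048576}}

multi_listing = list_files(multi)
single_listing = list_files(single)
```

With this in place, the main block could loop over list_files(torrent) instead of indexing torrent["info"]["files"] directly.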


Reading the fileset from a torrent
