For a project of mine, I'm trying to implement a small part of the BitTorrent protocol, which can be found here. Specifically, I want to use the "Bencoding" part of it, which is a way to safely encode data for transfer over a socket. The format is as follows:
8:a string => "a string"
i1234e => 1234
l1:a1:be => ['a', 'b']
d1:a1:b3:one3:twoe => {'a':'b', 'one':two}
The encoding part was easy enough, but decoding is become quite a hassle. For example, if I have a list of strings, I have no way to separate them into individual strings. I've tried several different solutions, including PyParsing and a custom token parser. I'm currently attempting to use regexes, and it seems to be going fairly well, but I'm still hung up on the string problem. My current regex is:
(?P<length>\d+):(?P<contents>.{\1})
However, i can't seem to use the first group as the length of the second group. Is there any good way to do this? Or am I approaching this all wrong, and the answer is sitting right in front of me?