
As a self-taught Python hobbyist, how should I go about learning to read and write binary files in standard formats?

I'd like to write a script that takes ePub ebooks (XHTML + CSS in a zip) and converts them to the Mobipocket (PalmDOC) format so that an Amazon Kindle can read them (as part of a larger project I'm working on).
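The ePub side is just a zip archive, so the standard library's `zipfile` module can open it. The following is a minimal sketch that assumes the usual ePub container layout: a `mimetype` entry plus a `META-INF/container.xml` that points at the package (`.opf`) file. `make_sample_epub` and the `OEBPS/...` paths are hypothetical and only make the example self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in ePub containers.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_opf_path(epub_bytes):
    """Return the path of the package (.opf) file inside an ePub archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError("not an ePub: mimetype is %r" % mimetype)
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return rootfile.get("full-path")


def make_sample_epub():
    """Build a tiny, hypothetical ePub in memory for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # 'mimetype' is written first and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", "<package/>")
    return buf.getvalue()


print(find_opf_path(make_sample_epub()))
```

From the `.opf` file you can then find the XHTML and CSS files; the hard part of the conversion is the binary MOBI output, not this step.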

There is already an awesome open-source project for managing ebook libraries: Calibre. I wanted to try implementing this myself as a learning exercise. I started looking at its Python source code and realized that I have no idea what is going on. Of course, the big danger in being self-taught at anything is not knowing what you don't know.

In this case, I know that I don't know much about these binary files or how to work with them in Python (struct?). But I suspect I'm missing a lot of general knowledge about binary files, and I'd like some help understanding how to work with them. Here is a detailed overview of the MOBI/PalmDOC headers. Thanks!

Edit: Good point, there wasn't actually a question! Do you have any tips on how to gain a basic knowledge of working with binary files? Python-specific tips would be helpful, but other approaches could also be useful.

Tom: Edited as a question, added an intro and a better title.

A:

You should probably start with the struct module, as you mentioned in your question, and of course open the file in binary mode.
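As a minimal sketch of that approach, here is `struct` unpacking the 78-byte Palm Database (PDB) header that MOBI/PalmDOC files begin with. The field layout below is the commonly documented one and should be checked against the header overview linked in the question. The field names and the sample file written to a temporary path are made up for the demo.

```python
import os
import struct
import tempfile

# Palm Database header as commonly documented for PalmDOC/MOBI files:
# big-endian ('>'), no padding, 78 bytes in total.
PDB_HEADER = struct.Struct(">32sHH6I4s4sIIH")

# Hypothetical names for the 14 unpacked values, in order.
FIELDS = ("name", "attributes", "version", "created", "modified", "backed_up",
          "mod_number", "app_info_id", "sort_info_id", "type", "creator",
          "unique_id_seed", "next_record_list_id", "num_records")


def read_pdb_header(path):
    with open(path, "rb") as f:  # 'rb': raw bytes, no text decoding
        raw = f.read(PDB_HEADER.size)
    if len(raw) < PDB_HEADER.size:
        raise ValueError("file too short for a PDB header")
    header = dict(zip(FIELDS, PDB_HEADER.unpack(raw)))
    # The name field is NUL-padded ASCII.
    header["name"] = header["name"].rstrip(b"\x00").decode("ascii")
    return header


# Write a fake header (pack pads the 32-byte name with NULs), then read it back.
with tempfile.NamedTemporaryFile(suffix=".pdb", delete=False) as tmp:
    tmp.write(PDB_HEADER.pack(b"My Book", 0, 0, 0, 0, 0, 0, 0, 0,
                              b"BOOK", b"MOBI", 0, 0, 1))
try:
    hdr = read_pdb_header(tmp.name)
    print(hdr["name"], hdr["type"], hdr["creator"], hdr["num_records"])
finally:
    os.remove(tmp.name)
```

Precompiling the format with `struct.Struct` gives you `.size` for free, so the read length and the unpack format can't drift apart.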

Basically, you start at the beginning of the file and pick it apart piece by piece. It's a hassle, but not a huge problem. If the files are compressed or encrypted, things get more difficult. It helps to start with a file whose contents you already know, so you're not guessing all the time.
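When picking a file apart, a hex dump makes it much easier to line up bytes with the fields you expect. The sketch below is a small stdlib-only helper in the usual offset / hex / ASCII style; `hexdump` is a hypothetical name, not a standard-library function.

```python
def hexdump(data, width=16):
    """Return an offset / hex / ASCII dump of a bytes object."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Show printable ASCII as-is and everything else as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text))
    return "\n".join(lines)


print(hexdump(b"BOOKMOBI\x00\x01"))
```

Running it on the first few hundred bytes of a file you created yourself (a known title, a known number of records) is a quick way to confirm where each field starts.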

Try it a bit, and maybe you'll come up with more specific questions.

tom10
A: 

For teaching yourself the Python tools that work with binary files, this will get you going. It's fun, too: exercises with binaries, zips, images, and lots more.

John Pirie
A:

If you want to construct and analyse binary files, the struct module gives you the basic tools, but it isn't very friendly, especially if you want to look at fields that aren't a whole number of bytes.
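To show what a sub-byte field looks like with only the standard library, here is a sketch using shifts and masks. It uses the PalmDOC compression back-reference as the example. The layout assumed here, taken from common descriptions of PalmDOC LZ77, is a 2-byte big-endian value: the top bits are `10`, then an 11-bit distance, then a 3-bit length code where the real length is the code plus 3. Treat that layout as an assumption to verify. Both function names are hypothetical. Libraries such as bitstring wrap this kind of bit slicing in a friendlier interface.

```python
import struct


def decode_palmdoc_pair(b1, b2):
    """Split a PalmDOC back-reference byte pair into (distance, length)."""
    value = (b1 << 8) | b2           # combine two bytes into a 16-bit int
    if value >> 14 != 0b10:          # top two bits must be '10'
        raise ValueError("not a back-reference pair")
    distance = (value >> 3) & 0x7FF  # next 11 bits
    length = (value & 0x7) + 3       # low 3 bits, biased by 3
    return distance, length


def encode_palmdoc_pair(distance, length):
    """Inverse of decode_palmdoc_pair; returns 2 big-endian bytes."""
    if not (1 <= distance <= 0x7FF and 3 <= length <= 10):
        raise ValueError("distance or length out of range")
    value = 0x8000 | (distance << 3) | (length - 3)
    return struct.pack(">H", value)  # struct only handles the whole 16 bits


pair = encode_palmdoc_pair(100, 5)
print(pair.hex(), decode_palmdoc_pair(*pair))
```

Note that `struct` is only used at the whole-byte boundary. All of the bit-level work is plain integer arithmetic.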

There are a few modules that can help, such as BitVector, bitarray and bitstring. (I favour bitstring, but I wrote it, so I may be biased.)

For parsing binary formats, the hachoir module is very good, but I suspect it's too high-level for your current needs.

Scott Griffiths