ansaurus

Question

How to separate content from a file that is a container for binary and other forms of content

Answer 1

+2 A:

You definitely need to be reading in binary mode if the content includes JPEG images.

As well, Python includes an SGML parser, http://docs.python.org/library/sgmllib.html .

There is no example there, but all you need to do is set up do_ methods to handle the SGML tags you care about.
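Note that sgmllib only exists in Python 2; it was removed in Python 3. As a rough sketch of the same idea on a current interpreter, the snippet below uses html.parser.HTMLParser as a stand-in, which calls handle_ methods instead of do_ methods. The tag name "chunk", the class name TagCollector and the sample input are all made up for illustration, and a real SGML container may need more careful handling than this.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the text found inside each <chunk>...</chunk> element."""

    def __init__(self):
        super().__init__()
        self.inside = False   # True while we are between <chunk> and </chunk>
        self.sections = []    # one string per <chunk> element seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "chunk":
            self.inside = True
            self.sections.append("")

    def handle_endtag(self, tag):
        if tag == "chunk":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.sections[-1] += data


# Hypothetical sample input, not taken from the question.
sample = "<DOC><CHUNK>first part</CHUNK><OTHER>skip me</OTHER><CHUNK>second part</CHUNK></DOC>"
parser = TagCollector()
parser.feed(sample)
parser.close()
print(parser.sections)
```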

Joe Koberg 2009-05-04 21:39:13

Answer 2

A:

You need to open(filename,'rb') to open the file in binary mode. Be aware that in binary mode Python does no newline translation, so on some operating systems (notably Windows) you will see the raw two-byte \r\n line endings.
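A small sketch of what binary mode gives you, assuming Python 3. The sample bytes and the temporary file are invented for the demonstration; the point is only that the bytes come back exactly as written, \r\n included.

```python
import os
import tempfile

# Made-up content: a few non-text bytes followed by Windows-style line endings.
raw = b"\xff\xd8\xff\xe0 not really a JPEG\r\nsecond line\r\n"

# Write the bytes to a scratch file so there is something to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(raw)

# 'rb' returns bytes and leaves every byte, including \r\n, untouched.
with open(path, "rb") as handle:
    data = handle.read()
os.remove(path)

print(type(data).__name__, data.endswith(b"\r\n"))
```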

Reef 2009-05-04 22:27:41

Answer 3

+3 A:

What you're looking at isn't "binary", it's uuencoded. Python's standard library includes the module uu, to handle uuencoded data.

The module uu works on files (or file-like objects), which in practice often means temporary files for encoding and decoding. You can accomplish this without resorting to temporary files by using Python's codecs module like this:

import codecs

# Python 3: the "uu" codec is bytes-to-bytes, so the input must be bytes.
data       = b"Let's just pretend that this is binary data, ok?"
uuencode   = codecs.getencoder("uu")
data_uu, n = uuencode(data)      # n is the number of input bytes consumed
uudecode   = codecs.getdecoder("uu")
decoded, m = uudecode(data_uu)   # m is the number of encoded bytes consumed

print("""* The initial input:
%(data)r
* Encoding these %(n)d bytes produces:
%(data_uu)r
* When we decode these %(m)d bytes, we get the original data back:
%(decoded)r""" % globals())
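For a container file that mixes plain text with uuencoded blocks, pulling the blocks out could be sketched roughly as follows. This assumes Python 3 and that each block starts with a standard "begin <mode> <name>" line and ends with "end". The function name extract_uu_blocks and the sample container are made up for illustration, and real files may need more forgiving parsing.

```python
import binascii


def extract_uu_blocks(raw):
    """Return a list of (name, payload) pairs, one per uuencoded block in raw (bytes)."""
    results = []
    name, chunks = None, None            # chunks is None while outside a block
    for line in raw.splitlines():        # handles \n and \r\n line endings
        if chunks is None:
            parts = line.split(None, 2)
            # A uuencode header looks like b"begin 644 picture.jpg".
            if len(parts) == 3 and parts[0] == b"begin" and parts[1].isdigit():
                name = parts[2].strip().decode("ascii", "replace")
                chunks = []
        elif line.strip() == b"end":
            results.append((name, b"".join(chunks)))
            name, chunks = None, None
        elif line.strip():               # skip blank or whitespace-only lines
            chunks.append(binascii.a2b_uu(line))
    return results


# Build a hypothetical container: some text, one uuencoded block, more text.
payload = bytes(range(256))
encoded = b"".join(binascii.b2a_uu(payload[i:i + 45])
                   for i in range(0, len(payload), 45))
container = (b"<TEXT>\nplain text\n</TEXT>\n"
             b"begin 644 picture.jpg\n" + encoded + b"`\nend\n"
             b"<TEXT>more text</TEXT>\n")

blocks = extract_uu_blocks(container)
for block_name, block_data in blocks:
    print(block_name, len(block_data))
```

Each payload can then be written out with open(name, 'wb'), after checking that the name taken from the header is safe to use as a path.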

bendin 2009-05-05 18:27:56

After scanning the uuencode stuff I can see that this information is going to be a big help. Thanks. I have been assuming that the block was ready to be snipped and saved.

PyNEwbie 2009-05-05 18:42:53

Okay I am back on the case and let me tell you this was sweet. Also, you gave me just enough to understand the documentation better. I would have marked it up again but I don't think that is what would have happened.

PyNEwbie 2009-05-07 23:22:52
