Question:

How do I get a byte stream that works like StringIO for Python 2.5?

Application:

I'm converting a PDF to text, but don't want to save a file to the hard disk.

Other Thoughts:

I figured I could use StringIO, but there's no mode parameter (I guess "String" implies text mode).

Apparently the io.BytesIO class is new in v2.6, so that doesn't work for me either.

I've got a solution with the tempfile module, but I'd like to avoid any reads/writes to/from the hard disk.

+2  A: 

In Python 2.x, `str` is a byte string and `unicode` is the text type, so "string" effectively means "bytes". You should use the StringIO or cStringIO modules. There is no mode parameter because the buffer simply holds whichever kind of data you pass in as the buffer parameter.
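
As a rough illustration of the point above, here is a minimal sketch of an in-memory byte stream. The alias `ByteStream` and the sample payload are made up for the example, and the `io.BytesIO` fallback is only there so the same snippet also runs on interpreters where cStringIO no longer exists.

```python
import struct

try:
    # Python 2.x: str is a byte string, so a cStringIO buffer holds raw bytes.
    from cStringIO import StringIO as ByteStream
except ImportError:
    # Fallback for Python 3.x, where cStringIO was removed.
    from io import BytesIO as ByteStream

# Hypothetical payload: the bytes "%PDF" followed by one non-ASCII byte.
# struct.pack returns a byte string on both Python 2 and Python 3.
payload = struct.pack("5B", 0x25, 0x50, 0x44, 0x46, 0xFF)

stream = ByteStream()
stream.write(payload)      # nothing is written to the hard disk
stream.seek(0)             # rewind before reading back

header = stream.read(4)    # first four bytes
rest = stream.read()       # the remaining byte
whole = stream.getvalue()  # entire buffer, regardless of position
```

Anything that expects a file-like object opened in binary mode should be able to take `stream` in place of a temp file, as long as it only needs `read`, `write` and `seek`.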

John Millikin
Thanks, I'm feeling a little mentally challenged at the moment. :-P
tgray
Do you know why there is a separate BytesIO in 2.6 if StringIO does the same thing?
tgray
Forward compatibility -- 2.6 is meant to ease the transition to 3.0, so some of the 3.0 features (such as the `io` module) have been backported.
John Millikin
+1  A: 

If you're working with PDF, then StringIO should be fine as long as you pay heed to the docs:

The StringIO object can accept either Unicode or 8-bit strings, but mixing the two may take some care. If both are used, 8-bit strings that cannot be interpreted as 7-bit ASCII (that use the 8th bit) will cause a UnicodeError to be raised when getvalue() is called.

Note that cStringIO behaves differently here:

Unlike the memory files implemented by the StringIO module, those provided by this module are not able to accept Unicode strings that cannot be encoded as plain ASCII strings.
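
One way to stay clear of both caveats is to never mix the two kinds of string in the first place. The sketch below assumes UTF-8 is an acceptable encoding for any text you need to write. `write_bytes` and `ByteStream` are hypothetical names for this example, not part of either module.

```python
import struct

try:
    # Python 2.x pure-Python module described in the docs quoted above.
    from StringIO import StringIO as ByteStream
except ImportError:
    # Fallback for Python 3.x, where the StringIO module was removed.
    from io import BytesIO as ByteStream

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def write_bytes(stream, chunk, encoding="utf-8"):
    """Hypothetical helper: encode text first so the buffer only holds bytes."""
    if isinstance(chunk, TEXT_TYPE):
        chunk = chunk.encode(encoding)
    stream.write(chunk)


# Two raw non-ASCII bytes (the UTF-8 encoding of U+00E9).
raw = struct.pack("2B", 0xC3, 0xA9)

stream = ByteStream()
write_bytes(stream, u"caf")     # text, encoded before it is written
write_bytes(stream, raw)        # already bytes, written unchanged
write_bytes(stream, u"\u00e9")  # non-ASCII text, encoded to two bytes
data = stream.getvalue()        # safe: the buffer never saw a unicode object
```

Since every write is a byte string, the same helper should also work with cStringIO on Python 2.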

See the full documentation for the StringIO and cStringIO modules.

ars