PdfFileReader reads the content of a PDF file to create an object.

I am fetching the PDF from a CDN via urllib.urlopen(), which gives me a file-like object that has no seek() method. PdfFileReader, however, uses seek().

What is the simplest way to create a PdfFileReader object from a PDF downloaded via a URL?

Specifically, how can I avoid writing the file to disk and reading it back via file()?

Thanks in advance.

+1  A: 

You could use the .read() method to read the file's entire contents, and then create your own file-like object (most likely via StringIO) to provide access to it.
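
A minimal sketch of this approach, assuming a Python version that provides `io.BytesIO` (used here in place of StringIO because PDF data is binary). The helper name `to_seekable` is made up for illustration:

```python
import io


def to_seekable(stream):
    """Read a non-seekable binary stream fully and return a seekable in-memory copy."""
    # read() with no argument pulls the whole payload into memory as bytes.
    return io.BytesIO(stream.read())


# Hypothetical usage; the URL is a placeholder:
#   remote = urllib.urlopen("http://example.com/abc.pdf")
#   reader = PdfFileReader(to_seekable(remote))
```

The returned object supports seek() and tell(), which is what the urlopen() result is missing.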

Amber
I can't do `file(urllib.urlopen('abc.pdf').read())` either. I get "TypeError: file() argument 1 must be encoded string without NULL bytes, not str"
Lakshman Prasad
`file()` is not a conversion function like `dict()` or `list()` - it actually takes the same arguments as `open()` (a filename, and an optional mode + buffer size). You can't just pass it file contents and get a file object.
Amber
+1  A: 

There isn't really an inexpensive, ready-to-use way to do this. The simplest way is to read all the data and put it into a StringIO object. That does, however, require that you read everything first, which may or may not be what you want.

If you want something that only reads as necessary, and then stores what was read (or perhaps just a portion of what was read) then you will have to write it yourself. You may want to see the source for the StringIO module (or the io module, in Python 2.6) for some examples.
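
As a rough illustration of what "write it yourself" could look like, here is a sketch of a wrapper that reads from the source only when needed and caches everything it has read. It assumes Python 3 and a source object with a `read(n)` method; the class name `LazySeekableReader` and its internals are invented for this example, not taken from any library:

```python
import io


class LazySeekableReader(object):
    """Hypothetical wrapper: reads from `stream` on demand and caches what was read."""

    def __init__(self, stream):
        self._stream = stream        # underlying non-seekable source
        self._cache = io.BytesIO()   # every byte fetched so far
        self._size = 0               # number of bytes held in the cache
        self._pos = 0                # logical position seen by the caller

    def _fill(self, target):
        """Fetch from the source until `target` bytes are cached (None means all)."""
        while target is None or self._size < target:
            want = 64 * 1024 if target is None else target - self._size
            chunk = self._stream.read(want)
            if not chunk:            # source exhausted
                break
            self._cache.seek(0, io.SEEK_END)
            self._cache.write(chunk)
            self._size += len(chunk)

    def read(self, size=-1):
        end = None if size is None or size < 0 else self._pos + size
        self._fill(end)
        self._cache.seek(self._pos)
        data = self._cache.read() if end is None else self._cache.read(size)
        self._pos += len(data)
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            self._fill(None)         # the end is unknown until everything is read
            new_pos = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def tell(self):
        return self._pos
```

Note that seeking relative to the end forces the whole source to be read. PDF parsers typically start from the trailer at the end of the file, so for this use case the lazy version would likely end up reading everything anyway.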

Thomas Wouters
No, No, No. It isn't a 100 MB file. I just want something that works. If possible, inexpensively. Not by writing an IO module. :)
Lakshman Prasad
So use the first suggestion: StringIO is your friend.
Thomas Wouters
+1  A: 

I suspect you may be optimising prematurely here.

Most modern systems cache files in memory for a significant period of time before flushing them to disk. So if you write the data to a temporary file, read it back in, then close and delete the file, you may find that there's no significant disk traffic (unless it really is 100MB).

You might want to look at using tempfile.TemporaryFile() which creates a temporary file that is automatically deleted when closed, or else tempfile.SpooledTemporaryFile() which explicitly holds it all in memory until it exceeds a particular size.
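
A short sketch of the SpooledTemporaryFile route, assuming the source object has a `read(n)` method. The helper name `spool_to_tempfile` and the 1 MB default threshold are arbitrary choices for illustration:

```python
import shutil
import tempfile


def spool_to_tempfile(stream, max_size=1024 * 1024):
    """Copy a non-seekable stream into a spooled temporary file and rewind it."""
    # Data stays in memory until it exceeds max_size bytes, then rolls over to disk.
    spooled = tempfile.SpooledTemporaryFile(max_size=max_size)
    shutil.copyfileobj(stream, spooled)
    spooled.seek(0)  # rewind so the consumer starts at the beginning
    return spooled


# Hypothetical usage; the URL is a placeholder:
#   remote = urllib.urlopen("http://example.com/abc.pdf")
#   reader = PdfFileReader(spool_to_tempfile(remote))
```

The temporary file is deleted when it is closed, so it should be kept open for as long as the reader needs it.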

Duncan
