ansaurus

Question

Optimize PDF conversion in Django / Python

Answer 1

I can't tell you exactly what causes your problem; it may be related to buffering in StringIO.

However, you are wrong if you assume that this code would actually stream the generated PDF data: StringIO.getvalue() returns the content of the string buffer at the time this method is called, not an output stream (see http://docs.python.org/library/stringio.html#StringIO.StringIO.getvalue).
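A minimal sketch of that snapshot behaviour, assuming Python 3's io.StringIO (the Python 2 StringIO module discussed here behaves the same way in this respect). The buffer contents are made up for illustration:

```python
import io

buffer = io.StringIO()
buffer.write("first chunk")

# getvalue() returns a plain string copy of what has been written so far.
snapshot = buffer.getvalue()

# Later writes do not show up in the snapshot taken earlier.
buffer.write(" second chunk")
full_content = buffer.getvalue()

print(repr(snapshot))
print(repr(full_content))
```

The value returned by getvalue() is an ordinary string, so passing it to HttpResponse hands over the complete content in one piece rather than a stream.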

If you want to stream the output, you can treat the HttpResponse instance as a file-like object (see http://docs.djangoproject.com/en/1.2/ref/request-response/#usage).

Secondly, I don't see any reason to use StringIO here. According to the Pisa documentation I found (which calls this function CreatePDF, by the way), the source can be a string or a unicode object.

Personally, I would try the following:

1. Create the HTML as a unicode string.
2. Create and configure the HttpResponse object.
3. Call the PDF generator with the string as input and the response as output.

In outline, this could look like this:

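from django.http import HttpResponse
# pisa must be imported as well; the module path depends on the installed version.

# Inside the view; template, context and filename are assumed to be defined already.
html = template.render(context)

response = HttpResponse()
response['Content-Type'] = 'application/pdf'
response['Content-Disposition'] = 'attachment; filename=%s.pdf' % filename

# Write the generated PDF directly into the response, which acts as a file-like object.
pisa.CreatePDF(
    src=html,
    dest=response,
    show_error_as_pdf=True)

# response.flush()
return response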

However, I have not tested whether this actually works. (So far, I have only done this sort of PDF streaming in Java.)

Update: I just looked at the implementation of HttpResponse. It implements the file interface by collecting the chunks of strings written to it in a list. Calling response.flush() is pointless, because it does nothing. Also, you can set response parameters like Content-Type even after the response has been accessed as a file object.
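To illustrate the behaviour described in this update, here is a simplified stand-in class. It is a hypothetical sketch written for this explanation (the name ChunkCollectingResponse and its attributes are made up), not Django's actual HttpResponse code:

```python
class ChunkCollectingResponse:
    """Hypothetical stand-in that mimics the described behaviour.

    This is NOT Django's HttpResponse; it only models the idea of a
    file-like object that stores written chunks in a list.
    """

    def __init__(self):
        self.headers = {}
        self._chunks = []

    def __setitem__(self, name, value):
        # Headers can be set at any time, before or after writing.
        self.headers[name] = value

    def write(self, data):
        # Each write just appends to the list; nothing is sent yet.
        self._chunks.append(data)

    def flush(self):
        # Nothing to flush: the data only lives in the in-memory list.
        pass

    @property
    def content(self):
        return b"".join(self._chunks)


response = ChunkCollectingResponse()
response.write(b"%PDF-1.4\n")
response.flush()                              # has no effect
response["Content-Type"] = "application/pdf"  # set after the first write
response.write(b"%%EOF\n")

print(response.content)
print(response.headers)
```

With an object like this, the whole PDF is still held in memory until the view returns, so writing into the response simplifies the code but does not by itself reduce memory use.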

Your original problem may also be related to the fact you never closed the StringIO objects. The underlying buffer of a StringIO object is not released before close() is called.
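A short sketch of closing the buffer explicitly, assuming Python 3's io.BytesIO as the in-memory buffer (the bytes written are placeholders, not real PDF output):

```python
import io

# The with-block calls close() automatically, even if an error occurs.
with io.BytesIO() as pdf_buffer:
    pdf_buffer.write(b"placeholder PDF bytes")
    pdf_data = pdf_buffer.getvalue()  # copy the content out before closing

# After close(), the buffer can no longer be read.
try:
    pdf_buffer.getvalue()
    buffer_usable_after_close = True
except ValueError:
    buffer_usable_after_close = False

print(pdf_data)
print(buffer_usable_after_close)
```

Copying the data out with getvalue() before the block ends keeps the result available while the buffer itself is released.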

Bernd Petersohn 2010-08-03 23:28:14

Yep, I thought about the close() call. I'll try your suggestions and let you know. Thanks for the effort you put into this problem!

Marcoslhc 2010-08-04 11:25:33

I took your suggestion: I now use the CreatePDF() function and threw away all the StringIO stuff. It's a cleaner, simpler solution. However, the 502 is still appearing. Right now I'm trying to write the file to disk to see whether that gives better memory management.

Marcoslhc 2010-08-04 12:42:19

I browsed a little through the pisa sources and found that your original code was already pretty close to the examples given there. So my guess is that the actual problem lies in pisa's PDF generating function. There is a bug report (http://code.google.com/p/xhtml2pdf/issues/detail?id=50) which indicates that the generator may sometimes return too early, before the actual PDF is finished. If this also happens in your case, it may well be the source of the 502 error response. Writing the PDF output to a file, as you suggested above, could possibly solve this problem.

Bernd Petersohn 2010-08-04 20:15:35

Thanks! It worked just fine.

Marcoslhc 2010-08-13 19:26:35
