Hi. I have a web app that exports reports as PDF. Everything works when the query returns fewer than 100 records. When the number of records rises above 100, the server returns a 502 Proxy Error. The report renders fine as HTML; the step that hangs the server is the conversion from HTML to PDF. I'm using xhtml2pdf (AKA pisa 3.0) to generate the PDF. The algorithm is something like this:

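def view1(request, **someargs):
    # filter() returns a queryset; get() would return a single object
    queryset = someModel.objects.filter(**someargs)
    context = {'queryset': queryset}
    # .get() avoids a KeyError when the 'pdf' parameter is absent
    if request.GET.get('pdf'):
        return pdfWrapper('template.html', context, 'filename')
    else:
        return render_to_response('template.html', context)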

def pdfWrapper(template_src, context_dict, filename):
    ################################################
    #
    # The code commented out below is an older version.
    # I updated the code according to the comment received.
    # The function still works for short HTML documents
    # and produces the 502 for larger ones.
    #
    ################################################

    ##import cStringIO as StringIO
    import ho.pisa as pisa
    from django.template.loader import get_template
    from django.template import Context
    from django.http import HttpResponse
    ##from cgi import escape

    template = get_template(template_src)
    context = Context(context_dict)
    html  = template.render(context)

    response = HttpResponse()
    response['Content-Type'] ='application/pdf'
    response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

    pisa.CreatePDF(
        src=html,
        dest=response,
        show_error_as_pdf=True)

    return response

    ##result = StringIO.StringIO()
    ##pdf = pisa.pisaDocument(
    ##            StringIO.StringIO(html.encode("ISO-8859-1")),
    ##            result)
    ##if not pdf.err:
    ##    response = HttpResponse(
    ##                   result.getvalue(), 
    ##                   mimetype='application/pdf')
    ##    response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)
    ##    return response
    ##return HttpResponse('Hubo un error<pre>%s</pre>' % escape(html))

I've thought about creating a buffer so the server can free some memory, but I haven't found anything yet. Could anyone help, please?

+1  A: 

I can't tell you exactly what causes your problem - it could be caused by buffering problems in StringIO.

However, you are wrong if you assume that this code would actually stream the generated PDF data: StringIO.getvalue() returns the content of the string buffer at the time this method is called, not an output stream (see http://docs.python.org/library/stringio.html#StringIO.StringIO.getvalue).
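To illustrate the point about getvalue(), here is a minimal sketch. It assumes Python 3's io.StringIO rather than the Python 2 StringIO/cStringIO modules used in the question, but the behavior shown is the same: getvalue() hands back a copy of what has been written so far, and later writes do not reach that copy.

```python
import io

buf = io.StringIO()
buf.write("first chunk, ")

# getvalue() returns a plain string holding the buffer's current content.
snapshot = buf.getvalue()

# Writing more data afterwards does not change the earlier snapshot.
buf.write("second chunk")

print(snapshot)
print(buf.getvalue())
```

So passing result.getvalue() to HttpResponse sends the whole PDF as one string that has already been built in memory; nothing is streamed.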

If you want to stream the output, you can treat the HttpResponse instance as a file-like object (see http://docs.djangoproject.com/en/1.2/ref/request-response/#usage).

Secondly, I don't see any reason to make use of StringIO here. According to the documentation of Pisa I found (which calls this function CreatePDF, by the way) the source can be a string or a unicode object.

Personally, I would try the following:

  1. Create the HTML as unicode string
  2. Create and configure the HttpResponse object
  3. Call the PDF generator with the string as input and the response as output

In outline, this could look like this:

html = template.render(context)

response = HttpResponse()
response['Content-Type'] ='application/pdf'
response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

pisa.CreatePDF(
    src=html,
    dest=response,
    show_error_as_pdf=True)

#response.flush()
return response

However, I have not tested whether this actually works. (So far, I have done this sort of PDF streaming only in Java.)

Update: I just looked at the implementation of HttpResponse. It implements the file interface by collecting the chunks of strings written to it in a list. Calling response.flush() is pointless, because it does nothing. Also, you can set response parameters like Content-Type even after the response has been used as a file-like object.
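The behavior described in this update can be pictured with a small stand-in class. This is only a sketch: ListBackedResponse is a hypothetical name invented for illustration and is not Django's actual HttpResponse code. It just mimics the three properties mentioned above (chunks collected in a list, a flush() that does nothing, headers that can be set at any time).

```python
class ListBackedResponse:
    """Hypothetical stand-in that mimics the described behavior."""

    def __init__(self):
        self.headers = {}
        self._chunks = []

    def __setitem__(self, name, value):
        # Headers live apart from the body, so they can be set at any time.
        self.headers[name] = value

    def write(self, data):
        # Each write only appends to a list; nothing is sent to the client.
        self._chunks.append(data)

    def flush(self):
        # Nothing to flush, because the body is held in memory.
        pass

    @property
    def content(self):
        return b"".join(self._chunks)


response = ListBackedResponse()
response.write(b"%PDF-1.4\n")
response.flush()
response.write(b"...more PDF bytes...")
response["Content-Type"] = "application/pdf"
```

With a response built this way, the complete PDF still sits in memory until the view returns, so writing into the response is tidier than the StringIO version but is not true streaming.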

Your original problem may also be related to the fact that you never closed the StringIO objects. The underlying buffer of a StringIO object is not released until close() is called.
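A minimal sketch of closing the buffer explicitly, assuming Python 3's io.BytesIO in place of the Python 2 StringIO class from the question. The with block calls close() on exit, so the bytes must be copied out first; the byte string written here is a placeholder, not real PDF output.

```python
import io

# The with block closes the buffer on exit, which discards its contents.
with io.BytesIO() as result:
    result.write(b"placeholder PDF bytes")
    pdf_bytes = result.getvalue()  # copy the data out before closing

# Once closed, the buffer can no longer be read.
try:
    result.getvalue()
    still_readable = True
except ValueError:
    still_readable = False
```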

Bernd Petersohn
Yep, I thought about the close() call. I'll try your suggestions and let you know. Thanks for the effort you put into this problem!
Marcoslhc
I took your suggestion: I now use the CreatePDF() function and threw away all the StringIO stuff. It's a cleaner, simpler solution. However, the 502 still appears. Right now I'm trying to write the file to disk to see whether that gives better memory management.
Marcoslhc
I browsed a little through the pisa sources and found that your original code was already pretty close to the examples given there. So my guess is that the actual problem lies in pisa's PDF generating function. There is a bug report (http://code.google.com/p/xhtml2pdf/issues/detail?id=50) which indicates that the generator may sometimes return too early, before the actual PDF is finished. If this also happens in your case, it may well be the source of the 502 error response. Writing the PDF output to a file, as you suggested above, could possibly solve this problem.
Bernd Petersohn
Thanks! It worked just fine.
Marcoslhc
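
The thread does not show the file-based version that ended up working, so the following is only a sketch of the general idea: write the PDF to a temporary file, close it, and read it back once the generator has finished. fake_create_pdf and render_pdf_via_file are hypothetical names; in the real view the stand-in would be replaced by a function that calls pisa.CreatePDF(src=html, dest=pdf_file) as in the answer above.

```python
import os
import tempfile


def fake_create_pdf(src, dest):
    """Hypothetical stand-in for a call such as pisa.CreatePDF(src=..., dest=...)."""
    dest.write(b"%PDF-1.4\n" + src.encode("utf-8"))


def render_pdf_via_file(html, create_pdf=fake_create_pdf):
    """Generate the PDF into a temporary file, then return its bytes."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        # Leaving the with block closes the file, so all data is on disk
        # before anything is read back.
        with os.fdopen(fd, "wb") as pdf_file:
            create_pdf(html, pdf_file)
        with open(path, "rb") as pdf_file:
            return pdf_file.read()
    finally:
        os.remove(path)


pdf_bytes = render_pdf_via_file("<p>report</p>")
```

The returned bytes can then be passed to the HTTP response together with the Content-Type and Content-Disposition headers shown earlier.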