I have been working with some complex PDF output in ReportLab. It is generally fine, but there are still some cases where I get LayoutErrors; these are usually caused by Flowables that are too big at some point.

It's proving to be pretty hard to debug these, as I don't often have more information than something like this:

    Flowable <Table@0x104C32290 4 rows x 6 cols> with cell(0,0) containing
    '<Paragraph at 0x104df2ea8>Authors'(789.0 x 1176) too large on page 5 in frame 'normal'(801.543307087 x 526.582677165*) of template 'Later'

It's really not that helpful. What I would ideally like to know is what the best debugging and testing strategies are for this kind of thing.

  • Is there a way I can view a broken PDF, that is, one rendered despite the layout errors, so I can see what's going on more easily?
  • Is there a way to add a hook to ReportLab so it handles these errors more gracefully, rather than failing the whole PDF?
  • Any other suggestions for improving, testing, and handling problems like these?

I don't have a particular example, so this is a request for general advice. I have resolved the exception above, but only through trial and error (read: guessing and seeing what happens).

A:

We had a problem when using ReportLab to format some content that was originally HTML, and sometimes the HTML was too complex. The solution (and I take no credit here; this came from the people at ReportLab) was to catch the error when it occurred and write it directly into the PDF.

That means you get to see the cause of the problem in the right context. You could expand on this to output details of the exception, but in our case, since the problem was converting HTML to RML, we only needed to display our input:

The preppy template contains this:

    {{script}}
    # This section contains Python functions used within the RML.
    # We can import any helper code we need within the template,
    # to save passing in hundreds of helper functions at the top.
    from rml_helpers import blocks
    {{endscript}}

and then, later on, bits of template like:

    {{if equip.specification}}
        <condPageBreak height="1in"/>
        <para style="h2">Item specification</para>
        {{blocks(equip.specification)}}
    {{endif}}

In rml_helpers.py we have:

    from xml.sax.saxutils import escape
    from rlextra.radxml.html_cleaner import cleanBlocks
    from rlextra.radxml.xhtml2rml import xhtml2rml

    def q(stuff):
        """Quoting function which works with unicode strings.

        The data from Zope is Unicode objects.  We need to explicitly
        convert to UTF8; then escape any ampersands.  So
           u"Black & Decker drill"
        becomes
           "Black &amp; Decker drill"
        and any special characters (Euro, curly quote etc) end up
        suitable for XML.  For completeness we'll accept 'None'
        objects as well and output an empty string.

        """
        if stuff is None:
            return ''
        elif isinstance(stuff, unicode):  # Python 2 code: 'unicode' is the text type
            stuff = escape(stuff.encode('utf8'))
        else:
            stuff = escape(str(stuff))
        # escape() handles &, < and >; quotes become character references too.
        return stuff.replace('"', '&#34;').replace("'", '&#39;')

    def blocks(txt):
        try:
            # Clean up the HTML, then convert it to RML.
            txt2 = cleanBlocks(txt)
            rml = xhtml2rml(txt2)
            return rml
        except Exception:
            # Anything the converter can't handle is replaced by a visible
            # warning followed by the original input, escaped as literal text.
            return ('<para style="big_warning">Could not process markup</para>'
                    '<para style="normal">%s</para>' % q(txt))

So anything that is too complex for xhtml2rml to handle throws an exception and is replaced in the output by a big 'Could not process markup' warning, followed by the markup that caused the error, escaped so that it appears literally.

Then all we have to do is to remember to search the output PDF for the error message and fix up the input accordingly.

Duncan
A: 

Make sure you are not reusing any of your flowable objects (for example, when rendering multiple versions of a document from common template parts). ReportLab does not support this, and it can cause this error.

The reason seems to be that, during layout, ReportLab sets an attribute on these objects to indicate that it needed to move them to a separate page. If an object has to be moved a second time, ReportLab throws that exception. These attributes are not reset when you render another document, so it can appear that an object was moved to a separate page twice when it really wasn't.

I've hacked around this before by resetting the attribute manually (I can't remember its name right now; it was '_deferred' or something), but the correct approach is to throw away all the objects you used to render a document once it has been rendered.

Bob