ansaurus

Question

How can I identify unfilled ovals in a PDF document using CAM::PDF?

Answer 1

+1 A:

The $doc->traverse($dereference, $node, $callbackfunc, $callbackdata) method looks promising. Use it to walk the document's nodes and check what type of object the ovals are.

Geo 2009-10-19 13:35:57

Answer 2

+1 A:

Looking at the PDF spec, I would say you have quite a challenge in front of you:

PDF provides five types of graphics objects:

A path object is an arbitrary shape made up of straight lines, rectangles, and cubic Bézier curves. A path may intersect itself and may have disconnected sections and holes. A path object ends with one or more painting operators that specify whether the path shall be stroked, filled, used as a clipping boundary, or some combination of these operations.

A text object ...

An external object (XObject) is an object defined outside the content stream and referenced as a named resource (see 7.8.3, "Resource Dictionaries"). The interpretation of an XObject depends on its type. ...

An inline image object uses a special syntax to express the data for a small image directly within the content stream.

A shading object describes a geometric shape whose colour is an arbitrary function of position within the shape.

Therefore, at a minimum, one would need to know whether the ovals you are interested in are paths or external objects or inline image objects or shading objects.

Then, you need an appropriate algorithm which can decide whether an object of that type is an oval. Then, you need to figure out what unfilled means. Then, you need to figure out how to fill them.
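
To make the path case concrete: in a decoded content stream, the `c`, `v` and `y` operators append Bézier curves, `l` and `re` append straight segments and rectangles, and the path ends with a painting operator (`S`/`s` stroke only, `f`/`F`/`f*` fill only, `B`/`B*`/`b`/`b*` do both, `n` paints nothing). The following is a minimal sketch in Python, not a ready-made solution. It assumes the page content has already been extracted as uncompressed text (for example with CAM::PDF's getPageContent), that an oval is drawn as exactly four curves with no straight segments (a common convention, but not guaranteed), and that "unfilled" means "stroked only". The names `classify_paths` and `looks_like_unfilled_oval` are made up for illustration, and the whitespace tokenizer will be fooled by operator-like tokens inside text strings or inline images.

```python
# Path painting operators from the PDF spec, grouped by effect.
PAINT_OPS = {
    "S": "stroke", "s": "stroke",
    "f": "fill", "F": "fill", "f*": "fill",
    "B": "fill+stroke", "B*": "fill+stroke",
    "b": "fill+stroke", "b*": "fill+stroke",
    "n": "none",
}
CURVE_OPS = {"c", "v", "y"}   # cubic Bezier segments
STRAIGHT_OPS = {"l", "re"}    # line segment, rectangle


def classify_paths(content):
    """Return one dict per painted path in a decoded content stream.

    Assumption: tokens are separated by whitespace and no string or
    inline image contains something that looks like an operator.
    """
    paths = []
    curves = straight = 0
    for token in content.split():
        if token in CURVE_OPS:
            curves += 1
        elif token in STRAIGHT_OPS:
            straight += 1
        elif token in PAINT_OPS:
            if curves or straight:
                paths.append({"curves": curves,
                              "straight": straight,
                              "paint": PAINT_OPS[token]})
            curves = straight = 0  # a painting operator ends the path
    return paths


def looks_like_unfilled_oval(path):
    """Hypothetical test: four curves, nothing straight, stroked only."""
    return (path["curves"] == 4 and path["straight"] == 0
            and path["paint"] == "stroke")


# Two circles (one stroked, one filled) and a stroked rectangle.
sample = """
10 5 m 10 7.76 7.76 10 5 10 c 2.24 10 0 7.76 0 5 c
0 2.24 2.24 0 5 0 c 7.76 0 10 2.24 10 5 c h S
30 5 m 30 7.76 27.76 10 25 10 c 22.24 10 20 7.76 20 5 c
20 2.24 22.24 0 25 0 c 27.76 0 30 2.24 30 5 c h f
0 0 20 10 re S
"""
paths = classify_paths(sample)
unfilled = [p for p in paths if looks_like_unfilled_oval(p)]
```

Even under these assumptions, the sketch says nothing about where on the page each path ends up, since that depends on the current transformation matrix, and it will miss ovals that live inside form XObjects or are rendered as images.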

It seems unlikely to me that anyone would put in that much effort to give you a ready-made solution.

Sinan Ünür 2009-10-19 13:43:35

Answer 3

A:

It may actually be simpler to render the PDF to a grayscale bitmap and use simple shape recognition to distinguish filled from unfilled ovals. If you can reliably determine where the ovals are going to be (I'm assuming this is coming from a form, so the position of the ovals would be standard), you can apply a simple heuristic (e.g. treat an oval as filled if at least 70% of the pixels in its region are 50% gray or darker) to determine what kind of oval it is.

For example in this situation:

[ ]        [ ]         [ ]       [X]

[ ]        [X]         [ ]       [ ]

[ ]        [ ]         [X]       [ ]

You can split the ovals using a grid:

[ ]   |    [ ]    |    [ ]   |   [X]
------+-----------+----------+------
[ ]   |    [X]    |    [ ]   |   [ ]
------+-----------+----------+------
[ ]   |    [ ]    |    [X]   |   [ ]

Then from there you just loop over the grid, applying that simple heuristic to each cell.
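
The loop over the grid can be sketched as follows, in Python. This is only an illustration under several assumptions: the page has already been rendered to a grayscale bitmap by some other tool, the bitmap is given as a list of rows of integers from 0 (black) to 255 (white), the ovals sit on an evenly spaced grid, and each cell is cropped tightly enough around its oval that the 70% cutoff is meaningful. The function names and default values are made up for the example and would need tuning against real scans.

```python
def dark_fraction(pixels, top, left, height, width, threshold=128):
    """Fraction of pixels in one cell that are darker than `threshold`.

    `pixels` is a list of rows of grayscale values, 0 = black, 255 = white.
    A threshold of 128 corresponds roughly to "50% gray".
    """
    dark = 0
    for row in pixels[top:top + height]:
        for value in row[left:left + width]:
            if value < threshold:
                dark += 1
    return dark / (height * width)


def classify_grid(pixels, rows, cols, fill_cutoff=0.7, threshold=128):
    """Return a rows x cols table of booleans, True where a cell looks filled.

    Assumption: the image divides evenly into `rows` x `cols` cells.
    """
    cell_h = len(pixels) // rows
    cell_w = len(pixels[0]) // cols
    return [
        [
            dark_fraction(pixels, r * cell_h, c * cell_w,
                          cell_h, cell_w, threshold) >= fill_cutoff
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Tiny 4x4 "scan" split into a 2x2 grid: only the top-left cell is dark.
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
result = classify_grid(image, rows=2, cols=2)
print(result)  # [[True, False], [False, False]]
```

If the cells include a lot of white margin around each oval, a lower cutoff, or measuring only the pixels inside the oval's outline, would probably work better than a flat 70%.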

lost-theory 2009-10-19 14:40:41
