Reflowing PDF is extremely difficult, and in some cases impossible. Even Adobe acknowledges severe limitations with reflow in its own viewer. This is because PDF, like PostScript (and unlike formats such as Word or HTML), is a page description language: it records where marks are placed on the page, not how the content is structured.
You will probably be able to reflow text only, without graphics, and only in those instances where you can extract meaningful text from the PDF (in the absence of tagging this is a non-trivial task in itself, and sometimes virtually impossible).
Challenges you may encounter with non-tagged PDFs:
- scanned, non-searchable documents, which may require you to perform OCR
- letters rendered individually, not as part of strings (you will have a hard time determining whether the PDF actually reads "noted", "no ted", "not ed", "n o t e d", etc.)
- multi-column text, inset text boxes etc.
- the mapping between text and font may be obfuscated, e.g. the letter "b" may map to the font's "A" glyph and will render as "A"; the only way to resolve this mapping would be to OCR the font, or to rasterize the PDF and OCR the whole document
- etc.
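
To make the "letters rendered individually" problem concrete, here is a minimal sketch of the usual heuristic: sort the glyphs on one line by position and insert a space wherever the horizontal gap is large relative to the glyph width. The `Glyph` class, the `group_glyphs` function and the `gap_ratio` threshold are all hypothetical names made up for this illustration; a real extraction library will give you positions in its own format, and the threshold would need tuning per document.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in page units
    width: float  # advance width of the glyph


def group_glyphs(glyphs, gap_ratio=0.3):
    """Join glyphs from a single text line into a string.

    A space is inserted whenever the gap between two neighbouring glyphs
    exceeds gap_ratio times the average glyph width (a heuristic, not a rule).
    """
    if not glyphs:
        return ""
    ordered = sorted(glyphs, key=lambda g: g.x)
    avg_width = sum(g.width for g in ordered) / len(ordered)
    threshold = gap_ratio * avg_width
    out = [ordered[0].char]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


# Same five letters, two different layouts.
tight = [Glyph(c, x, 5.0) for c, x in zip("noted", [0, 5, 10, 15, 20])]
spaced = [Glyph(c, x, 5.0) for c, x in zip("noted", [0, 5, 13, 18, 23])]
print(group_glyphs(tight))   # noted
print(group_glyphs(spaced))  # no ted
```

Whether the second layout really means "no ted" or is just loosely kerned "noted" cannot be decided from the positions alone, which is exactly why this step is unreliable without tagging.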