Converting from PDF to PDF/A
This is the answer to your question as originally phrased.
For a solution that does not involve potentially lossy re-rendering, take a look at http://www.opensubscriber.com/message/[email protected]/8027900.html. It appears that Foris Zoltan was able to get something going using iText (not exhaustive, but possibly sufficient for most PDFs) without the overkill of re-rendering.
If Zoltan's solution is not acceptable or sufficient for your requirements, then you are stuck with re-rendering. You could stick with OpenOffice/JODConverter, or go for less overhead by using Ghostscript (the mother of them all), piping pdf2ps back into a PDF/A-enabled ps2pdf.
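As a rough sketch of that Ghostscript route, the two steps can be scripted as below. It assumes `pdf2ps` and `gs` are on the PATH, and that a `PDFA_def.ps` prologue (a sample ships with Ghostscript) has been edited to point at an ICC profile. The second step calls `gs` with the `pdfwrite` device directly, which is what `ps2pdf` wraps, so that the prologue can be passed as an extra input. The function names and the intermediate file name are made up for illustration, and the exact switches may need adjusting for your Ghostscript version.

```python
import subprocess


def build_pdfa_commands(src_pdf, out_pdf, tmp_ps="intermediate.ps",
                        pdfa_def="PDFA_def.ps"):
    """Return the two command lines for the re-rendering route."""
    # Step 1: flatten the source PDF to PostScript.
    to_ps = ["pdf2ps", src_pdf, tmp_ps]
    # Step 2: distill the PostScript back to PDF with PDF/A switches.
    # PDFA_def.ps is assumed to be a locally edited copy of the sample
    # prologue that ships with Ghostscript.
    to_pdfa = [
        "gs", "-dPDFA=1", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",
        "-dPDFACompatibilityPolicy=1",
        "-sOutputFile=" + out_pdf,
        pdfa_def, tmp_ps,
    ]
    return [to_ps, to_pdfa]


def convert(src_pdf, out_pdf):
    """Run both steps in order, raising if either tool fails."""
    for cmd in build_pdfa_commands(src_pdf, out_pdf):
        subprocess.run(cmd, check=True)
```

Calling `convert("in.pdf", "out.pdf")` would run both tools in sequence. The result should still be checked with a PDF/A validator, since re-rendering alone does not guarantee conformance.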
Apache FOP
Other respondents have suggested Apache FOP, which in the context of PDF to PDF/A conversion has the following advantages and disadvantages:
- advantage: fewer "moving parts" than an OpenOffice/JODConverter combination (e.g. comparing in-process FOP with daemonized OO)
- disadvantage: you are responsible for converting from PDF to XSL-FO or otherwise rendering to FOP (more coding and/or integration work required of you), whereas OpenOffice/JODConverter and Ghostscript can require less additional coding.
However, if I am not mistaken, it appears that you are using PDF as an intermediate format, i.e. that what you are trying to achieve is XHTML to PDF to PDF/A conversion. By converting directly from XHTML to PDF/A, the process will be faster, will use fewer resources (e.g. memory), and will neither needlessly degrade output quality (as re-rendering solutions can) nor require intimate knowledge of the PDF format (as Zoltan's solution does).
In this case, directly converting from XHTML to PDF/A would be the ideal solution, either using iText directly (the example uses iTextSharp, a .NET port of iText, but it's the same for Java) or using Apache FOP as others have suggested. FOP generates PDF with its own built-in library rather than iText, and it is heavier and more complicated to set up than using iText directly, but it might produce better results than the iText example. The only way to settle that is to try both on a few of your XHTML files as samples. :)
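To try the FOP route on a few sample files, one option is to drive FOP's command-line launcher from a small script, as sketched below. It assumes the `fop` launcher is on the PATH and that you supply your own XHTML-to-XSL-FO stylesheet (`xhtml2fo.xsl` here is a placeholder name). PDF/A requires all fonts to be embedded, so a FOP configuration file with font settings will most likely be needed as well; the optional `config` argument is passed through with `-c`.

```python
import subprocess


def build_fop_command(xhtml_path, xsl_path, out_pdf,
                      profile="PDF/A-1b", config=None):
    """Build a FOP command line that renders XHTML straight to PDF/A."""
    cmd = ["fop"]
    if config is not None:
        # Optional FOP configuration file (e.g. font embedding settings).
        cmd += ["-c", config]
    cmd += [
        "-pdfprofile", profile,  # ask FOP for PDF/A output
        "-xml", xhtml_path,      # well-formed XHTML input
        "-xsl", xsl_path,        # stylesheet mapping XHTML to XSL-FO
        "-pdf", out_pdf,
    ]
    return cmd


def render(xhtml_path, xsl_path, out_pdf, config=None):
    """Run FOP, raising if it exits with an error."""
    cmd = build_fop_command(xhtml_path, xsl_path, out_pdf, config=config)
    subprocess.run(cmd, check=True)
```

For example, `render("page.xhtml", "xhtml2fo.xsl", "page.pdf", config="fop.xconf")` would produce one sample to compare against the iText output.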