Can you guys list available PDF libraries to manipulate PDF files?
- Is it freeware or open-source?
- What language(s) is it available for?
- What is it good for?
My favorite is iText: http://www.lowagie.com/iText/
It is Java-based, but there are several ports for the .NET Framework, one of which can be found here: http://www.ujihara.jp/iTextdotNET/en/
For .NET we use the open-source PDFsharp. It's worked well for us over the past three years.
From their FAQ:
PDFsharp is a .NET library for creating and modifying Adobe PDF documents programmatically. It is written in C# and can be used from any .NET language like VB.NET.
We use DynamicPDF for creating, merging and manipulating PDFs. They offer libraries for .NET and Java.
Apple has PDFKit for Cocoa. Only available on Mac OS X and the iPhone OS, but I figured that I'd list it, for completeness' sake. You can display, annotate, modify, and even create PDF documents from scratch with it.
I've used Pdftk (PDF toolkit) for several different projects after learning about it in the PDF Hacks book by O'Reilly (the book is recommended as well, even though it may be a bit out of date now).
Pdftk is basically a command line tool for manipulating PDFs, but I have used it in both client and server applications by shelling it out as an external process. The AccessPDF site also has a lot of other information on PDF libraries and toolkits, including the libraries that pdftk was built with.
Here is the feature description from the AccessPDF site:
If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:
* Merge PDF Documents
* Split PDF Pages into a New Document
* Decrypt Input as Necessary (Password Required)
* Encrypt Output as Desired
* Fill PDF Forms with FDF Data and/or Flatten Forms
* Apply a Background Watermark
* Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
* Update PDF Metadata
* Attach Files to PDF Pages or the PDF Document
* Unpack PDF Attachments
* Burst a PDF Document into Single Pages
* Uncompress and Re-Compress Page Streams
* Repair Corrupted PDF (Where Possible)
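To make the "shelling out" approach concrete, here is a minimal Python sketch of how the merge and burst operations from that list might be driven from an application. It assumes pdftk is installed and on the PATH; the helper names (`merge_command`, `burst_command`, `run_pdftk`) and the file names are made up for illustration and are not part of pdftk itself.

```python
import shutil
import subprocess


def merge_command(inputs, output):
    """Build the argv for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *inputs, "cat", "output", output]


def burst_command(input_pdf, pattern="pg_%04d.pdf"):
    """Build the argv for splitting a PDF into one file per page.

    The pattern is a printf-style template for the page file names.
    """
    return ["pdftk", input_pdf, "burst", "output", pattern]


def run_pdftk(argv):
    """Run a pdftk command; raise if the binary is missing or exits non-zero."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("pdftk not found on PATH")
    subprocess.run(argv, check=True)
```

Something like `run_pdftk(merge_command(["a.pdf", "b.pdf"], "merged.pdf"))` would then do the actual work. Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.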
One thing I used pdftk to do was create a Windows form that a user could fill out; the application then merged the data into a pre-created PDF form and saved the filled-out form. That's something you can't do using just a PDF reader; you're supposed to need full-blown Acrobat for that. That was a few years ago and there are probably many different solutions now, but pdftk is still a useful bag of tricks for manipulating existing PDFs.
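For anyone wanting to try the form-filling trick, this is roughly what it looks like, sketched in Python rather than the original Windows app. The idea is to write the field values to an FDF file and hand it to pdftk's `fill_form` operation. The function names and file names are hypothetical, the FDF produced is a bare-minimum one, and the escaping only covers plain ASCII values (non-ASCII text needs extra encoding work that is not shown here).

```python
def _escape(text):
    # Backslash and parentheses are special inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields):
    """Return a minimal FDF document for a dict of field name -> value."""
    entries = "\n".join(
        f"<< /T ({_escape(name)}) /V ({_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n" + entries + "\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


def fill_form_command(form_pdf, fdf_path, output_pdf, flatten=True):
    """Build the argv for: pdftk form.pdf fill_form data.fdf output out.pdf [flatten]"""
    argv = ["pdftk", form_pdf, "fill_form", fdf_path, "output", output_pdf]
    if flatten:
        # Flattening merges the values into the page so they can no longer be edited.
        argv.append("flatten")
    return argv
```

You would write the result of `build_fdf(...)` to a file, then pass the list from `fill_form_command(...)` to `subprocess.run`. The field names have to match the names defined in the PDF form itself.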
Pdftk is open source (GPL'ed) and will run on just about anything.
We use iText and iTextSharp for Java and C# (respectively).
The only issue (and it is minor) is that the documentation for the C# side (which is a port of the Java version) is outdated and sparse. I find that the best thing is to keep the Java documentation handy and do a mental lookup (.setFont() [Java] == .Font [C#]).
Other than that, it's a really well-thought-out and professional package.
For Python we use ReportLab, and power it with Cheetah, but I have less personal experience with that.
I've used PDF::API2 in Perl for dynamically creating PDFs in web apps, as well as splitting, manipulating, inserting, and removing pages. Pretty good for small to medium-sized PDFs. PNG handling is slow.
I've also used the official libraries maintained by DataLogics, which are uber powerful but quite pricey. They have C, Java, and .NET APIs. With the C API I have done a lot of manipulation like adding images and text, searching, forms, etc.
PHP and Ruby have simple and free ones too, I think.
I've used Aspose.Pdf in the past and liked it a lot. It was easy to use and worked well. I used it to generate monthly bank statements for thousands of accounts.
Big Faceless Java PDF Library
Just for the record...
Adobe PDF Library SDK
http://www.adobe.com/devnet/pdf/library/
Does anyone use it?
We use iText. It's an extremely powerful library with a large community. There is also iText in Action available.
Tall Components PDF products. 100% .NET component. Great for C# or VB.Net. I have used the TallPDF.NET component for generating PDFs dynamically. I highly recommend them. Tall Components also has excellent customer service.
They are not free, but they do have evaluation versions available for download. Without the license key, TallPDF.NET puts an "Evaluation Version" string in the footer.
All of these libraries allow you to manipulate PDF documents.
Our PDFTextStream product is a Java / .NET library for extracting text, metadata, form data, and other bits from PDF documents. It's got a pretty comprehensive feature set, and is extraordinarily easy to integrate into apps.
I used the above-mentioned PDFNet SDK from PDFTron because it turned out to be the most reliable for the purposes of my application (mainly text extraction and PDF rasterization).
Despite the name, the SDK is available not only as a .NET component, but also as a Java and C/C++ library on Windows, Linux, and Mac OS X. The feature set is impressive, and the support experience was great.
ABCpdf.NET from webSupergoo.
Not open source, but 'free' (as in beer) licenses are available.
.NET and COM interfaces enable support for multiple languages. Documentation includes numerous examples in C# and Visual Basic.
Good for MS Windows server environments, or standalone applications. Fully multi-threaded, ABCpdf can be used flexibly from within ASP / ASP.NET. Imports and exports more image formats than you can shake a stick at, HTML, and Office documents too.
We've been using PoDoFo which is a great C++ PDF library. It's open source and free (as in beer).
From their site:
The PoDoFo library is a free, portable C++ library which includes classes to parse PDF files and modify their contents into memory. The changes can be written back to disk easily. The parser can also be used to extract information from a PDF file (for example the parser could be used in a PDF viewer). Besides parsing PoDoFo includes also very simple classes to create your own PDF files.
I've used both Aspose.PDF (for .NET) and ActivePDF and would recommend the Aspose library. I would stay away from ActivePDF.