+27  Q: 

PDF Libraries

Can you guys list available PDF libraries to manipulate PDF files?

  • Is it freeware or open-source?
  • What language(s) is it available for?
  • What is it good for?
+17  A: 

My favorite is iText: http://www.lowagie.com/iText/

It is Java-based, but there are several ports for the .NET Framework, one of which can be found here: http://www.ujihara.jp/iTextdotNET/en/

Anson Smith
iText and iTextSharp can no longer be used in closed-source commercial applications: the license has changed to AGPL, so any application that you use iText with has to be available under the same license. If you want to use either of these libraries in a commercial application without that obligation, visit iTextSoftware.com and buy a license (you have to contact them for a price).
Rowan
+3  A: 

For .NET we use the open-source PDFsharp. It's worked well for us over the past three years.

From their FAQ:

PDFsharp is a .NET library for creating and modifying Adobe PDF documents programmatically. It is written in C# and can be used from any .NET language like VB.NET.

Scott Saad
I'll second this suggestion.
Boydski
+2  A: 

ReportLab for Python.

  • Open Source for creating PDFs.
  • More available commercially.
iny
It creates PDFs, but can't 'manipulate' them.
Javier
+1  A: 

We use DynamicPDF for creating, merging and manipulating PDFs. They offer libraries for .NET and Java.

jinsungy
+1  A: 

Apple has PDFKit for Cocoa. It's only available on Mac OS X and the iPhone OS, but I figured that I'd list it for completeness' sake. You can display, annotate, modify, and even create PDF documents from scratch with it.

Mark Bessey
Correction: as of May 2010, PDFKit is not available on iPhone OS.
Ole Begemann
+6  A: 

I've used Pdftk (PDF toolkit) for several different projects after learning about it in the PDF Hacks book by O'Reilly (the book is recommended as well, even though it may be a bit out of date now).

Pdftk is basically a command-line tool for manipulating PDFs, but I have used it in both client and server applications by shelling out to it as an external process. The AccessPDF site also has a lot of other information on PDF libraries and toolkits, including the libraries that pdftk was built with.
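As a rough illustration of the "external process" approach, here is a minimal Python sketch that merges PDFs by invoking pdftk's `cat` operation. The function names and the file names in the usage note are hypothetical, and it assumes a `pdftk` executable is on the PATH.

```python
import shutil
import subprocess


def build_merge_command(inputs, output, pdftk="pdftk"):
    """Return the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return [pdftk, *inputs, "cat", "output", output]


def merge_pdfs(inputs, output):
    """Run pdftk as an external process; raise if it is missing or fails."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk executable not found on PATH")
    # Passing a list (not a shell string) avoids quoting problems in file names.
    subprocess.run(build_merge_command(inputs, output), check=True)
```

Calling `merge_pdfs(["cover.pdf", "body.pdf"], "report.pdf")` would then run pdftk once and raise `subprocess.CalledProcessError` if it exits with an error. Keeping the command builder separate from the process call makes the command easy to log or unit-test without pdftk installed.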

Here is the feature description from the AccessPDF site:

 If PDF is electronic paper, then pdftk is an electronic staple-remover,
 hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a 
 command-line tool for doing everyday things with PDF documents. Keep 
 one in the top drawer of your desktop and use it to:

    * Merge PDF Documents
    * Split PDF Pages into a New Document
    * Decrypt Input as Necessary (Password Required)
    * Encrypt Output as Desired
    * Fill PDF Forms with FDF Data and/or Flatten Forms
    * Apply a Background Watermark
    * Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
    * Update PDF Metadata
    * Attach Files to PDF Pages or the PDF Document
    * Unpack PDF Attachments
    * Burst a PDF Document into Single Pages
    * Uncompress and Re-Compress Page Streams
    * Repair Corrupted PDF (Where Possible)

One thing I used pdftk to do was create a Windows form that a user could fill out, which then merged the data into a pre-created PDF form and saved the filled-out form. That's something you can't do using just a PDF reader; you're supposed to need full-blown Acrobat for that. That was a few years ago and there are probably many different solutions now, but pdftk is still a useful bag of tricks for manipulating existing PDFs.
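The form-merging workflow described above might look something like the following Python sketch: it renders field values as a minimal FDF document and builds the pdftk `fill_form` command. The helper names, field names, and file names are hypothetical, and the FDF writer is deliberately simplified (plain ASCII text fields only).

```python
def _escape(text):
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields):
    """Render a dict of field name -> value as a minimal FDF document."""
    entries = "\n".join(
        f"<< /T ({_escape(name)}) /V ({_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [\n{entries}\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


def build_fill_command(form_pdf, fdf_path, output_pdf, flatten=True):
    """Return the argument list for pdftk's fill_form operation."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_path, "output", output_pdf]
    if flatten:
        cmd.append("flatten")  # make the filled-in values non-editable
    return cmd
```

In practice you would write the result of `build_fdf(...)` to a file, then pass the list from `build_fill_command(...)` to `subprocess.run(cmd, check=True)`. The field names in the dict have to match the field names defined in the PDF form.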

Pdftk is open source (GPL'ed) and will run on just about anything.

CMPalmer
Pdftk uses iText, and the license for iText has changed to AGPLv3. This means that any software built using iText has to be offered under the same license, so it cannot be used in closed-source commercial applications. The change to the license was introduced in version 5 of iText, so if you're using an older version, you are OK. If you're using a newer version, do some checking to see if you're still OK.
Rowan
+4  A: 

We use iText and iTextSharp for Java and C# (respectively).

The only issue (and it is minor) is that the documentation for the C# side (which is a port of the Java version) is outdated and sparse. I find that the best thing is to keep the Java documentation handy and do a mental lookup (.setFont() [Java] == .Font [C#]).

Other than that, it's a really well-thought-out and professional package.

For Python we use ReportLab, and power it with Cheetah, but I have less personal experience with that.

jm
We also use iTextSharp, but I think it's *really* confusing with 0,0 being *lower* left instead of *top* left! There's a bunch of other stuff I don't like as well, but we made an abstraction layer on top of it and now mostly everything is superb. (I do miss the ability to add vector images, though I might just have overlooked something.)
svinto
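One small piece of such an abstraction layer could be a coordinate flip like the Python sketch below. PDF user space puts the origin at the lower-left corner with y increasing upward; the helper name and the A4 default are assumptions made for illustration, not anything taken from iTextSharp.

```python
A4_HEIGHT_PT = 842.0  # A4 is roughly 595 x 842 points (1 point = 1/72 inch)


def to_pdf_y(y_from_top, page_height=A4_HEIGHT_PT, box_height=0.0):
    """Convert a top-left-origin y coordinate to a bottom-left-origin one.

    box_height is the height of the item being placed, so the result is
    the y of the item's lower edge measured from the bottom of the page.
    """
    return page_height - y_from_top - box_height
```

For example, an image 50 points tall whose top edge should sit 100 points below the top of an A4 page would be placed with its lower edge at `to_pdf_y(100, box_height=50)`.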
The iText license has been changed to AGPL, so you will no longer be able to use iText in closed-source commercial applications unless you purchase a license from iTextSoftware.com.
Rowan
+3  A: 

I've used PDF::API2 in Perl for dynamically creating PDFs in web apps, as well as for splitting and manipulating PDFs and inserting and removing pages. It's pretty good for small to medium-sized PDFs. PNG handling is slow.

I've also used the official libraries maintained by Datalogics, which are uber powerful but quite pricey. They have C, Java, and .NET APIs. With the C API I have done a lot of manipulation, such as adding images and text, searching, and working with forms.

PHP and Ruby have simple and free ones too, I think.

Chris Kloberdanz
+1  A: 

The PDFBox and iText Libraries are FREE!

iText Sample HERE

PDFBox Sample HERE

-MarlonRibunal

MarlonRibunal
PDFBox can go deeper than iTextSharp.
kenny
A: 

I've used Aspose.Pdf in the past and liked it a lot. It was easy to use and worked well. I used it to generate monthly bank statements for thousands of accounts.

  • It's not free
  • You can supply it XML that defines your layout and content and it spits out a PDF byte stream.
  • Available for .NET and Java
Kon
A: 

Big Faceless Java PDF Library

  • It's not free
  • Available for Java
  • We use it mainly to create PDFs from PDF forms, but it has many features
John Nilsson
A: 

Just for the record...

Adobe PDF Library SDK

http://www.adobe.com/devnet/pdf/library/

Does anyone use it?

Daniel Silveira
A: 

We use iText. It's an extremely powerful library with a large community. There is also iText in Action available.

spdaly
A: 

Gnostice has a commercial library that supports several languages, including Delphi, C++ and .NET.

Khadaji
A: 

TCPDF... good for PHP

Jasim
+3  A: 

Tall Components PDF products. 100% .NET component. Great for C# or VB.NET. I have used the TallPDF.NET component for generating PDFs dynamically. I highly recommend them. Tall Components also has excellent customer service.

They are not free, but they do have evaluation versions available for download. Without the license key, TallPDF.NET puts an "Evaluation Version" string in the footer.

Jeff Widmer
TallComponents also has PDFKit for manipulating existing PDFs and a host of solutions for displaying PDFs. All 100% .NET.
Marnix van Valen
We have used TallComponents PDFControls for generating, displaying and printing (rasterizing) both fillable and non-fillable files. We specifically went with them in order to have a fillable PDF reader control embedded inside our WinForms application.
DevByDefault
+4  A: 

All of these libraries allow you to manipulate PDF documents.

Rowan
A: 

Our PDFTextStream product is a Java / .NET library for extracting text, metadata, form data, and other bits from PDF documents. It's got a pretty comprehensive feature set, and is extraordinarily easy to integrate into apps.

Chas Emerick
+1  A: 

I used the above-mentioned PDFNet SDK from PDFTron because it turned out to be the most reliable for the purposes of my application (mainly text extraction and PDF rasterization).

Despite the name, the SDK is available not only as a .NET component, but also as a Java and C/C++ library on Windows, Linux, and Mac OS X. The feature set is impressive, and the support experience was great.

Lary
Larry, I don't know what the hell y'all are talking about, but +1!
Eric
+1  A: 

ABCpdf.NET from webSupergoo.

  • Not open source, but 'free' (as in beer) licenses available.

  • .NET and COM interfaces enable support for multiple languages. Documentation includes numerous examples in C# and Visual Basic.

  • Good for MS Windows server environments, or standalone applications. Fully multi-threaded, ABCpdf can be used flexibly from within ASP / ASP.NET. Imports and exports more image formats than you can shake a stick at, HTML, and Office documents too.

ABCpdf Feature Chart...

AffineMesh94464
A: 

We've been using PoDoFo which is a great C++ PDF library. It's open source and free (as in beer).

From their site:

The PoDoFo library is a free, portable C++ library which includes classes to parse PDF files and modify their contents into memory. The changes can be written back to disk easily. The parser can also be used to extract information from a PDF file (for example the parser could be used in a PDF viewer). Besides parsing PoDoFo includes also very simple classes to create your own PDF files.

Michael Marsella
A: 

FPDF is a great library for generating PDF files with PHP.

George Edison
+1  A: 

I've used both Aspose.PDF (for .NET) and ActivePDF and would recommend the Aspose library. I would stay away from ActivePDF.

Vermiscious