pdf

Ghostscript: Splitting large PDF causes a "pdfmark destination page" error

I am trying to split a PDF into 2 smaller PDFs using gs (Ghostscript version 8.62 on Debian Lenny). I only have Debian Linux on hand, so please don't offer Windows or Mac solutions. When specifying -dLastPage=740, I receive the error: GPL Ghostscript 8.62: ERROR: A pdfmark destination page 1203 points beyond the last page 740. I ha...

Is it possible to "print a PDF to a file" so that the file contains plain text of the content?

We have a 96 page PDF file and we would like to have a text file containing all the text in that file. Is there a way to somehow print the PDF to a file so that file contains only the text of the PDF? ...

Should DocBook be used for publishing technical documentation in English & Arabic?

I'm looking for the ideal tool to use for publishing technical documentation in English & Arabic (in the same document). Should I use DocBook, or is it better to stick with TeX/LaTeX? I am a complete beginner to both systems so there's no legacy stuff to worry about. The two most important factors for me are ease of use and support for Arab...

List attributes of an XFA Object using JavaScript in a PDF

I'm trying to create a PDF document with several text fields that can grow in height up to some maximum value. Due to the constraints of the project, I'm using Adobe Designer 7, which happily allows JavaScript. However, the objects in XFA are a little different from the HTML DOM or earlier PDF DOMs. So, I know for certain that my field...

PDF Search and Replace in C#

Hi, I want to perform a simple (ideally RegEx) search and replace over a large number of PDF documents in a WinForms application. I've got as far as using ITextSharp to read and tokenise existing documents, from which I can search for the text. The problem is that it doesn't seem to support generating a new document from these tokens (onl...

Stream a PDF to a web page failing

I have a URL to a PDF and I want to serve up the PDF to my page viewer. I can successfully (I think) retrieve the PDF file. Then when I do the Response.BinaryWrite() I get a "The file is damaged and could not be repaired" error from Adobe Reader. Here's the code I have: protected void Page_Load(object sender, EventArgs e) { ...

Delphi PDF generation

We're using Fast Reports to create reports but we're not very happy with the quality of the PDFs it creates. I know we can plug in other PDF components instead of the one that comes with FastReports, so my question is: what good PDF components are out there (free or commercial) for Delphi? Ideally it should not require any DLLs. Ed...

Extracting tables from PDF files?

Anyone got any experience with extracting data from PDF files programmatically, in particular embedded tables? What tools did you use? Is this always a one-off process depending on the file, or are there tools which will work against all sorts of different files? ...

Get MySQL column width?

Hello. I was wondering if I can somehow convert a column header text from MySQL into an actual width in pixels. I am trying to generate a PDF from the database and I want it to automatically adjust column widths. As I will use it for many tables, the width must differ so I should be able to see it like: "The header for this column is call...

split a multi-page pdf file into multiple pdf files with python?

I'd like to take a multi-page pdf file and create separate pdf files per page. I've downloaded reportlab and have browsed the documentation, but it seems aimed at pdf generation, I haven't yet seen anything about processing pdf's themselves. Is there an easy way to do this in python? ...

How can I extract images from a PDF file?

Hi! I am able to extract the images from a PDF file using many Perl modules, but none of them specifies the exact positions of the images being extracted (where the image actually belongs). Could anyone suggest to me how to extract the images along with their positions? Thanks in advance. ...

How do I display office and/or pdf content on a windows form?

We have an application in which admin members can add content for their subordinates to view. Their requirement is that it should be able to display word, excel, powerpoint and pdf documents in a non-editable manner. The one option that I found for doing this is to have the content loaded into a web browser component. The downside to ...

iText page wrapping- changes order of elements

I'm using iText to generate PDF reports. I came across this issue and worked up a simple example to illustrate it. I'm combining simple paragraphs and images. The height of the images is such that 3 will fit on a PDF page, but when text is on a page, only 2 images will fit. I create my iText PDF like so: Document document = new ...

Performing Optical Character Recognition on PDFs from ColdFusion using a Java or .NET Library?

I am looking to take a PDF and extract any text from it. I then want to make it available using ColdFusion's available Verity search to search the contents. Are there any libraries out there that do this quite well already? I am including Java or .NET (Java preferred) libraries in the scope since they can be called from CF. Any insigh...

PHP library to read PDFs?

Do you know of any free libraries to read PDFs in PHP? The built-in PDF functionality is only for rendering PDF output. ...

How to vertically align Paragraphs within a Table using Reportlab?

I'm using Reportlab to generate report cards. The report cards are basically one big Table object. Some of the content in the table cells needs to wrap, specifically titles and comments, and I also need to bold certain elements. To accomplish both the wrapping and ability to bold, I'm using Paragraph objects within the Table. My tabl...

overlay one pdf or ps file on top of another

I have two PDF or PostScript files (I can work with either one). What I want to do is merge each page on top of the other so that page 1 of document A will be combined with page 1 of document B to produce page 1 of the output document. This isn't something I necessarily need to do programmatically, although that would be helpful. A...

Weird problem with Rails and Adobe LiveCycle pdf

I'm building a site in which a catalog (of products) is shown and, when clicked, an Adobe LiveCycle generated pdf will be opened. There are a few form fields and when the submit button is pressed, the fields will be submitted to the url http://localhost:3000/pdf-parser. This file gets all the parameters and can store them in the database. ...

How to use jQuery $.ajax to open a pdf w/out a post-back?

I have a very dynamic UI that provides the end user 2 detail views (each one a step deeper than the previous). When I get to the bottom of this chain I would like the ability to "preview" a pdf file ... but the only luck I have thus far is using window.open(url) and this pops up an additional browser window in IE6/7 (not desired if at a...

Fast PDF splitter library

pyPdf is a great library to split and merge PDF files. I'm using it to split pdf documents into 1 page documents. pyPdf is pure python and spends quite a lot of time in the _sweepIndirectReferences() method of the PdfFileWriter object when saving the extracted page. I need something with better performance. I've tried using multi-threading...