How can I extract the part of this stream (the one named BLABLABLA) from the pdf file which contains it??
<</Contents 583 0 R/CropBox[0 0 595.22 842]/MediaBox[0 0 595.22 842]/Parent 29 0 /Resources<</ColorSpace<</CS0 563 0 R>>/ExtGState<</GS0 568 0 R>>/Font<</TT0 559 0 R/TT1 560 0 R/TT2 561 0 R/TT3 562 0 R>>/ProcSet[/PDF/Text/ImageC]/P...
Following this example, I can list all elements into a pdf file
import pyPdf
pdf = pyPdf.PdfFileReader(open("pdffile.pdf"))
list(pdf.pages) # Process all the objects.
print pdf.resolvedObjects
now, I need to extract a non-standard object from the pdf file.
My object is the one named MYOBJECT and it is a string.
The piece printed by...
I can read xmp metadatas through pyPdf with this code:
a = pyPdf.PdfFileReader(open(self.fileName))
b = a.getXmpMetadata()
c = b.pdf_keywords
but: is this the best way?
And if I don't use the pdf_keywords property?
And is there any way to set these metadatas with pyPdf?
...
pyPdf is a great library to split, merge PDF files.
I'm using it to split pdf documents into 1 page documents. pyPdf is pure python and spends quite a lot of time in the _sweepIndirectReferences() method of the PdfFileWriter object when saving the extracted page. I need something with better performance. I've tried using multi-threading...
currently, if I make a page object of a pdf page with pyPdf, and extractText(), what happens is that lines are concatenated together. For example, if line 1 of the page says "hello" and line 2 says "world" the resulting text returned from extractText() is "helloworld" instead of "hello world." Does anyone know how to fix this, or have su...
i would like to use pyPdf to split a pdf file based on the outline where each destination in the outline refers to a different page within the pdf.
example outline:
main --> points to page 1
sect1 --> points to page 1
sect2 --> points to page 15
sect3 --> points to page 22
it is easy within pyPdf to iterate ove...
On an Ubuntu server, I want to create pdfs which include other static pdfs. I have tried using ReportLab with pyPdf. Ideally I would use ReportLab to do the whole thing, but in order to import the pdfs requires their PageCatcher which has a large recurring fee.
So I use pyPdf to merge a page created with ReportLab and my other pdfs. Th...
I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just a...
I need to programmatically add additional graphical elements onto an existing, static PDF book cover. Right now I use pycairo to draw onto a transparent PDFSurface, then merge it into the existing static PDF using pyPdf. This way, the PDFSurface works as an overlay.
However, the transparent PDF is exactly the same size as the static PDF...
I want to programmatically edit a PDF using pyPDF. Currently, I'm struggling with interpreting the various PDF boxes' (TrimBox, MediaBox etc.) dimensions. Each box has four dimensions stored as a four-tuple, e.g.:
TrimBox: 56.69 56.69 1040.31 751.18
According to the PDF specification, these are supposed to describe a r...
Hello ! I'd like to create/modify the title of a pdf document using pypdf. It seems that the title is readonly. Is there a way to access this metadata r/w?
If answer positive, a piece of code would be appreciated.
Thanks
...
I'm trying to dynamically generate PDFs from user input, where I basically print the user input and overlay it on an existing PDF that I did not create.
It works, with one major exception. Adobe Reader doesn't read it properly, on Windows or on Linux. QuickOffice on my phone doesn't read it either. So I thought I'd trace the path of me ...
I have written a Pdf merger which merges an original file with a watermark.
What I want to do now is to open 'document-output.pdf' file in the browser by a Django view. I already checked Django's related articles, but since my approach is relatively different, I don't directly create the PDF object, using the response object as its "fil...
Using pypdf python module how to read the following pdf file http://www.envis-icpe.com/pointcounterpointbook/Hindi_Book.pdf
# -*- coding: utf-8 -*-
from pyPdf import PdfFileWriter, PdfFileReader
import pyPdf
def getPDFContent(path):
content = ""
# Load PDF into pyPDF
pdf = pyPdf.PdfFileReader(file(path, "rb"))
# Iterate pag...
I have a program in Python (using pyPDF) that merges a bunch of different PDF documents. Sometimes, the resulting pdf is fine, except for some blank pages in the middle. When I view these documents with Acrobat Reader, I get an error message saying "insufficient data for image". When I view the documents with FoxIT Reader, I get some ...