
Is there a way to remove the text from a PDF file using C#?

A:

Yes, using the open-source project iTextSharp.

Creating a basic PDF file:

You will need to create a new PDF and open the original, then iterate through all the objects you find, remove the text, and add the remaining objects to the new file. The icky part is layout: PDF content is absolutely positioned, so nothing reflows when text is removed. If you want the resulting gaps closed, you will have to reposition the objects that followed the deleted text yourself.
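As a rough illustration of the "remove the text" step: inside a page's content stream, text is drawn within text objects delimited by the `BT` (begin text) and `ET` (end text) operators, so one crude approach is to drop those blocks from a decompressed content stream. The sketch below assumes you already have the page content as an uncompressed string; reading and rewriting the actual file still needs a library such as iTextSharp. The `PdfTextStripper` class and `StripTextObjects` method are hypothetical names, and the regex is not a real parser: it will be fooled by `BT`/`ET` appearing inside string literals or inline image data, it drops any colour or graphics-state changes made inside a removed text object, and it ignores text that lives in form XObjects or annotations.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PdfTextStripper
{
    // Matches one whole text object: the BT operator, everything up to the
    // nearest following ET operator, and the ET itself. Text objects cannot
    // nest, so the lazy .*? stops at the correct ET. \b keeps the pattern from
    // matching "BT"/"ET" when they are part of a longer token.
    private static readonly Regex TextObject =
        new Regex(@"\bBT\b.*?\bET\b", RegexOptions.Singleline);

    // Removes every BT ... ET block from an *uncompressed* content stream.
    // Assumption (hypothetical helper): "BT"/"ET" never occur inside string
    // literals or inline image data; a robust version would tokenize instead.
    public static string StripTextObjects(string contentStream)
    {
        return TextObject.Replace(contentStream, string.Empty);
    }

    public static void Main()
    {
        // A tiny hand-written content stream: a stroked rectangle,
        // one line of text, and a filled rectangle.
        string page =
            "q 1 0 0 RG 10 10 100 50 re S Q\n" +
            "BT /F1 12 Tf 72 712 Td (Hello, world) Tj ET\n" +
            "0 0 1 rg 20 20 30 30 re f\n";

        // Prints the two rectangle lines with an empty line where the text was.
        Console.Write(StripTextObjects(page));
    }
}
```

In a real file the content streams are usually Flate-compressed, so you would fetch the decompressed bytes, run something like this over them, and write them back. iTextSharp 5's `PdfReader` exposes `GetPageContent` and `SetPageContent` for that round trip, but check the version you are using.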

If you do happen to do it, you've got yourself a very interesting blog post...

Am
I looked into it and found a way to extract the text to a text file. But is there a way to completely remove the text from the actual PDF file?
loyalpenguin
You want to delete some text from the PDF?
Am
Yes, I want to be able to remove the text from the PDF file.
loyalpenguin
Presumably you could use iTextSharp to extract the text from the PDF, edit it, and then create a new PDF with the modified string.
Glenn Condron
A: 

There are several libraries, both free and commercial, that can help. I'm most familiar with PDFNet by PDFTron; however, I've only used it in a read-only context.

I assume it will work for you, since "...Add/remove/edit images, text, and vector graphics..." is one of the uses they claim it's capable of.

Below is a link to their online documentation. It's a rather detailed API, so be prepared to do some reading.

http://www.pdftron.com/pdfnet/html/main.html

As for other vendors, I know Adobe has a reseller that licenses its API in C# form, but I don't recall the product name off the top of my head. If memory serves, it's pricey compared to PDFNet, and its programming style resembles old-style C (not even C++). It won't be a comfortable fit if you're only used to C#.

Jason D