Is there a way to remove the text from a PDF file using C#?
Yes, using the open source project iTextSharp.
Creating a basic PDF file:
- http://www.devshed.com/c/a/Java/Creating-Simple-PDF-Files-With-iTextSharp/
- http://www.developerfusion.com/code/5682/create-pdf-files-on-fly-in-c/
You will need to open the original PDF and create a new one. Iterate through all the objects you find in the original, remove the text, and add the remaining objects to the new file. The icky part is that after you remove the text, you will have to reposition the objects on the pages that follow the deleted text.
If you do manage to do it, you've got yourself a very interesting blog post...
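To make the "iterate through the objects and remove the text" step a bit more concrete, here is a minimal sketch of the core idea. It does not use iTextSharp; it assumes you have already obtained a page's decoded (uncompressed) content stream as a string from whatever PDF library you use. In a content stream, text is drawn inside text objects delimited by the BT and ET operators, so the sketch drops everything between them and copies the rest unchanged. The class and method names (ContentStreamTextStripper, StripTextObjects) are made up for illustration, and the scanner ignores comments and inline image data, so treat it as a starting point rather than a complete solution.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iTextSharp or any other library.
public static class ContentStreamTextStripper
{
    // Returns the content stream with every BT ... ET text object removed.
    // Assumes 'content' is an already-decoded page content stream.
    public static string StripTextObjects(string content)
    {
        var output = new StringBuilder();
        bool inText = false; // true while we are between BT and ET
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            if (c == '(')
            {
                // Literal string: skip it as a unit so "ET" inside it is not misread.
                int end = SkipLiteralString(content, i);
                if (!inText) output.Append(content, i, end - i);
                i = end;
            }
            else if (IsSeparator(c))
            {
                if (!inText) output.Append(c);
                i++;
            }
            else
            {
                // Read one token: an operator, a number, or a /Name.
                int start = i++;
                while (i < content.Length && !IsSeparator(content[i])
                       && content[i] != '(' && content[i] != '/')
                {
                    i++;
                }
                string token = content.Substring(start, i - start);
                if (!inText && token == "BT") inText = true;       // text object begins
                else if (inText && token == "ET") inText = false;  // text object ends
                else if (!inText) output.Append(token);
            }
        }
        return output.ToString();
    }

    // Whitespace and the delimiter characters that end a token.
    private static bool IsSeparator(char c)
    {
        return char.IsWhiteSpace(c) || "<>[]{})%".IndexOf(c) >= 0;
    }

    // Returns the index just past the ')' that closes the string opened at 'open'.
    private static int SkipLiteralString(string s, int open)
    {
        int depth = 0;
        for (int i = open; i < s.Length; i++)
        {
            if (s[i] == '\\') { i++; continue; } // skip the escaped character
            if (s[i] == '(') depth++;
            else if (s[i] == ')' && --depth == 0) return i + 1;
        }
        return s.Length; // unterminated string: consume the rest
    }
}
```

For example, calling StripTextObjects on "q 1 0 0 1 0 0 cm BT /F1 12 Tf (Hello ET) Tj ET Q" keeps the q, cm and Q graphics operators and drops the text object, including the "ET" that appears inside the string. You would still need a PDF library to read each page's stream, decode it, and write the modified stream back to the new file.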
There are several libraries, free and commercial, that can assist. I'm most familiar with PDFNet by PDFTron. However, I've only used it in a read-only context.
I assume it will work for you, as "...Add/remove/edit images, text, and vector graphics..." is one of the uses they claim it's capable of.
Below is a link to their online documentation. It's a rather detailed API, so be prepared to read.
http://www.pdftron.com/pdfnet/html/main.html
As for other vendors, I know Adobe has a reseller that licenses their API in a C# form. I don't recall the product name off the top of my head. If memory serves me correctly, it's pricey compared to PDFNet, and it resembles an old-style C (not even C++) way of programming. It won't be a comfortable fit if you're only used to C#.