I am planning to write an app that can open and display PDF documents, and perform OCR on vector graphic elements within the PDFs. The user must be able to select regions of the document and I need to draw real-time annotations on the document. I don't need to alter or save the document itself.

I have plenty of experience with C# and WPF; I have already written a similar application that does the above on XPS/XAML documents rather than PDF. However, that app only runs on Windows, and PDF documents must be converted to XPS first.

I have done quite a bit of research and there are many, many options available, none of which seems an obvious choice. There are many libraries that can open or create PDFs, but most don't seem to give you access to individual vector graphic elements in a format that lets you draw or manipulate them on screen (similar to what I could do with WPF graphic elements extracted from XPS documents).

I am familiar with .NET and C# (including .NET 2 GDI+ graphics) and I am very keen to stick to what I know. I am also using EmguCV for image recognition, which can be compiled in Mono or .NET. As such, I am looking at Silverlight (running standalone) or Mono options, both of which should run on PC and Mac.

Performance (for both graphics and number crunching) is a strong consideration, though I am just as interested in getting this up and running quickly.

Does anyone have any experience with opening PDFs, extracting vector graphic elements (perhaps as SVG) and rendering them in a Mono app? Can individual elements be rendered to bitmap?

Alternatively, does anyone have experience with opening PDFs in Silverlight and converting them to XPS or XAML at runtime? I know that WPF and Silverlight graphics libraries are not 1:1, but I'm not sure how this affects XPS contents (generally composed of Canvas, Glyphs and StreamGeometry objects).

Thank you for any advice, tips or links you have to share.

A: 

Look at this: http://silverpdf.codeplex.com/

It's a client-side PDF reading library. Right now it can only read files, but you could play with it and build your own "display" functionality.

Ai_boy
Thanks for your answer. I had a good look at PDFsharp (on which silverpdf is based), but it cannot render PDFs. Since graphics extraction and rendering are primarily what I'm after, I don't think this library is for me. Silverpdf is apparently also based on iText, but I can't see any evidence that it does what I need either (and if you want proper documentation for it, you have to buy their book).
AndrewS
A: 

You might want to examine the internals of your PDFs so you better understand what they actually contain; you might be very surprised! For example, what looks like text is often a scanned page or an image, and vector graphics do not exist as neat little packages. We wrote a whole load of general articles about what is inside a PDF, and about analysis tools, at http://www.jpedal.org/PDFblog; they are not specific to any tool or language.
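To illustrate why vector graphics are not "neat little packages": in a PDF page's content stream, a shape is just a run of path-construction operators (`m`, `l`, `c`, `v`, `y`, `h`, `re`) that ends at a path-painting operator (`S`, `f`, `B`, `n`, and so on), as described in section 8.5 of ISO 32000-1. Images and form XObjects are drawn with `Do`, and text sits between `BT` and `ET`. The minimal Python sketch below (the names `summarize` and `Summary` are hypothetical) groups such operator runs into paths and counts `Do` calls and text objects. It assumes the stream has already been decompressed by some other tool, contains no inline images (`BI`/`ID`/`EI`), and has no nested parentheses inside string literals; it also ignores the current transformation matrix (`cm`), so coordinates are in local user space.

```python
import re
from dataclasses import dataclass, field

# Path-construction and path-painting operators (ISO 32000-1, section 8.5).
PATH_CONSTRUCTION = {"m", "l", "c", "v", "y", "h", "re"}
# Each painting operator ends the current path; "n" ends it without painting (used for clipping).
PATH_PAINTING = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"}

# Tokens: literal strings (assumed not to contain nested parentheses), dictionary
# delimiters, hex strings, array brackets, names, and bare words (numbers or operators).
TOKEN_RE = re.compile(
    r"\((?:\\.|[^\\()])*\)|<<|>>|<[0-9A-Fa-f\s]*>|\[|\]|/[^\s/\[\]()<>]*|[^\s/\[\]()<>]+"
)
NUMBER_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)$")


@dataclass
class Summary:
    # Each path is (painting_operator, [(construction_operator, operands), ...]).
    paths: list = field(default_factory=list)
    xobject_calls: int = 0  # "Do" invocations (images or form XObjects)
    text_objects: int = 0   # BT ... ET blocks


def summarize(content: str) -> Summary:
    """Group a decompressed content stream into path objects (a rough sketch)."""
    summary = Summary()
    operands = []  # operands seen since the last operator
    current = []   # construction operators of the path being built
    for tok in TOKEN_RE.findall(content):
        if NUMBER_RE.match(tok):
            operands.append(float(tok))
        elif tok[0] in "(<[]/" or tok == ">>":
            operands.append(tok)  # strings, names, arrays: kept as raw tokens
        else:
            # Any other bare word is treated as an operator.
            if tok in PATH_CONSTRUCTION:
                current.append((tok, operands))
            elif tok in PATH_PAINTING:
                if current:
                    summary.paths.append((tok, current))
                current = []
            elif tok == "Do":
                summary.xobject_calls += 1
            elif tok == "BT":
                summary.text_objects += 1
            operands = []  # operators consume their operands
    return summary


if __name__ == "__main__":
    stream = "10 10 m 100 10 l 100 50 l h S\n0 0 50 50 re f\n/Im1 Do"
    s = summarize(stream)
    print(len(s.paths), s.xobject_calls)
```

As a rough heuristic, a page that yields many paths is drawn largely as vector art, while one whose stream is little more than a single `Do` is probably a scanned image. A real tool would also need to track `q`/`Q` and `cm` to place each path on the page.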

mark stephens
Thanks for the link. I'll do some reading and let you know how I go.
AndrewS
I dealt with many peculiarities when working with XPS documents that had been 'printed' from PDFs, and my gut feeling is that PDFs will be worse (or at least as bad). I will be ignoring embedded images and PDFs that are just scans.
AndrewS