views:

503

answers:

3

How does Informatica handle unstructured data sources like PDF? If a tabular report is stored as a PDF, can we read it out of the PDF as tabular data (like a DataTable in .NET)?

A: 

PDF is actually quite structured internally. More recent revisions of the PDF specification may provide a way to hold the data ready for external processing, but the main goal of a PDF document is to describe a page for printing, so that all kinds of environments and devices render it as similarly as possible.

Whether any extra data is provided, beyond where to print text and lines to form a table, depends largely on the creator of the PDF.
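
To illustrate what "where to print text" means for table extraction, here is a minimal sketch in Python. It assumes some PDF library has already handed back positioned text fragments as `(x, y, text)` tuples; the fragment values, the `rebuild_table` helper, and the `row_tolerance` parameter are all hypothetical and not part of Informatica or any PDF API. The idea is only that rows have to be inferred from similar vertical positions and columns from horizontal order.

```python
# Hypothetical fragments as a PDF text extractor might report them:
# (x, y, text), with y increasing down the page. The values are made up.
fragments = [
    (50.0, 100.2, "Region"),
    (200.0, 100.0, "Sales"),
    (50.0, 120.1, "North"),
    (200.0, 119.9, "1200"),
    (200.0, 140.0, "950"),
    (50.0, 140.3, "South"),
]


def rebuild_table(fragments, row_tolerance=2.0):
    """Group fragments into rows by similar y, then order each row by x."""
    rows = []  # each entry: (y of the row's first fragment, [(x, text), ...])
    for x, y, text in sorted(fragments, key=lambda f: f[1]):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            # Close enough vertically: same row as the previous fragment.
            rows[-1][1].append((x, text))
        else:
            # Start a new row at this vertical position.
            rows.append((y, [(x, text)]))
    # Within each row, left-to-right order gives the column order.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


table = rebuild_table(fragments)
for row in table:
    print(row)
```

A real report would also need handling for wrapped cells, merged columns, and multi-page tables, which is why the result depends so much on how the PDF was produced.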

Stijn Sanders
A: 

http://www.informatica.com/products_services/powercenter/options/unstructured/Pages/index.aspx

Funny you mention it: I used to work for the start-up that invented the underlying technology, until it was acquired by Informatica.

Yuval A
A: 

I couldn't find an acceptable answer to this question even after a long time. I think there is no available solution to my problem.

Faiz