How does Informatica handle unstructured data sources like PDF? If a tabular report is stored as a PDF, can we read it out of the PDF as tabular data (like a DataTable in .NET)?
PDF is actually quite structured internally. More recent revisions of the PDF specification may provide a way to hold the data ready for external processing, but the main goal of a PDF document is to describe a page for printing, so that the result looks as similar as possible across all kinds of environments and devices.
Whether any extra data is provided, beyond where to print the text and lines that form a table, depends largely on the creator of the PDF.
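To illustrate what "only where to print text" means in practice, here is a minimal sketch of rebuilding rows from positioned text. It assumes some PDF library has already given you text fragments as (x, y, text) tuples in PDF page coordinates (origin at the bottom-left). The function name `rows_from_fragments`, the `y_tolerance` parameter, and the sample data are all made up for illustration; this is not how Informatica does it, just the general idea.

```python
def rows_from_fragments(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into table rows.

    Fragments whose y values are within y_tolerance of the first
    fragment in the current row are treated as the same row.
    Rows come out top to bottom, cells left to right.
    """
    rows = []  # each entry: (row_y, [(x, text), ...])
    # PDF y grows upward, so sort by descending y to read top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Order the cells of each row by x and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Hypothetical fragments, deliberately out of order and slightly misaligned.
fragments = [
    (150, 660, "95"), (50, 700, "Region"), (150, 700, "Sales"),
    (50, 680.5, "North"), (150, 680, "120"), (50, 660, "South"),
]
table = rows_from_fragments(fragments)
for row in table:
    print(row)
```

This only works for simple, well-aligned tables. Merged cells, wrapped text inside a cell, or empty cells need column detection as well, which is where the real effort goes.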
http://www.informatica.com/products_services/powercenter/options/unstructured/Pages/index.aspx
Funny you mention it: I used to work for the start-up that invented the underlying technology, until it was acquired by Informatica.
I couldn't find an acceptable answer to this question, even after a long time. I think there is no available solution to my problem.