Hi all,

I have a PDF file with an XML file attached, and I need to parse that XML file. Does anyone know how to do that? I'm using C#.

Thanks in advance.

A: 

Try using LINQ to XML as suggested in this question.
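
For reference, a minimal sketch of what parsing with LINQ to XML looks like once you have the XML as a string. The element name "item", the attribute "name" and the class XmlDemo are made up for illustration; substitute whatever your file actually contains.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlDemo
{
    // Returns the "name" attribute of every <item> element in the XML text.
    // "item" and "name" are hypothetical; they are not taken from the question.
    public static string[] ItemNames(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("item")
                  .Select(e => (string)e.Attribute("name"))
                  .ToArray();
    }
}
```

Calling XmlDemo.ItemNames on a small document with two item elements should give back their two name values in document order.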

Oren
Hi, the problem is not parsing the XML but accessing the XML inside the PDF. Do you know how to do that? Thanks,
Zorro
Can you describe the situation a little more? Where exactly is the XML?
Oren
I think the XML file is embedded in the PDF document. Thanks
Zorro
A: 

PDF files can have a metadata information object. Is that what you mean, or is it an XML file embedded as an object?

mark stephens
The XML file is embedded as an object. Thanks
Zorro
A: 

I believe this blog post describing how to read from a PDF file using C# is what you want.

This is the example he gives of grabbing text from the PDF:

using System;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

namespace PDFReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF from disk
            PDDocument doc = PDDocument.load("lopreacamasa.pdf");
            try
            {
                // Extract all of the document's text and print it
                PDFTextStripper pdfStripper = new PDFTextStripper();
                Console.Write(pdfStripper.getText(doc));
            }
            finally
            {
                // Release the underlying file
                doc.close();
            }
        }
    }
}

Here is what looks like an exhaustive and highly organized list of how to read PDFs with C#.

If what you need is some form of embedded metadata, as Mark suggested, I'm sure it's also possible to fetch it using the tools I've linked to.
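
If the XML really is an embedded file (an attachment) rather than metadata, here is a rough sketch of pulling it out. Note that it uses iTextSharp 5.x instead of the PDFBox example above, and it assumes the attachment sits in the document-level EmbeddedFiles name tree with a flat Names array. It will not find attachments stored under Kids nodes or as file attachment annotations. The class and method names are my own, and I have not run this against your file.

```csharp
using System.IO;
using System.Xml.Linq;
using iTextSharp.text.pdf;   // assumption: iTextSharp 5.x is referenced

static class EmbeddedXml
{
    // Returns the first embedded file parsed as XML, or null if none is found.
    public static XDocument ReadFirstAttachment(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Catalog -> /Names -> /EmbeddedFiles -> /Names
            PdfDictionary names = reader.Catalog.GetAsDict(PdfName.NAMES);
            PdfDictionary embedded = names == null ? null : names.GetAsDict(PdfName.EMBEDDEDFILES);
            PdfArray specs = embedded == null ? null : embedded.GetAsArray(PdfName.NAMES);
            if (specs == null) return null;

            // The array alternates: [name1, fileSpec1, name2, fileSpec2, ...]
            for (int i = 1; i < specs.Size; i += 2)
            {
                PdfDictionary spec = specs.GetAsDict(i);
                PdfDictionary ef = spec == null ? null : spec.GetAsDict(PdfName.EF);
                if (ef == null) continue;

                // /EF /F points at the stream holding the file's bytes
                PRStream stream = PdfReader.GetPdfObject(ef.GetAsIndirectObject(PdfName.F)) as PRStream;
                if (stream == null) continue;

                byte[] bytes = PdfReader.GetStreamBytes(stream);
                using (MemoryStream ms = new MemoryStream(bytes))
                {
                    // Loading from a stream lets the XML reader detect the encoding
                    return XDocument.Load(ms);
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

This sketch takes the first attachment it finds and assumes it is XML. If your PDF carries several attachments, you would want to check the file name in each file specification before parsing.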

Oren