I'm using PDFBox in a C# .NET project, and I get a TypeInitializationException ("The type initializer for 'java.lang.Throwable' threw an exception.") when executing the following block of code:

  FileStream stream = new FileStream(@"C:\1.pdf", FileMode.Open);

  //retrieve the pdf bytes from the stream.
  byte[] pdfbytes = new byte[65000];

  stream.Read(pdfbytes, 0, 65000);

  //get the pdf file bytes.
  allbytes = pdfbytes;

  //create a stream from the file bytes.
  java.io.InputStream ins = new java.io.ByteArrayInputStream(allbytes);
  string txt;

  //load the doc
  PDDocument doc = PDDocument.load(ins);
  PDFTextStripper stripper = new PDFTextStripper();

  //retrieve the pdf doc's text
  txt = stripper.getText(doc);
  doc.close();

The exception occurs at this statement:

PDDocument doc = PDDocument.load(ins);

What can I do to solve this?

This is the stack trace:

   at java.lang.Throwable.__<map>(Exception , Boolean )
   at org.pdfbox.pdfparser.PDFParser.parse()
   at org.pdfbox.pdmodel.PDDocument.load(InputStream input, RandomAccess scratchFile)
   at org.pdfbox.pdmodel.PDDocument.load(InputStream input)
   at At.At.ExtractTextFromPDF(InputStream fileStream) in C:\Users\Administrator\Documents\Visual Studio 2008\Projects\AtProject\Att\At.cs:line 61

Inner exception of the InnerException:

  • InnerException {"Could not load file or assembly 'IKVM.Runtime, Version=0.30.0.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58' or one of its dependencies. The system cannot find the file specified.":"IKVM.Runtime, Version=0.30.0.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58"} System.Exception {System.IO.FileNotFoundException}

OK, I solved the previous problem by copying some of the PDFBox .dll files to my bin folder. But now I'm getting this error: expected='/' actual='.'--1 org.pdfbox.io.PushBackInputStream@283d742

Are there any alternatives to PDFBox? Is there another reliable library I can use to extract text from PDF files?

A: 

It looks like you are missing some libraries that PDFBox needs. You need:

  • IKVM.GNU.Classpath.dll
  • PDFBox-X.X.X.dll
  • FontBox-X.X.X-dev.dll
  • IKVM.Runtime.dll
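
A quick way to confirm that these files are actually in the output folder is to look for them at startup. The sketch below is only an illustration: the class and method names (`PdfBoxDependencyCheck`, `FindMissing`) are hypothetical, and the version parts of the file names are left as wildcards because the exact versions are not given here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PdfBoxDependencyCheck
{
    // File-name patterns taken from the list above.
    // The "*" stands in for the unspecified version numbers (an assumption).
    public static readonly string[] RequiredPatterns =
    {
        "IKVM.GNU.Classpath.dll",
        "PDFBox-*.dll",
        "FontBox-*.dll",
        "IKVM.Runtime.dll"
    };

    // Returns the patterns for which no matching file exists in the directory.
    public static List<string> FindMissing(string directory)
    {
        List<string> missing = new List<string>();
        foreach (string pattern in RequiredPatterns)
        {
            if (Directory.GetFiles(directory, pattern).Length == 0)
            {
                missing.Add(pattern);
            }
        }
        return missing;
    }
}
```

Calling `PdfBoxDependencyCheck.FindMissing(AppDomain.CurrentDomain.BaseDirectory)` returns an empty list when every pattern has at least one match next to the executable. It only checks that the files exist; it does not verify that the versions match what PDFBox was built against.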

See the article "Read from a PDF file using C#". A similar problem is discussed in the comments of that article.

Sasha
Thanks for your response, Sasha. I've already solved this problem, though. I'm now facing another one: "expected='/' actual='.'--1 org.pdfbox.io.PushBackInputStream@283d742". It seems this doesn't happen with all PDF files, only with some.
Attilah
It looks like something is wrong with your PDF file. Is it well formed? Can you post a link to it so I can try to reproduce this?
Sasha
I'm in a WCF client/WCF service scenario here: I send the file to the WCF service by streaming and then try to extract text from it as it is received. Maybe that's where the problem lies.
Attilah
I don't think so. Try comparing the file before sending and after receiving. And look at the file format; it looks like something is wrong with it.
Sasha
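
The before/after comparison suggested above can be done by hashing the bytes on both sides. One related detail is worth checking at the same time: the code in the question reads at most 65000 bytes with a single `Read` call, so a larger PDF would be cut off, and `Stream.Read` is allowed to return fewer bytes than requested, which matters for a streamed WCF transfer. Whether either of these causes the parse error here is only a guess. The sketch below uses hypothetical names (`PdfTransferCheck`, `ReadFully`, `Sha256Hex`, `SameContent`) and only standard .NET classes.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class PdfTransferCheck
{
    // Reads a stream to its end. Stream.Read may return fewer bytes than
    // requested, so keep reading until it returns 0.
    public static byte[] ReadFully(Stream input)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }

    // Returns the SHA-256 digest of the data as an uppercase hex string.
    public static string Sha256Hex(byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }

    // True when both byte arrays hash to the same digest.
    public static bool SameContent(byte[] sent, byte[] received)
    {
        return Sha256Hex(sent) == Sha256Hex(received);
    }
}
```

On the client, `PdfTransferCheck.Sha256Hex(File.ReadAllBytes(path))` gives the digest of the original file; on the service, `PdfTransferCheck.Sha256Hex(PdfTransferCheck.ReadFully(incomingStream))` gives the digest of what arrived (`path` and `incomingStream` are placeholders). If the two differ, the file was changed or truncated in transit. If they match, the bytes returned by `ReadFully` can be passed to `new java.io.ByteArrayInputStream(...)` as in the question, and the problem more likely lies in the PDF itself.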