
Hi!

I downloaded Apache HWPF and want to use it to read a .doc file and write its text into a plain text file, but I don't know HWPF very well.

I've written a very simple program, which is at the end of this post.

I have three problems:

  1. Some of the imports have errors (the compiler can't find the Apache HDF packages). How can I fix them?

  2. How can I use the HWPF methods to find and extract the images?

  3. Some parts of my program are incomplete or incorrect, so please help me complete it.
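For problem 1, one quick diagnostic is to ask the class loader whether the POI classes the program uses are visible at run time. This is a sketch: the two class names are real Apache POI classes (HWPF ships in the poi-scratchpad jar, POIFSFileSystem in the main poi jar), while ClasspathCheck and onClasspath are hypothetical names chosen for illustration. It only checks the run-time classpath; compile errors in an IDE need the same jars added to the project's build path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClasspathCheck {
    // Returns true if the named class can be loaded (without initializing it).
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Real Apache POI class names, each mapped to the jar that provides it.
        Map<String, String> needed = new LinkedHashMap<>();
        needed.put("org.apache.poi.poifs.filesystem.POIFSFileSystem", "poi");
        needed.put("org.apache.poi.hwpf.HWPFDocument", "poi-scratchpad");
        for (Map.Entry<String, String> e : needed.entrySet()) {
            String status = onClasspath(e.getKey()) ? "found    " : "MISSING  ";
            System.out.println(status + e.getKey() + " (" + e.getValue() + " jar)");
        }
    }
}
```

Run with the same classpath as the real program, it prints MISSING next to any jar that still has to be added.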

I have to complete this program in two days.

Once again, please, please help me finish it.

Thank you all very much for your help!

This is my basic code:

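import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;

// HWPF lives in the poi-scratchpad jar; it and the main poi jar must be on the classpath.
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class test {
  public void m1() throws IOException {
    String filesname = "Hello.doc";
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(filesname));
    HWPFDocument doc = new HWPFDocument(fs);
    // Write the document's whole text into a plain text file.
    WordExtractor we = new WordExtractor(doc);
    try (Writer out = new FileWriter("Hello.txt")) {
      out.write(we.getText());
    }
    // getAllPictures() returns every embedded picture, so the per-run
    // hasPicture()/extractPicture() calls are not needed here.
    for (Picture pic : doc.getPicturesTable().getAllPictures()) {
      try (OutputStream img = new FileOutputStream(pic.suggestFullFileName())) {
        pic.writeImageContent(img);
      }
    }
  }
}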
A: 

If you just want the text and don't care about writing the code yourself, you can use Antiword:

$ antiword file.doc > out.txt
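If the conversion has to run from inside a Java program rather than from a shell, the same "> out.txt" redirect can be reproduced with the standard ProcessBuilder class. This is a minimal sketch that assumes the antiword binary is installed and on the PATH; the class and helper names (AntiwordRunner, runToFile) are hypothetical.

```java
import java.io.File;
import java.io.IOException;

public class AntiwordRunner {
    // Runs a command with its standard output sent to outFile, like "cmd > outFile"
    // in a shell (existing contents are truncated). Returns the exit code.
    static int runToFile(File outFile, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectOutput(outFile);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // errors still reach the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Assumes antiword is installed and on the PATH.
        int exit = runToFile(new File("out.txt"), "antiword", "file.doc");
        System.out.println("antiword exited with " + exit);
    }
}
```

A non-zero exit code usually means antiword could not read the input file; its error message goes to the console because stderr is inherited.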

Steve Klabnik