Usually CSV and Excel file formats are used to import data, because it is easy to extract data from them programmatically. My users don't like Excel for data entry; they prefer Word documents. But I am not sure how to extract data from a Microsoft Word document. Has anyone tried this? Do you have any suggestions?

I found this link, but I am not sure how to create such a template or which API to use in Java to extract the values.

+4  A: 

There are libraries like Apache POI that make it easier than it would otherwise be.

Jonathon
Do you know whether Apache POI can read Word documents?
Venkat
@Venkat yes, it reads Word, Excel, PowerPoint, etc. It also allows writing, but its writing capabilities are more limited.
Jonathon
+1  A: 

Java does not have any built-in classes for handling Microsoft Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read Microsoft Word documents in Java.

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class ReadDoc
{
    public static void main( String[] args )
    {
        String filename = "Hello.doc";

        // try-with-resources closes the input stream even if parsing fails
        try ( FileInputStream in = new FileInputStream( filename ) )
        {
            // HWPF reads the binary .doc format
            POIFSFileSystem fs = new POIFSFileSystem( in );
            HWPFDocument doc = new HWPFDocument( fs );
            WordExtractor we = new WordExtractor( doc );

            // One array element per paragraph in the document
            String[] paragraphs = we.getParagraphText();

            System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
            for ( int i = 0; i < paragraphs.length; i++ ) {
                // Strip the line terminator from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll( "\\cM?\r?\n", "" );
                System.out.println( "Length:" + paragraphs[i].length() );
            }
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
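
Once the paragraphs are extracted, the values still have to be pulled out of the text. Here is a minimal sketch of one way to do that, assuming the Word template asks users to fill in lines of the form "Label: value". That layout, the class name FormFieldParser, and the method parseFields are all hypothetical and not part of Apache POI; the code uses only the standard library and would take the String[] returned by getParagraphText() as input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormFieldParser {

    /**
     * Turns paragraphs such as "Name: Alice" into a label -> value map.
     * Assumes a hypothetical "Label: value" template layout.
     * Paragraphs without a label before the first colon are skipped.
     */
    public static Map<String, String> parseFields(String[] paragraphs) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps document order
        for (String paragraph : paragraphs) {
            String line = paragraph.trim(); // also drops trailing \r\n
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or nothing before it
            }
            // Split on the first colon only, so values like "10:30" survive
            String label = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(label, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample input standing in for WordExtractor.getParagraphText()
        String[] sample = { "Customer Form\r\n", "Name: Alice\r\n", "Start time: 10:30\r\n" };
        parseFields(sample).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```

If the real template uses tables or form fields instead of plain labelled lines, this approach would not apply directly and the extraction step would need to change.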

You can find more details at this link.

I hope this helps.

harigm
A: 

I like this answer that came in the comments:

You might want to explore InfoPath; it's the MS forms technology, and you can import forms from MS Word. – ktingle Jun 30 at 2:32

Venkat