Is it possible to programmatically place the contents of a web page into a Word file?

To further complicate this, I'd like to do these steps in Java (using JNI if I must).

Here are the steps I want to do programmatically, followed by ways that I would do this manually today:

  1. Provide a method with a URL (Manually: Open page in Firefox)
  2. Copy the contents of that URL (Manually: Ctrl-A to select all)
  3. Create a new Word document (Manually: Open Microsoft Word)
  4. Paste the contents of the URL into Word (Manually: Ctrl-V to paste)
  5. Save the Word file (Manually: Save the Word file)
+2  A: 

IMHO you would do better to download the page over HTTP, create a new Word file using Apache POI, and copy the contents of the HTTP stream into that Word file.
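
A rough sketch of the download half, using only the JDK. Note that it swaps in a simpler stand-in for the POI step: it saves the raw HTML to a file, relying on Word being able to open HTML documents itself, so the result is not a true Word-format file. The class name `PageToWordSketch` and the method `savePage` are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: the class and method names are illustrative only.
class PageToWordSketch {

    // Streams the raw bytes behind urlString into target, replacing any
    // existing file, and returns the number of bytes copied.
    // Assumption: the file holds plain HTML that Word is asked to open;
    // it is not converted to a native Word format here.
    static long savePage(String urlString, Path target) throws IOException {
        URL url = new URL(urlString);
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PageToWordSketch <url> <output-file>");
            return;
        }
        long bytes = savePage(args[0], Paths.get(args[1]));
        System.out.println("Wrote " + bytes + " bytes to " + args[1]);
    }
}
```

If a genuine Word file is needed, the `Files.copy` call is the place where the POI-based writing would go instead.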

dfa
+1  A: 

HtmlUnit can be used to programmatically open the page (posing as Firefox if necessary), and Apache POI can be used to create a Microsoft Word file (in Word 97 format).

CoverosGene
A: 

This article describes a way to manipulate MS-Word doc files from within Java, just using string replace, or XSLT.

As for grabbing the content of a URL, that is the simpler part of the task; something like this will do:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;


public class util
{

  // Fetches the body of urlString as text; returns null if the request fails.
  public String HttpGet(String urlString)
  {
    String resultData = null;
    try
    {
      URL url = new URL(urlString);
      URLConnection conn = url.openConnection();
      conn.connect();

      // try-with-resources closes the reader and the underlying stream
      try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream())))
      {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null)
          sb.append(line).append('\n');  // readLine() strips the line terminator

        resultData = sb.toString();
      }
    }
    catch (IOException e)
    {
      e.printStackTrace();
    }
    return resultData;
  }


}
Cheeso