answers:

2

How can I fetch an HTML page and save it to my database in Java? Is there an easy way to do that?

+1  A: 

Not sure about your exact requirements.

For something simple, you can use HttpClient.

For something more complex, you can use Nutch. It does crawling, indexing, and searching as well.

leonm
@Leonm: First of all, thanks for the reply. What I need is this: if I type www.yahoo.com in the textbox, it should copy the entire HTML of Yahoo's index page to the database. Is there any way to do that?
Alex Mathew
You'll have to write some plumbing of your own. Basically you'll fetch the URL from the textbox and pass it to HttpClient (or something similar). Upon a successful return you store the contents to a database, perhaps with JPA or straight JDBC.
leonm
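The plumbing described in the comment above might look roughly like the sketch below. It is not the commenter's code: it swaps in the JDK's built-in java.net.http.HttpClient (Java 11+) instead of Apache HttpClient and uses plain JDBC. The class name PageSaver, the helper names, and the table pages(url, html) are all hypothetical, and the Connection is assumed to come from whatever JDBC driver you already use.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical helper class; all names here are illustrative only.
final class PageSaver {

    // Adds a scheme when the user typed a bare host such as "www.yahoo.com".
    static String normalizeUrl(String input) {
        String trimmed = input.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed;
        }
        return "http://" + trimmed;
    }

    // Fetches the page body as a String, following ordinary redirects.
    // Throws IOException when the final status code is not 200.
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException(
                    "Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Inserts one row into the assumed table pages(url, html).
    // The html column is assumed to be a large text type (TEXT, CLOB, ...).
    static void savePage(Connection connection, String url, String html)
            throws SQLException {
        String sql = "INSERT INTO pages (url, html) VALUES (?, ?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, url);
            statement.setString(2, html);
            statement.executeUpdate();
        }
    }
}
```

With a Connection obtained from DriverManager.getConnection and your own JDBC URL, the textbox handler would then boil down to something like: String url = PageSaver.normalizeUrl(textboxValue); PageSaver.savePage(connection, url, PageSaver.fetchHtml(url));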
@Leonm: thanks...
Alex Mathew
+2  A: 

Receiving a file over HTTP is pretty easy using the URL class:

String rawHtml = IOUtils.toString(new URL("http://yahoo.com").openStream(), "UTF-8");

IOUtils comes from org.apache.commons.io; its toString method reads the whole input stream into one String. Unfortunately, URL.openStream() gives you no control over anything besides the website's address (cookies, header information, ...) :-/ Personally, I use this approach wherever I can, since HttpClient's API is too complex (too many LOC) for simply retrieving the source code of a website.

f1sh
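If a request header is needed while staying on the plain JDK, one option is URL.openConnection(), which returns a URLConnection that accepts request properties before the stream is opened. The sketch below is an illustration rather than part of the answer above: the class name SimpleFetch and its method names are hypothetical, it uses no commons-io, it assumes Java 9+ for InputStream.readAllBytes, and it assumes the page is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
final class SimpleFetch {

    // Reads the whole stream into one String using the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    // Fetches a page while sending a custom User-Agent header.
    // Assumes the response body is UTF-8; real pages may declare another charset.
    static String fetch(String address, String userAgent) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        connection.setConnectTimeout(10_000); // milliseconds
        connection.setReadTimeout(10_000);    // milliseconds
        try (InputStream in = connection.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }
}
```

A call such as SimpleFetch.fetch("http://yahoo.com", "my-crawler/0.1") would return the page source as a String; the user agent value is just a placeholder.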