I want to use HTTP GET and POST commands to retrieve URLs from a website and parse the HTML. How do I do this?

A: 

Use Apache Commons HttpClient: http://hc.apache.org/httpclient-3.x/

Nick Holt
+3  A: 

The easiest way to do a GET is to use the built-in java.net.URL. However, as mentioned, HttpClient is the more robust choice, since it gives you control over redirect handling, among other things.

For parsing the HTML, you can use HTML Parser.
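
As a rough illustration of the built-in java.net.URL route mentioned above, here is a minimal sketch. The class name UrlGet and the method name fetch are made up for this example, and it assumes the response is UTF-8 text, which a real page may not be.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper class, not part of any library.
class UrlGet {
    // Reads everything the URL returns and decodes it as UTF-8 (an assumption).
    public static String fetch(URL url) throws IOException {
        // For an http:// URL, openStream() performs a GET and returns the body.
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toString("UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UrlGet <url>");
            return;
        }
        System.out.print(fetch(new URL(args[0])));
    }
}
```

This gives you no access to status codes or request headers, which is one reason the answers here point to HttpURLConnection or HttpClient for anything beyond a quick fetch.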

kgiannakakis
A: 

I used JTidy in a project and it worked quite well. A list of other parsers is here, but apart from JTidy I don't know any of them.

Markus
+6  A: 

You can use HttpURLConnection in combination with URL.

URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

InputStream stream = connection.getInputStream();
// read the contents using an InputStreamReader
Rob Hruska
Create a BufferedReader from the InputStream to read the content into a String variable.
rockit
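
The BufferedReader suggestion in the comment above could be sketched roughly as follows. The class HttpGet and the methods readAll and get are names invented for this example. It assumes a UTF-8 response and treats anything other than HTTP 200 as an error, which may be stricter than you need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class HttpGet {
    // Drains the stream line by line into one String.
    // Line terminators are normalized to '\n'; UTF-8 is an assumption.
    public static String readAll(InputStream stream) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    // Performs a GET and returns the response body as text.
    public static String get(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("GET");
        try {
            int status = connection.getResponseCode(); // connects implicitly
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            return readAll(connection.getInputStream());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HttpGet <url>");
            return;
        }
        System.out.print(get(args[0]));
    }
}
```

The resulting String can then be handed to whichever HTML parser you choose.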
+3  A: 

The accepted answer is from robhruska, thank you. It shows the most basic way to do this, and it is easy to follow once you understand what a simple URL connection needs. For the longer term, though, the better strategy is to use HttpClient, which offers more advanced, feature-rich ways to complete this task.

Thank you everyone, here's the quick answer again:

URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

InputStream stream = connection.getInputStream();
// read the contents using an InputStreamReader
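
The snippet above only covers GET. Since the question also asks about POST, here is a hedged sketch of a form POST with the same HttpURLConnection class. The class HttpPost, the methods encodeForm and post, and the choice of a URL-encoded form body are all assumptions made for illustration; the site you are talking to may expect a different content type.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class HttpPost {
    // Builds an application/x-www-form-urlencoded body such as "a=1&b=x+y".
    public static String encodeForm(Map<String, String> fields)
            throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    // Sends the fields as a form POST and returns the HTTP status code.
    public static int post(String address, Map<String, String> fields)
            throws IOException {
        byte[] payload = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // required before writing a request body
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        connection.setFixedLengthStreamingMode(payload.length);
        try {
            try (OutputStream out = connection.getOutputStream()) {
                out.write(payload);
            }
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

To read the response body as well, call connection.getInputStream() before disconnecting, exactly as in the GET case.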
Johnny Maelstrom
A: 

This one is set up for my drigg site.

This is easy to achieve using JavaScript inside the href attribute of your link tag (href="JAVASCRIPT CODE HERE").

href="javascript:window.open('http://dedlines.com/node/add/drigg/?&url='+escape(location.href), 'newwindow', config='height=600, width=500, toolbar=no, menubar=no, scrollbars=no, resizable=yes, location=yes, directories=no, status=yes')"

Simply copy the code, place it inside your webpage, and replace my link and form name with yours.

Put this on the page you want to add, then click it, or simply drag and drop it to your toolbar for easy link submission to your directories or bookmarking sites.

A1SURF
I'm not sure this is entirely related to my question.
Johnny Maelstrom