I want to screen-scrape a website, and for that I want to use HTTP, SOCKS4 and SOCKS5 proxies. My questions are as follows:

  1. Is it possible to use these proxies from Java without any external API? For instance, is it possible to send a request with HttpURLConnection through these proxies?

  2. If it is not possible, what external APIs can I use?

  3. I was using the headless browser provided by HtmlUnit, but it takes a long time to load even simple web pages. Could you suggest other APIs (if any) that provide a headless browser that loads pages quickly? I don't need to open pages that contain heavy AJAX or JavaScript code. I just need to click form buttons through the headless browser.

A: 

Yes, that is possible. You can find the configuration options for different network proxies here.

jarnbjo
+1  A: 

Is it possible to use these proxies from Java without any external API? For instance, is it possible to send a request with HttpURLConnection through these proxies?

Yes, you can configure proxies by using (global) system properties, the Proxy class, or a ProxySelector. The latter two options have been available since Java 5 and are more flexible. Have a look at Java Networking and Proxies, as mentioned by jarnbjo, for all the details.
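As a rough illustration of the first two options, here is a minimal sketch that uses only the JDK. The class name `ProxyDemo`, the proxy host `proxy.example.com`, the port and the target URL are placeholders I made up, not values from the question. Note that `Proxy.Type.SOCKS` is the single type the JDK offers for both SOCKS4 and SOCKS5 proxies.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

class ProxyDemo {

    // Describe an HTTP proxy (host and port are supplied by the caller).
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Describe a SOCKS proxy; the JDK uses one type for SOCKS V4 and V5.
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(host, port));
    }

    // Per-connection option: only this connection is routed through the proxy.
    // Nothing is sent over the network until the connection is actually used.
    static HttpURLConnection open(String url, Proxy proxy) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
    }

    // Global option: standard system properties affecting later HTTP connections.
    static void setGlobalHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        // Placeholder proxy address and URL: replace with real values.
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        try {
            HttpURLConnection conn = open("http://example.com/", proxy);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            System.out.println("HTTP status: " + conn.getResponseCode());
        } catch (IOException e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```

Passing a `Proxy` to `openConnection` keeps the setting local to one connection, which is probably more convenient for scraping through several different proxies than the global properties.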

I was using the headless browser provided by HtmlUnit, but it takes a long time to load even simple web pages. Could you suggest other APIs (if any) that provide a headless browser that loads pages quickly? I don't need to open pages that contain heavy AJAX or JavaScript code. I just need to click form buttons through the headless browser.

Unfortunately, the first alternatives I can think of are either based on HtmlUnit (like JWebUnit or WebTest) or slower (Selenium and WebDriver, which you can run in headless mode). But maybe you could try HttpUnit if you don't need advanced JavaScript support.

Pascal Thivent
Your answer is very informative. I have already used Selenium too, and you are right that it is slower than HtmlUnit, so replacing HtmlUnit with Selenium is out of the question. I also tried HttpUnit two days ago, but the .jar file I downloaded depends on several other libraries, so when I ran the program I got many unresolved-reference errors. I downloaded some of the missing libraries but couldn't find all of them, so I stopped using it.
Yatendra Goel
With Maven or Ivy, it would be pretty easy to set up your project (with its dependencies). If you're not using one of these tools, the dependencies are listed here, for example: http://mvnrepository.com/artifact/httpunit/httpunit/1.6.2
Pascal Thivent