Hello. Is it possible to use HtmlUnit through a SOCKS proxy? Could anyone please provide a code sample?

====

So I've dug through the WebClient sources, and here's the best approach I can think of:

  1. Subclass MultiThreadedHttpConnectionManager so that it accepts SOCKS settings and, if they are set, applies them to each Connection before returning it.

  2. Subclass WebConnection: override createHttpClient so that it uses the manager from step 1, and add a method that exposes that manager or the HTTP client itself (it is protected right now, which is a pity).

  3. To use it: 1) create a WebClient instance, 2) create the subclassed WebConnection, 3) set it as the connection used by the WebClient, 4) get the connection's manager and call its methods to configure SOCKS.

A: 

HtmlUnit uses HttpClient as its underlying connection library. I investigated this a little, but:

1- I couldn't find a way to configure SOCKS in HttpClient (other than the generic Java SOCKS mechanism described in http://java.sun.com/javase/6/docs/technotes/guides/net/proxies.html)
2- I do not have access to a public SOCKS proxy to test against
Ahmed Ashour
Works in 2.8 due to the newer httpclient. You're one of the developers, right? Thanks for your work.
roddik
Yes, I am one of the team. 2.8 supports SOCKS, even at the request level, enjoy :)
Ahmed Ashour
A: 

Hi roddik. All you need to do is set the appropriate system properties before creating your WebClient object. For example:

System.setProperty("socksProxyHost", "localhost"); // replace "localhost" with your proxy server
System.setProperty("socksProxyPort", "9999"); // replace "9999" with your proxy port number

WebClient client = new WebClient();

At this point, HttpClient (which is used by HtmlUnit under the covers) will pick up the settings and use the SOCKS proxy for all network communication.

UPDATE: I read your revised question (and your comment) and I think you're on the right track. The problem is that if you implement step 1 using the above system properties, then your code is not thread-safe (because those system properties are global). One solution is to synchronize on something, but of course this can introduce performance problems (may not matter to you).
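
For what it's worth, here is a minimal sketch of the "synchronize on something" idea, using only the JDK. The class name SocksPropertyGuard and the method withSocksProxy are made up for illustration; they are not part of HtmlUnit or HttpClient. It assumes every piece of code that needs a proxy goes through this helper, and that the connection is actually opened inside the action, since the properties are only in effect while the lock is held.

```java
import java.util.function.Supplier;

// Hypothetical helper (not an HtmlUnit or HttpClient class): serializes access
// to the JVM-wide SOCKS system properties and restores the old values afterwards.
final class SocksPropertyGuard {
    private static final Object LOCK = new Object();

    private SocksPropertyGuard() {
    }

    // Runs the action while socksProxyHost/socksProxyPort point at the given proxy.
    static <T> T withSocksProxy(String host, int port, Supplier<T> action) {
        synchronized (LOCK) {
            String oldHost = System.getProperty("socksProxyHost");
            String oldPort = System.getProperty("socksProxyPort");
            System.setProperty("socksProxyHost", host);
            System.setProperty("socksProxyPort", Integer.toString(port));
            try {
                return action.get();
            } finally {
                // Put back whatever was configured before the call.
                restore("socksProxyHost", oldHost);
                restore("socksProxyPort", oldPort);
            }
        }
    }

    private static void restore(String key, String value) {
        if (value == null) {
            System.clearProperty(key);
        } else {
            System.setProperty(key, value);
        }
    }
}
```

Only callers that use the helper are serialized. Any other thread that opens a socket while the lock is held will still see the temporary proxy settings, which is the main weakness of this approach.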

If you really want to control this on a per-socket basis, then I think you will need to do something like the following:

  1. Create a custom ProtocolSocketFactory that passes a java.net.Proxy object to the Socket constructor (like in this example).
  2. Create a custom Protocol that uses this ProtocolSocketFactory.
  3. Apply this Protocol to the new connections in your custom connection manager using HttpConnection.setProtocol().

I haven't actually tested this, but based on a quick glance at the HttpClient 3.1 source code, I think that's how it would be done. I would love to hear how you ultimately solve this problem :-). Good luck!
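
To make step 1 a bit more concrete, here is a rough sketch of just the JDK part: building a java.net.Proxy of type SOCKS and handing it to the Socket constructor. The class SocksSocketSketch and its method names are hypothetical. The HttpClient 3.1 wiring (the ProtocolSocketFactory and Protocol from steps 1 to 3) is deliberately left out, because I have not verified it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

// Hypothetical helper showing per-socket SOCKS configuration with plain JDK classes.
final class SocksSocketSketch {
    private SocksSocketSketch() {
    }

    // Describes a SOCKS proxy without touching any global system property.
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, InetSocketAddress.createUnresolved(host, port));
    }

    // Creates a socket bound to the given proxy; nothing is connected yet.
    static Socket newUnconnectedSocket(Proxy proxy) {
        return new Socket(proxy);
    }

    // Connects to the target through the proxy; the caller must close the socket.
    static Socket connect(Proxy proxy, String targetHost, int targetPort, int timeoutMillis)
            throws IOException {
        Socket socket = new Socket(proxy);
        socket.connect(new InetSocketAddress(targetHost, targetPort), timeoutMillis);
        return socket;
    }
}
```

Because each Socket carries its own Proxy, two sockets can use two different SOCKS servers at the same time. A custom socket factory would presumably call something like connect(...) with the proxy chosen for that particular client.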

Matt Solnit
This way I'll be setting the SOCKS properties for all WebClient instances. I want to be able to set different proxy servers for different instances.
roddik