I'm reading large documents of which I only need the top 5%. Can I do the following with HttpClient 4?

  1. Request the page (GET or POST)
  2. Read the response as a stream
  3. Feed it into a SAX-based HTML parser "on the fly"
  4. When a certain HTML tag is detected, terminate the stream

Please note that HttpClient v. 4 is required; I cannot use v. 3.

A:

Thanks to Ken from the HttpClient mailing list, here's the answer:

Use the HttpEntity#getContent() method, which returns a
java.io.InputStream, and pass that stream to your SAX-based HTML parser.

http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e122

When you see the tag you need, terminate the request by invoking the HttpUriRequest#abort() method.

http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e285
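A minimal sketch of the approach, not a definitive implementation. To stay runnable with the JDK alone it takes a plain InputStream and a Runnable, so it does not call HttpClient itself: in real code the stream would presumably be the one returned by HttpEntity#getContent(), and the Runnable would invoke HttpUriRequest#abort(). The class name EarlyStopSketch and the method parseUntilTag are made up for illustration, and the JDK's XML SAX parser stands in for an HTML parser, so this only handles well-formed markup. For real-world HTML you would plug in a lenient SAX-compatible HTML parser instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parse a stream and stop as soon as a given tag opens.
class EarlyStopSketch {

    /**
     * Parses `content` until an element named `stopTag` starts.
     * Runs `abort` and returns true if the tag was seen, false otherwise.
     * Assumption: with HttpClient 4, `content` comes from
     * HttpEntity#getContent() and `abort` calls HttpUriRequest#abort().
     */
    static boolean parseUntilTag(InputStream content, String stopTag, Runnable abort)
            throws IOException, SAXException, ParserConfigurationException {
        final boolean[] found = {false};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) throws SAXException {
                // Process the elements you care about here.
                if (qName.equalsIgnoreCase(stopTag)) {
                    found[0] = true;
                    // Throwing from a callback is the usual way to unwind a SAX parse.
                    throw new SAXException("stop tag reached");
                }
            }
        };

        try {
            parser.parse(content, handler);
        } catch (SAXException e) {
            // Rethrow genuine parse errors; swallow only our own stop signal.
            if (!found[0]) {
                throw e;
            }
        }
        if (found[0]) {
            abort.run(); // stop the transfer instead of draining the rest
        }
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        // The document is deliberately cut off after <body>, as an aborted download would be.
        byte[] page = "<html><head><title>t</title></head><body><p>rest"
                .getBytes(StandardCharsets.UTF_8);
        boolean stopped = parseUntilTag(new ByteArrayInputStream(page), "body",
                () -> System.out.println("abort requested"));
        System.out.println("stopped early: " + stopped);
    }
}
```

Aborting rather than simply closing the stream matters because a normal close may try to read the remainder of the response so the connection can be reused, which defeats the purpose when only the top of the document is needed.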

DroidIn.net