Hi,

I am building a web scraping application. It should scrape a complex web site with concurrent HttpWebRequests from a single host to a single target web server.

The application should run on Windows Server 2008.

A single HttpWebRequest can take from 1 to 4 minutes to complete (because of long-running database operations).

I need at least 100 parallel requests to the target web server, but I have noticed that when I use more than 2-3 long-running requests I get serious performance problems (requests time out or hang).

How many concurrent requests can I have in this scenario from a single host to a single target web server? Can I use thread pools in the application to run parallel HttpWebRequests to the server? Will I run into the default outbound HTTP connection/request limits? What happens to request timeouts when I reach the outbound connection limit? What would be the best setup for my scenario?

Any help would be appreciated.

Thanks

A: 

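The HTTP/1.1 specification (RFC 2616) recommends that a client keep no more than 2 concurrent connections to any one server, and .NET enforces that recommendation by default. That is the limit you are hitting: requests beyond the first two wait for a free connection, which is why they appear to hang or time out.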

Increase the limit by setting

ServicePointManager.DefaultConnectionLimit.

You can also set it per service point, by setting

ServicePointManager.FindServicePoint(new Uri(url)).ConnectionLimit
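As a rough illustration of the two settings, here is a minimal sketch. It assumes the .NET Framework, where HttpWebRequest goes through ServicePointManager. The class name, the example.com address, and the limit of 100 are placeholders taken from the question's numbers, not recommended values. Note that the lookup method on ServicePointManager is FindServicePoint, and it takes a Uri.

```csharp
using System;
using System.Net;

public static class ConnectionLimitSketch
{
    // Hypothetical target address; replace with the real server.
    public static readonly Uri Target = new Uri("http://example.com/");

    public static void Configure(int limit)
    {
        // Process-wide default. It only affects ServicePoint objects
        // created after this assignment, so set it at startup,
        // before the first request is created.
        ServicePointManager.DefaultConnectionLimit = limit;

        // Per-host override: look up the ServicePoint that manages
        // connections to this scheme/host/port and raise its limit.
        // This lookup does not open a network connection.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(Target);
        servicePoint.ConnectionLimit = limit;
    }

    public static void Main()
    {
        Configure(100);

        ServicePoint servicePoint = ServicePointManager.FindServicePoint(Target);
        Console.WriteLine("Connection limit for {0}: {1}",
            Target.Host, servicePoint.ConnectionLimit);
    }
}
```

Setting only the per-host limit keeps the default of 2 for every other server the process talks to, which may be preferable if the scraper also calls third-party services.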
feroze