I'm writing a web crawler for a specific site. The application is a VB.Net Windows Forms application that does not use multiple threads; each web request is made sequentially. However, after ten successful page retrievals, every subsequent request times out.
I have reviewed the similar questions already posted here on SO and have implemented the recommended techniques in my GetPage routine, shown below:
' Requires Imports System.IO and Imports System.Net at file scope.
Public Function GetPage(ByVal url As String) As String
    Dim result As String = String.Empty
    Dim uri As New Uri(url)

    ' Raise the per-host connection limit, as recommended in related questions.
    Dim sp As ServicePoint = ServicePointManager.FindServicePoint(uri)
    sp.ConnectionLimit = 100

    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri), HttpWebRequest)
    request.KeepAlive = False
    request.Timeout = 15000

    Try
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using dataStream As Stream = response.GetResponseStream()
                Using reader As New StreamReader(dataStream)
                    If response.StatusCode <> HttpStatusCode.OK Then
                        Throw New Exception("Got response status code: " & response.StatusCode.ToString())
                    End If
                    result = reader.ReadToEnd()
                End Using
            End Using
            response.Close() ' Redundant inside the Using block, but kept for good measure.
        End Using
    Catch ex As Exception
        Dim msg As String = "Error reading page """ & url & """. " & ex.Message
        Logger.LogMessage(msg, LogOutputLevel.Diagnostics)
    End Try

    Return result
End Function
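For context, the calling code is essentially this sequential loop (simplified; crawlQueue and ProcessPage stand in for the real members):

    ' Simplified outline of the crawl; each GetPage call completes
    ' before the next begins, so there is no concurrency involved.
    For Each pageUrl As String In crawlQueue
        Dim html As String = GetPage(pageUrl)
        If html.Length > 0 Then
            ProcessPage(pageUrl, html)
        End If
    Next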
Have I missed something? Am I failing to close or dispose of an object that should be released? It seems strange that it always happens after exactly ten consecutive requests.
Notes:
In the constructor for the class in which this method resides I have the following (a fuller sketch of the constructor follows these notes):
ServicePointManager.DefaultConnectionLimit = 100
If I set KeepAlive to true, the timeouts begin after five requests.
All the requests are for pages in the same domain.
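For reference, the constructor mentioned above is essentially this (simplified). The default limit is set there so it is in place before the first request creates its ServicePoint, since DefaultConnectionLimit only applies to ServicePoints created afterwards:

    Public Sub New()
        ' Must run before any request is issued: DefaultConnectionLimit
        ' only affects ServicePoints created after it is set.
        ServicePointManager.DefaultConnectionLimit = 100
    End Sub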
EDIT
I added a delay of between two and seven seconds between web requests so that I do not appear to be "hammering" the site or attempting a DoS attack. However, the problem still occurs.
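The delay is implemented along these lines (sketch; rnd is a single Random instance reused across calls):

    ' Pause between two and seven seconds before the next request.
    Threading.Thread.Sleep(rnd.Next(2000, 7001))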