I have the following multithreading code, which uses threads to fetch from a list of URLs and parse the content. The code was suggested by another user, and I want to know whether this is an efficient way of implementing what I need. I am running the code now and getting errors in all the functions that worked fine when single-threaded. For example, for the list that I use to check visited URLs, I now get 'ArgumentOutOfRangeException - capacity was less than the current size'. Does everything now need to be synchronized?

        Dim startwatch As New Stopwatch
        Dim elapsedTime As Long = 0
        Dim urlCompleteList As String = String.Empty
        Dim numThread As Integer = 0
        Dim ThreadList As New List(Of Thread)

        startwatch.Start()
        For Each link In completeList
            Dim thread = New Thread(AddressOf processUrl)
            thread.Start(link)
            ThreadList.Add(thread)
        Next

        For Each Thread In ThreadList
            Thread.Join()
        Next

        startwatch.Stop()
        elapsedTime = startwatch.ElapsedMilliseconds


    End Sub
    Public Sub processUrl(ByVal url As String)

        'make sure we never visited this before
        If Not VisitedPages.Contains(url) Then
            VisitedPages.Add(url)
            Dim startwatch As New Stopwatch
            Dim elapsedTime As Long = 0
+2  A: 

If the VisitedPages collection used within processUrl is shared among the threads, then yes, you need to ensure that only one thread can access that collection at a time - unless the collection itself is thread safe and takes care of that for you.

The same goes for any other data that is shared among the threads you create.
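As a rough illustration of the "thread-safe collection" option, here is a minimal sketch. It assumes .NET 4 or later and swaps the shared list for a `ConcurrentDictionary(Of String, Boolean)` used as a set; the module name, `TryMarkVisited`, `Worker`, and the example URL are hypothetical, not from the original code. `TryAdd` performs the check and the add as one atomic step, so the separate `Contains` call goes away.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

Module ConcurrentVisitedSketch
    ' Hypothetical stand-in for the shared VisitedPages collection.
    ' Only the keys matter; the Boolean value is unused.
    Private ReadOnly VisitedPages As New ConcurrentDictionary(Of String, Boolean)

    ' Counts how many threads won the right to process the URL.
    Private claimed As Integer = 0

    ' Returns True only for the first caller that registers this url.
    Function TryMarkVisited(ByVal url As String) As Boolean
        Return VisitedPages.TryAdd(url, True)
    End Function

    ' Thread entry point; the signature matches ParameterizedThreadStart.
    Sub Worker(ByVal state As Object)
        If TryMarkVisited(CStr(state)) Then
            Interlocked.Increment(claimed)
        End If
    End Sub

    Sub Main()
        Dim workers As New List(Of Thread)

        ' Eight threads race to claim the same (made-up) URL.
        For i As Integer = 1 To 8
            Dim worker As New Thread(AddressOf Worker)
            worker.Start("http://example.com/")
            workers.Add(worker)
        Next

        For Each startedWorker As Thread In workers
            startedWorker.Join()
        Next

        Console.WriteLine(claimed) ' prints 1: exactly one thread claimed the URL
    End Sub
End Module
```

Whether this fits depends on how else VisitedPages is used; code that relies on list ordering or indexing would need a different approach.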

nos
Thanks for the response; will using SyncLock suffice for this issue?
vbNewbie
+1  A: 

I don't see where VisitedPages is declared, but it does not appear to be local to the processUrl method. That would make it shared between all of the threads, so multiple threads would be accessing the list/collection at the same time, which would generate errors similar to the one you describe. You will need to protect the VisitedPages collection with a mutex or a similar lock to guard against this.
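One way to guard the collection in VB.NET is `SyncLock`. The sketch below is an assumption-laden illustration, not the asker's actual code: the module name, `VisitedLock`, `TryMarkVisited`, and the example URLs are hypothetical, and VisitedPages is assumed to be a `List(Of String)`. The key point is that the `Contains` check and the `Add` sit inside the same lock, so no other thread can slip in between them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module SyncLockVisitedSketch
    ' Assumed shape of the shared collection from the question.
    Private ReadOnly VisitedPages As New List(Of String)

    ' Dedicated lock object (hypothetical name); every access to
    ' VisitedPages from worker threads goes through it.
    Private ReadOnly VisitedLock As New Object()

    ' Checks and adds under one lock, so the pair is atomic.
    ' Returns True if this call was the first to see the url.
    Function TryMarkVisited(ByVal url As String) As Boolean
        SyncLock VisitedLock
            If VisitedPages.Contains(url) Then
                Return False
            End If
            VisitedPages.Add(url)
            Return True
        End SyncLock
    End Function

    ' Thread entry point; the signature matches ParameterizedThreadStart.
    Sub ProcessUrl(ByVal state As Object)
        Dim url As String = CStr(state)
        If TryMarkVisited(url) Then
            ' Fetch and parse here, outside the lock, so slow network
            ' work does not block the other threads.
        End If
    End Sub

    Sub Main()
        ' Made-up URLs with one duplicate.
        Dim urls As String() = New String() {
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/a"}
        Dim workers As New List(Of Thread)

        For Each url As String In urls
            Dim worker As New Thread(AddressOf ProcessUrl)
            worker.Start(url)
            workers.Add(worker)
        Next

        For Each startedWorker As Thread In workers
            startedWorker.Join()
        Next

        ' All workers have finished, so reading without the lock is safe here.
        Console.WriteLine(VisitedPages.Count) ' prints 2: the duplicate was skipped
    End Sub
End Module
```

Locking only around `Add` would stop the exception but still let two threads pass the `Contains` check for the same URL, which is why the sketch wraps both calls.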

Rob Goodwin
Thanks for the response; will using SyncLock suffice for this issue?
vbNewbie