I need to download a huge number of files from the web based on a keyword. The steps I am following are:

  1. Use scraping to find the links to the files.
  2. Use WebClient.DownloadData() to download each file as a byte[].
  3. Save the byte array to a file.

Is it a good idea to create one thread per file download for better performance? Any suggestions? Thanks.

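var threads = new List<Thread>();
foreach (string each in arr)
{
    Thread t = new Thread(() =>
    {
        // The text after "http://" up to the first space is the URL (possibly with stray quotes).
        string[] arr2 = each.Split(new string[] { "http://" }, StringSplitOptions.None);
        string[] firstElem = arr2[1].Split(new string[] { " " }, StringSplitOptions.None);

        // Split() drops the "http://" separator, so put it back to get an absolute URL.
        string urlToDownload = "http://" + firstElem[0].Replace("\"", string.Empty);
        string filName = Path.GetFileName(urlToDownload);
        string dirName = DirInAppConfig();
        DataRow row;
        bool dataExistsInDtKwWithSameDownloadLinkAndFileName;
        getRowForKwDownLinkFileName(urlToDownload, filName, out row, out dataExistsInDtKwWithSameDownloadLinkAndFileName);

        // WebClient does not support concurrent operations, so each thread gets its own
        // instance instead of sharing Client (assumes downloadFile accepts a WebClient).
        using (var client = new WebClient())
        {
            downloadFile(client, urlToDownload, dirName, filName, search, row);
        }
    });
    t.IsBackground = true;
    t.Start();
    // Calling t.Join() here would block until this download finishes before the next
    // thread starts, making the loop sequential. Collect the threads and join afterwards.
    threads.Add(t);
}

// Wait for all downloads only after every thread has been started.
foreach (Thread thread in threads)
{
    thread.Join();
}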
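A minimal sketch of the three steps in the question, assuming .NET Core 2.0 or later (`File.WriteAllBytesAsync` is not available in .NET Framework). It swaps in `HttpClient` for `WebClient` and a regular expression for the `Split` calls; the names `KeywordDownloader`, `ExtractLinks`, and `DownloadAndSaveAsync` are hypothetical. The regex is a simplification, not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class KeywordDownloader
{
    // Step 1: pull absolute http/https URLs out of scraped text.
    // A URL ends at whitespace, a quote, or an angle bracket.
    public static List<string> ExtractLinks(string scrapedText)
    {
        return Regex.Matches(scrapedText, @"https?://[^\s""'<>]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct()
                    .ToList();
    }

    // Steps 2 and 3: download the bytes, then write them into dirName,
    // using the last segment of the URL path as the file name.
    public static async Task DownloadAndSaveAsync(HttpClient client, string url, string dirName)
    {
        byte[] data = await client.GetByteArrayAsync(url);
        string fileName = Path.GetFileName(new Uri(url).LocalPath);
        await File.WriteAllBytesAsync(Path.Combine(dirName, fileName), data);
    }
}
```

A single shared `HttpClient` can be reused for every link, because unlike `WebClient` it supports concurrent requests on one instance.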
A:

Servers often limit downloads from a single IP address to two connections. So if all the files come from the same server, multiple threads might not help much.

Remy
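Remy's point suggests capping concurrency rather than starting one thread per file. A minimal sketch, assuming C# 6 or later: the hypothetical helper `Throttle.ForEachAsync` uses `SemaphoreSlim` so that at most `maxConcurrency` operations run at once, without tying up a dedicated thread per waiting file. A limit of 2 matches the per-server connection limit described above. Also note that in a .NET Framework desktop application, `ServicePointManager.DefaultConnectionLimit` defaults to 2 connections per host, so a higher limit may require raising that setting as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs work for every item, but never more than maxConcurrency at the same time.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();      // wait asynchronously for a free slot
                try
                {
                    await work(item);
                }
                finally
                {
                    gate.Release();          // free the slot even if work throws
                }
            }).ToList();

            // Completes when every item has finished (or rethrows the first failure).
            await Task.WhenAll(tasks);
        }
    }
}
```

For example, `await Throttle.ForEachAsync(urls, 2, url => DownloadOneAsync(url));`, where `urls` and `DownloadOneAsync` stand in for your own link list and per-file download method.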
A: 

Have you done a performance analysis that tells you that you need to consider threading? No? Then you're engaging in premature optimization, and you should stop right now.

Do you have experience with multithreading, so that you're unlikely to make a careless locking mistake or, if you do make one, you can quickly find and fix it? No? Then you should stop right now.

You may have no clear idea how much longer it can take to debug a multithreaded program. That extra time could easily outweigh the time you save by using multiple threads.

John Saunders
Generally true, but the OP is _asking_ whether it could be an improvement. And he mentions a 'huge number of files', which addresses your last point. But yes, he should measure.
Henk Holterman
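Following the advice to measure first, a minimal timing helper based on `System.Diagnostics.Stopwatch`; the name `Timing.MeasureAsync` is hypothetical. The idea is to run a representative sample of downloads once sequentially and once with limited concurrency, then compare the two results before committing to a threaded design.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Timing
{
    // Returns the wall-clock time taken by the given asynchronous operation.
    public static async Task<TimeSpan> MeasureAsync(Func<Task> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        await operation();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Wall-clock time is the relevant measure here, since downloads spend most of their time waiting on the network rather than using the CPU.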