In our application we need to import transaction data from PayPal through an API for each of our users and store it in the database. We have thousands of users (approximately 5,000 now), and the number is growing every day.

The application is a .NET Windows service.

It imports data hourly for all users. At present we import data one user after another, but sometimes a single user's data is so large that it takes around 5 hours to fetch all of it, so every other user is blocked until that import finishes. This completely breaks the hourly import for all the other users.

To avoid this, we thought of creating a thread for each user's import and starting them every hour from the Windows service. Our concern is bandwidth, since all the threads would start at the same time. Is this an issue at all?

Is our new implementation the right way to do this? How is this usually done? If anyone has built this kind of functionality, please let us know how you did it.

If my question is not clear enough, please let me know and I'll provide more info.

Edit: If I send this many requests to PayPal from a single IP, how does PayPal handle it? Does anyone know whether it limits requests per IP?

Update: Thanks for all the suggestions and feedback.

I thought of using jgauffin's solution, as it closely mimics ThreadPool. But I need some more features, such as changing the thread limit dynamically and calling the callback method recursively.

After a lot of research and analysis of the thread pool, I've decided to use SmartThreadPool, which is based on the ThreadPool logic but has more features. It is quite good and serves my purpose perfectly.

+1  A: 

Creating 5,000 threads in code is not a good idea; it may slow the server down enormously, or even crash it.

What you need here is load balancing.

If you are on the .NET platform, consider an MSMQ-based solution: queue the user requests, and have a dispatcher that distributes them among the servers.

saurabh
I don't see the need for load balancing here. The problem is not that his machine is too busy; it is that the network latencies add up uncontrollably in a single-threaded design.
jdv
+1  A: 

I would use a queue and, let's say, five threads for this. Each time a thread completes a job, it takes a new user from the queue.

Example code:

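using System;
using System.Collections.Generic;
using System.Threading;

public class Example
{

    public static void Main(string[] argv)
    {
        // Setup
        DownloadQueue personQueue = new DownloadQueue();
        personQueue.JobTriggered += OnHandlePerson;
        // Can be changed at any time; the change takes effect when a job
        // completes (or a new one is enqueued).
        personQueue.ThreadLimit = 10;

        // Enqueue as many persons as you like.
        personQueue.Enqueue(new Person("John", "Doe"));

        // Keep the process alive while the ThreadPool threads do the work.
        Console.ReadLine();
    }

    public static void OnHandlePerson(object source, PersonEventArgs e)
    {
        // Download this person's data here.
        // Catch exceptions in this handler: an unhandled exception on a
        // ThreadPool thread terminates the process.
    }
}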

public class DownloadQueue
{
    private readonly Queue<Person> _queue = new Queue<Person>();
    private int _runningThreads = 0;

    public int ThreadLimit { get; set; }

    /// <summary>
    /// Enqueue a new user and start a worker if we are below the thread limit.
    /// </summary>
    /// <param name="person">The user whose data should be downloaded.</param>
    public void Enqueue(Person person)
    {
        lock (_queue)
        {
            _queue.Enqueue(person);
            if (_runningThreads < ThreadLimit)
            {
                // Count the worker here, not when it starts running; otherwise a burst
                // of Enqueue calls could queue more work items than ThreadLimit allows.
                ++_runningThreads;
                ThreadPool.QueueUserWorkItem(DownloadUser);
            }
        }
    }

    /// <summary>
    /// Runs on a ThreadPool thread and keeps taking users until the queue is empty.
    /// </summary>
    /// <param name="state"></param>
    private void DownloadUser(object state)
    {
        while (true)
        {
            Person person;
            lock (_queue)
            {
                // Exit when there is nothing left in the queue, or when
                // ThreadLimit has been lowered below the number of running workers.
                if (_queue.Count == 0 || _runningThreads > ThreadLimit)
                {
                    --_runningThreads;
                    return;
                }
                person = _queue.Dequeue();
            }

            JobTriggered(this, new PersonEventArgs(person));
        }
    }

    public event EventHandler<PersonEventArgs> JobTriggered = delegate { };
}


public class PersonEventArgs : EventArgs
{
    Person _person;

    public PersonEventArgs(Person person)
    {
        _person = person;
    }

    public Person Person { get { return _person; } }
}
public class Person
{
    public Person(string fName, string lName)
    {
        this.firstName = fName;
        this.lastName = lName;
    }

    public string firstName;
    public string lastName;
}
jgauffin
+2  A: 

Do not use a thread per user. Queue a work item in the thread pool for each user instead. This way you get the best of both worlds: you avoid the memory overhead of 5,000 threads, and you get more control over load because you can determine how many threads the ThreadPool uses to process the work items.
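A minimal sketch of the work-item-per-user idea, assuming user IDs are plain strings. `WorkItemImporter` and `ImportUser` are hypothetical names; `ImportUser` only sleeps to stand in for the PayPal download and does not call any real API.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: one ThreadPool work item per user, not one dedicated thread per user.
public static class WorkItemImporter
{
    private static int _imported;

    public static int ImportedCount => Volatile.Read(ref _imported);

    // Queues one work item per user and blocks until every item has finished.
    public static void ImportAll(string[] userIds)
    {
        using (var done = new CountdownEvent(userIds.Length))
        {
            foreach (string id in userIds)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        ImportUser((string)state);
                    }
                    catch (Exception ex)
                    {
                        // An unhandled exception here would terminate the process, so log it instead.
                        Console.Error.WriteLine($"Import of {state} failed: {ex.Message}");
                    }
                    finally
                    {
                        done.Signal(); // signal even if the import failed
                    }
                }, id);
            }
            done.Wait();
        }
    }

    // Hypothetical placeholder for the real download.
    private static void ImportUser(string userId)
    {
        Thread.Sleep(10); // stand-in for the network call
        Interlocked.Increment(ref _imported);
    }
}
```

The pool decides how many items run at once. `ThreadPool.SetMaxThreads` can cap that, but the setting is process-wide, so it also affects everything else in the service that uses the pool.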

TomTom
Let's say your thread pool size is 10 and, of the first 100 users, 15 have so much transaction data that importing each of them takes at least 2 hours. Wouldn't that block things for at least 2 hours?
Ismail
Sure. There is no way around that at all.
TomTom
@TomTom: I've read about the thread pool, and the usual advice is not to use it for threads that run for a long time. In my case, each request may indeed take a while, because it sends a request to PayPal and waits for the response. Is the thread pool OK for this scenario if I limit it to 10 threads? I'm a bit confused about this solution now.
JPReddy
Then reimplement it and use your own thread pool. That said, I think it should be OK; PayPal is not exactly long-running for transactions.
TomTom
+2  A: 

What I'd do is start with a pool of threads (say 10) and let each thread do an import. When a thread is done, it takes the next item from the queue. You can leverage the existing ThreadPool class and queue all your import requests to it. You can control the maximum number of threads for this ThreadPool, though that setting applies to the whole process.

Creating thousands of threads is a bad idea for several reasons: it used to be too much for the Windows OS, and, as you point out yourself, you might flood the network (or perhaps the PayPal service).

For extreme scalability, you can use asynchronous I/O, which does not block a thread while a request is in progress, but that API has a steep learning curve and is probably not needed for your scenario.
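A sketch of the asynchronous I/O idea, assuming .NET 4.5+ (`async`/`await`, `SemaphoreSlim.WaitAsync`). `AsyncImporter` and `ImportUserAsync` are hypothetical; a real version would await an HTTP call to PayPal instead of `Task.Delay`. The `SemaphoreSlim` caps how many requests are in flight at once, which also addresses the bandwidth concern, and no thread is blocked while a request waits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: async imports with a cap on concurrent requests.
public static class AsyncImporter
{
    private static int _current;
    private static int _peak;

    // Highest number of imports observed running at the same time.
    public static int PeakConcurrency => Volatile.Read(ref _peak);

    public static async Task<int> ImportAllAsync(IEnumerable<string> userIds, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            Task[] tasks = userIds.Select(async id =>
            {
                await gate.WaitAsync(); // waits without blocking a thread
                try
                {
                    await ImportUserAsync(id);
                }
                finally
                {
                    gate.Release();
                }
            }).ToArray();

            await Task.WhenAll(tasks);
            return tasks.Length;
        }
    }

    // Hypothetical placeholder for the real PayPal request.
    private static async Task ImportUserAsync(string userId)
    {
        int now = Interlocked.Increment(ref _current);
        int peak;
        // Record a new peak if this is the most imports seen at once so far.
        while (now > (peak = Volatile.Read(ref _peak)) &&
               Interlocked.CompareExchange(ref _peak, now, peak) != peak)
        {
        }
        await Task.Delay(20); // stand-in for a network round trip
        Interlocked.Decrement(ref _current);
    }
}
```

With `maxConcurrent` set to 10, a slow user still holds one slot, but the other slots keep cycling through the remaining users.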

jdv
@jdv: Sorry, I didn't get your last point about asynchronous I/O and the API it depends on. Can you explain a bit more?
JPReddy
+1  A: 

I would avoid creating a thread for each user; this approach does not scale well. I am also assuming the API does not offer a way to do the downloads asynchronously. If it does, that is probably the way to go.

The producer-consumer pattern might work well here. The idea is to create a fixed-size pool of threads that consume work items from a shared queue. It is probably best to avoid the ThreadPool in your case, because it is designed mainly for short-lived tasks. You do not want your long-lived tasks to exhaust it, because it is used for many other things in the .NET BCL.

If you are using .NET 4.0, you can take advantage of BlockingCollection. There is also a backport available as part of the Reactive Extensions download. Here is what your code might look like.

Note: You will have to harden the code yourself to make it more robust, shut down gracefully, etc.

using System.Collections.Concurrent;
using System.Threading;

public class Importer
{
  private BlockingCollection<Person> m_Queue = new BlockingCollection<Person>();

  public Importer(int poolSize)
  {
    for (int i = 0; i < poolSize; i++)
    {
      var thread = new Thread(Download);
      thread.IsBackground = true;
      thread.Start();
    }
  }

  public void Add(Person person)
  {
    m_Queue.Add(person);
  }

  private void Download()
  {
    while (true)
    {
      Person person = m_Queue.Take();
      // Add your code for downloading this person's data here.
    }
  }
}
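One possible way to add the graceful shutdown mentioned in the note, assuming .NET 4.0+ `BlockingCollection<T>`. `StoppableImporter` is a hypothetical name, and string user IDs stand in for the `Person` type. `CompleteAdding` stops new work, and `GetConsumingEnumerable` lets each worker drain the remaining items and then exit, so `Dispose` can join the threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: the producer-consumer Importer with graceful shutdown.
public sealed class StoppableImporter : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Thread[] _workers;
    private int _processed;

    public int Processed => Volatile.Read(ref _processed);

    public StoppableImporter(int poolSize)
    {
        _workers = new Thread[poolSize];
        for (int i = 0; i < poolSize; i++)
        {
            _workers[i] = new Thread(Download) { IsBackground = true };
            _workers[i].Start();
        }
    }

    public void Add(string userId) => _queue.Add(userId);

    // Stop accepting work, let the workers finish what is already queued, then wait for them.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread t in _workers)
            t.Join();
        _queue.Dispose();
    }

    private void Download()
    {
        // Blocks while the queue is empty; the loop ends once CompleteAdding
        // has been called and every queued item has been taken.
        foreach (string userId in _queue.GetConsumingEnumerable())
        {
            try
            {
                // Hypothetical: download this user's data here.
                Interlocked.Increment(ref _processed);
            }
            catch (Exception ex)
            {
                // Keep the worker alive if one user's import fails.
                Console.Error.WriteLine($"Import of {userId} failed: {ex.Message}");
            }
        }
    }
}
```

In a Windows service, `Dispose` would typically be called from `OnStop`, so that in-flight imports finish before the service exits.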
Brian Gideon