I want to learn enough simple, practical queuing theory to model the behavior of a standard web application stack: a load balancer with multiple application server backends.
Given a simple traffic profile extracted from a tool like New Relic, showing the percentage of traffic going to each part of an application and the average response time for that part, I think I should be able to model different queuing behaviors by varying the load balancer configuration, the number of app servers, and the queuing model.
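As a rough sketch of the kind of input I have in mind: the endpoint names, traffic fractions, response times, and request rate below are all made up, and I'm assuming the reported average response time can stand in for the mean service time (it may already include some queue wait, which would overstate the load).

```python
from math import floor

# Hypothetical traffic mix: endpoint -> (fraction of requests, mean response time in seconds).
# These numbers are invented for illustration, not taken from a real report.
mix = {
    "/home": (0.5, 0.05),
    "/search": (0.3, 0.20),
    "/checkout": (0.2, 0.50),
}

def mean_service_time(mix):
    """Traffic-weighted mean service time in seconds."""
    total = sum(frac for frac, _ in mix.values())
    return sum(frac * secs for frac, secs in mix.values()) / total

def offered_load(request_rate, mix):
    """Offered load in erlangs: arrival rate (req/s) times mean service time (s)."""
    return request_rate * mean_service_time(mix)

def min_workers(request_rate, mix):
    """Smallest worker count strictly greater than the offered load (needed for stability)."""
    return floor(offered_load(request_rate, mix)) + 1

if __name__ == "__main__":
    rate = 40.0  # hypothetical total requests per second
    print(mean_service_time(mix), offered_load(rate, mix), min_workers(rate, mix))
```

The offered load is only a lower bound on the number of workers; how much requests actually wait depends on how the queues in front of those workers are arranged.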
Can anyone point me to introductory queuing theory material covering the fundamentals I would need to represent this system mathematically? I'm embarrassed to say I knew how to do this as an undergrad but have since forgotten all of it.
My goal is to compare different load balancer and app server queuing configurations and measure the results.
For example, it seems clear that an N-Mongrel Ruby on Rails application stack, with a separate queue on each Mongrel, will have worse latency/wait time than a Unicorn/Passenger system with a single shared queue for each group of app workers.
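To make that intuition concrete, here is a minimal sketch using what I remember of the textbook formulas. It assumes Poisson arrivals and exponentially distributed service times, and that the load balancer splits traffic randomly across the per-worker queues, so each Mongrel behaves like an independent M/M/1 queue while the shared-queue setup behaves like a single M/M/c queue (Erlang C). The function names and the numbers are my own, purely for illustration.

```python
from math import factorial

def mm1_wait(lam, mu):
    """Mean time waiting in queue for M/M/1 (arrival rate lam, service rate mu, per second)."""
    if lam >= mu:
        raise ValueError("unstable: arrival rate must be below service rate")
    rho = lam / mu
    return rho / (mu - lam)

def erlang_c(c, a):
    """Probability that an arrival has to wait in M/M/c, where a = lam / mu is the offered load."""
    if a >= c:
        raise ValueError("unstable: offered load must be below the number of servers")
    below = sum(a**k / factorial(k) for k in range(c))
    tail = (a**c / factorial(c)) * (c / (c - a))
    return tail / (below + tail)

def mmc_wait(lam, mu, c):
    """Mean time waiting in queue for M/M/c with one shared queue feeding c servers."""
    return erlang_c(c, lam / mu) / (c * mu - lam)

if __name__ == "__main__":
    workers = 4
    mu = 10.0    # hypothetical: each worker completes 10 req/s (100 ms mean service time)
    lam = 32.0   # hypothetical: total arrival rate in req/s
    per_worker_queues = mm1_wait(lam / workers, mu)  # one queue per Mongrel
    shared_queue = mmc_wait(lam, mu, workers)        # single queue for the worker pool
    print(per_worker_queues, shared_queue)
```

With these made-up numbers (80% utilization either way), the per-worker queues come out at about 0.4 s of mean wait versus roughly 0.075 s for the shared queue. Real service times are unlikely to be exponential, and round-robin balancing is not the same as a random split, so this is only a starting point.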