The "acceptable delay" entirely depends on your application. Dealing with everything on the same thread can indeed help if you've got very strict latency requirements. Fortunately most applications don't have requirements quite that strict.
Of course, if only one thread is able to receive requests, then tying up that thread to compute the response means you can't accept any other requests in the meantime. Depending on what you're doing, you can use asynchronous IO (etc.) to avoid the "thread per request" model, but it's significantly harder IMO, and you still end up with thread context switching.
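For what it's worth, here's a rough sketch of that style using Java's NIO.2 asynchronous channels (the port number and the echo behaviour are just made up for illustration) - the point is that a small internal pool of threads handles IO completions, rather than each connection tying up a thread of its own:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AsyncEchoServer {
    public static void main(String[] args) throws Exception {
        // One accepting channel; completion handlers run on the default
        // asynchronous channel group's pool, not one thread per connection.
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(8080));

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this); // keep accepting further connections
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                client.read(buffer, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytesRead, ByteBuffer buf) {
                        if (bytesRead < 0) { return; } // client closed the connection
                        buf.flip();
                        client.write(buf); // echo back; fire-and-forget for brevity
                    }
                    @Override
                    public void failed(Throwable exc, ByteBuffer buf) { /* log and drop */ }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) { /* log and drop */ }
        });

        Thread.currentThread().join(); // keep the JVM alive
    }
}
```

Even in this toy form, the callback-based code is noticeably more convoluted than the blocking equivalent would be, which is what I mean by "significantly harder".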
Sometimes it's appropriate to queue requests to avoid having too many threads processing them: if your handling is CPU-bound, it doesn't make much sense to have hundreds of threads - better to have a producer/consumer queue of tasks and distribute them across roughly one thread per core. That's basically what ThreadPoolExecutor will do if you set it up properly, of course. That doesn't work as well if your requests spend a lot of their time waiting for external services (including disks, but primarily other network services)... at that point either you use asynchronous execution models whenever you would otherwise make a core idle with a blocking call, or you take the thread context switching hit and have lots of threads, relying on the thread scheduler to make it work well enough.
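Coming back to the CPU-bound case, here's a minimal sketch of the "roughly one thread per core" setup (the queue capacity, rejection policy and dummy handler are just illustrative assumptions):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CpuBoundPool {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();

        // Fixed pool sized to the core count; the bounded queue holds pending
        // requests instead of spawning hundreds of threads. CallerRunsPolicy
        // gives you crude back-pressure when the queue fills up.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                cores, cores,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 10_000; i++) {
            final int request = i;
            // Queued, then run on one of the core-count worker threads.
            executor.execute(() -> handle(request));
        }

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Stand-in for whatever CPU-bound work you're actually doing.
    private static void handle(int request) {
        long result = 0;
        for (int i = 0; i < 1_000_000; i++) {
            result += (long) i * request;
        }
    }
}
```

(`Executors.newFixedThreadPool(cores)` gets you most of the way there too, but it uses an unbounded queue, so you lose the back-pressure.)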
The bottom line is that latency requirements can be tough - in my experience they're significantly harder to meet than throughput requirements, because you can't just scale out to fix them. It really does depend on the context though.