My understanding is that CGI spawns a separate executable process on the server for each request, whereas a servlet does not, but I'm not sure how to describe what happens with a servlet by comparison. Since the servlet exists inside the JVM, and the JVM is a single process, where does the servlet exist in relation to it?
The servlet container (JVM process) typically handles each request in a different thread.
The maximum number of threads, whether threads that have finished servicing a request are kept alive for reuse, and similar settings are generally all configurable.
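As a rough illustration of those two settings, here is a minimal sketch using the JDK's own `ThreadPoolExecutor`. It is not how any particular container is implemented; the class name `WorkerPoolConfig` and the parameters `maxThreads` and `idleSeconds` are made up for the example, and real containers expose equivalent options under their own names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class WorkerPoolConfig {
    // Hypothetical knobs standing in for a container's thread settings.
    // idleSeconds must be greater than zero for allowCoreThreadTimeOut(true).
    static ThreadPoolExecutor newWorkerPool(int maxThreads, long idleSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,          // never more than maxThreads workers
                idleSeconds, TimeUnit.SECONDS,   // how long an idle worker is kept alive
                new LinkedBlockingQueue<>());    // requests wait here when all workers are busy
        pool.allowCoreThreadTimeOut(true);       // let idle workers terminate after the timeout
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newWorkerPool(4, 30);
        pool.execute(() ->
                System.out.println("handled on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

Finished threads stay around for up to `idleSeconds` waiting for more work, which is the "kept alive to be re-used" behaviour described above.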
Servlet code executes in a thread. That thread is spawned by the servlet container, which is a Java application running in the JVM.
Upon receiving a request, the servlet container starts a thread that executes the servlet code and passes it the incoming request to process. When processing finishes, the thread either returns to a pool or simply terminates, depending on how the container is implemented.
The benefit is that spawning a new process is more costly for the OS (in memory, I/O, and CPU cycles) than spawning a thread inside an existing process. A thread also shares its memory space with the parent process.
Threads can also be pooled. Although a thread is cheaper to create than a process, creating one still has a performance cost; keeping a pool of threads reduces that cost to some extent.
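To make the pooling point concrete, here is a small sketch (plain JDK, no servlet API) that pushes many simulated requests through a fixed-size pool and counts how many distinct threads actually did the work. The class and method names are hypothetical, chosen only for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PooledRequestDemo {
    // Runs `requests` simulated requests on a pool of `poolSize` threads and
    // returns how many distinct threads ended up handling them.
    static int distinctThreadsUsed(int requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < requests; i++) {
            // Each "request" just records which worker thread ran it.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("threads used: " + distinctThreadsUsed(100, 4));
    }
}
```

However many requests are submitted, the count never exceeds the pool size, so the cost of creating a thread is paid at most `poolSize` times rather than once per request.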
Another advantage of threads is that error handling can be done more elegantly. If a thread exits by throwing an error, that is much easier to handle than a process terminating with an error.
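The error-handling point can be sketched as follows. This is only an illustration using the JDK's `ExecutorService`, not real container code; `handle`, `serve`, and `serveAll` are hypothetical names, and the status strings are invented for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WorkerErrorDemo {
    // Hypothetical request handler: fails on an empty path.
    static String handle(String path) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("empty path");
        }
        return "200 OK " + path;
    }

    // Runs one request on a worker thread and turns a failure into a response.
    static String serve(ExecutorService pool, String path) {
        Callable<String> task = () -> handle(path);
        Future<String> result = pool.submit(task);
        try {
            return result.get();
        } catch (ExecutionException e) {
            // The exception thrown on the worker thread arrives here as the cause.
            return "500 " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "503 interrupted";
        }
    }

    static String[] serveAll(String... paths) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String[] responses = new String[paths.length];
            for (int i = 0; i < paths.length; i++) {
                responses[i] = serve(pool, paths[i]);
            }
            return responses;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String response : serveAll("/index", "", "/again")) {
            System.out.println(response);
        }
    }
}
```

The failing request is reported as an ordinary Java exception inside the same process, and the pool goes on to serve the next request. With a separate process, the parent would typically have to infer what went wrong from an exit code or captured output.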
At runtime, the web server launches a CGI program as a separate OS process. That process gets its own OS environment and executes the CGI code, which resides in the server's file system. Each new HTTP request launches a new process on the server. The response time of CGI programs is high because each one executes in its own process, and creating a process is a heavyweight operation for the OS.
A servlet, by contrast, runs as a thread in the web container instead of in a separate OS process. The web container itself is an OS process, but it runs as a service and is available continuously. When the number of requests for a servlet rises, no additional instances of the servlet are created. Instead, requests are processed concurrently, with one Java thread per request.
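One consequence of "one instance, many threads" is that any state held in the servlet's fields is shared by all concurrent requests. The sketch below shows this with a plain Java class standing in for a servlet; `HitCounterHandler` and `simulate` are hypothetical names, and the real Servlet API is deliberately not used so that the example runs on the JDK alone.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedInstanceDemo {
    // Hypothetical stand-in for a servlet; not the real Servlet API.
    static class HitCounterHandler {
        // Shared by every request thread, so it must be thread-safe.
        private final AtomicInteger hits = new AtomicInteger();

        void service() {
            hits.incrementAndGet();
        }

        int hits() {
            return hits.get();
        }
    }

    // Sends `requests` simulated requests through `threads` worker threads,
    // all of them calling the same handler instance.
    static int simulate(int requests, int threads) {
        HitCounterHandler handler = new HitCounterHandler(); // created once
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> handler.service());
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handler.hits();
    }

    public static void main(String[] args) {
        System.out.println("hits: " + simulate(1000, 8));
    }
}
```

Because every thread goes through the same object, the counter sees all requests. With a plain `int` field instead of `AtomicInteger`, concurrent increments could be lost, which is why instance fields in code that runs this way need care.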
Note that a servlet executes as a thread within the web container's process.