I have a server application that handles client requests in different ways.
I want to know how many users can be served with minimal latency, so I made a small stress test application that simulates the user requests; at the same time, another application monitors the memory/CPU utilization.
The stress test tool creates a thread every second, where each thread represents a user. If the tool cannot create a new thread due to lack of resources, it starts a new instance of the stress test tool.
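For context, the spawning loop looks roughly like this (a simplified Java sketch; the class names and the fallback command line are placeholders, not my exact code):

```java
import java.io.IOException;

// Rough sketch of the stress test spawning loop.
public class StressTest {

    // Each thread stands for one simulated user.
    static class UserSimulator implements Runnable {
        @Override
        public void run() {
            while (true) {
                try {
                    // placeholder for: send a request to the server and record its latency
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            try {
                new Thread(new UserSimulator()).start(); // one new simulated user
            } catch (OutOfMemoryError e) {
                // No resources left for another thread: hand over to a new instance of the tool.
                try {
                    new ProcessBuilder("java", "StressTest").inheritIO().start();
                } catch (IOException io) {
                    io.printStackTrace();
                }
                return;
            }
            Thread.sleep(1000); // one new thread per second
        }
    }
}
```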
The problem is that every thread writes the latency of each request, along with the current number of running threads, to a file. This causes an I/O problem: after a couple of minutes there are a lot of threads all needing to write to disk. This behavior will also not exist in the real scenario, where the client only requests the data.
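To make the bottleneck concrete, the logging currently amounts to something like the following (simplified sketch; the class name and `latency.log` are placeholders): every user thread goes through one shared, synchronized writer on every single request.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Simplified sketch of the current logging: every thread hits the same file on every request.
public class Logger {
    private static PrintWriter out;

    static {
        try {
            out = new PrintWriter(new FileWriter("latency.log", true), true); // append, autoflush
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Called once per request by every user thread, so disk I/O grows with the thread count.
    public static synchronized void log(long latencyMs, int activeThreads) {
        out.println(latencyMs + "," + activeThreads);
    }
}
```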
How can I overcome this problem, given that I want to measure the maximum latency per user?
PS:
Some answers say to run the test on a different machine to take network latency into account. OK, that will be my final stress test; for now I am running the test on the same server to find out how many users can be supported with minimal latency.