This problem appeared today and I have no idea what is going on. Please share your ideas.
I have 1 EC2 DB server (MySQL + NFS file sharing + Memcached).
And I have 3 EC2 web servers (lighttpd), each of which mounts the NFS folders from the DB server.
Everything had been running smoothly for months, but suddenly there is an interesting phenomenon.
Every 8 to 10 minutes, PHP files become unreachable. This lasts about 1 minute and then everything goes back to normal. Static files such as .html files are unaffected. All three web servers have the problem at exactly the same time.
I have spent a whole day trying to find the cause. Finally, I found that when the problem appears, the number of file descriptors held by lighttpd suddenly increases a lot.
I used `ls /proc/1234/fd | wc -l` to check the number of fds.
The number of fds is around 250 in normal operation. However, when the problem appears, it rises to about 1500 and then drops back to normal.
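
To narrow this down further, the one-off count above could be turned into a small logger. The following is only a sketch, assuming a Linux /proc filesystem and a POSIX shell. The function names (count_fds, fd_breakdown, watch_fds) are hypothetical helpers made up for illustration, not part of lighttpd or any tool.

```shell
#!/bin/sh
# Hypothetical helpers for watching a process's file descriptors via /proc.

# count_fds PID: print how many fds the process currently has open.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# fd_breakdown PID: group open fds by what they point to.
# Targets like "socket:[12345]" and "pipe:[678]" are collapsed to
# "socket" and "pipe"; regular files keep their full path.
fd_breakdown() {
    for fd in /proc/"$1"/fd/*; do
        readlink "$fd"
    done 2>/dev/null | sed -e 's/:\[.*\]$//' | sort | uniq -c | sort -rn
}

# watch_fds PID [INTERVAL]: print a timestamped fd count every INTERVAL
# seconds (default 1) until the process exits.
watch_fds() {
    pid=$1
    interval=${2:-1}
    while [ -d "/proc/$pid" ]; do
        printf '%s %s\n' "$(date '+%F %T')" "$(count_fds "$pid")"
        sleep "$interval"
    done
}
```

For example, `watch_fds 1234 1 >> fd.log` (with 1234 replaced by the real lighttpd PID) would record when each spike starts and ends. Running `fd_breakdown 1234` during a spike should then show whether the extra descriptors are mostly sockets, pipes, or files on the NFS mount.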
It sounds funny, right? Do you have any idea what's going on?
======================== The CPU graph of one of the web servers.