So, our hosting provider recently moved our test server into a new, virtualized environment. After the move, some things on the test environment got extremely slow.

For example, logging in via Remote Desktop was slow; not using Remote Desktop, just logging in. Also, some ASP.NET applications that usually run like the wind are now running like a tortoise. After a lot of debate about the cause of this slowdown, I began investigating the actual problem.

The most interesting find came when I installed dotTrace on the test server. Running a page I knew would perform badly, I got the following (high-level) results for the thread that performed the work for the troublesome page:

Real/wall time: 45538 ms
Thread time:    375 ms

As far as I know, this means the thread spends an awfully long time not being executed. My pet theory is that the virtual environment is prioritizing other servers' work over my server's. Could that be the cause? What are your thoughts?
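That wall-time/thread-time gap can be illustrated outside of dotTrace as well. Here is a minimal Python sketch (not taken from the original trace) showing how a thread that spends its time blocked, on IO, a lock, or the hypervisor's scheduler, accumulates wall-clock time while consuming almost no CPU ("thread") time:

```python
import time

# time.sleep stands in for any blocking wait (IO, lock, scheduler).
start_wall = time.perf_counter()  # wall-clock / "real" time
start_cpu = time.process_time()   # CPU time actually consumed

time.sleep(1.0)

wall_ms = (time.perf_counter() - start_wall) * 1000
cpu_ms = (time.process_time() - start_cpu) * 1000

print(f"wall: {wall_ms:.0f} ms, cpu: {cpu_ms:.0f} ms")

# The blocked thread burned essentially no CPU time, mirroring the
# 45538 ms real time vs 375 ms thread time seen in the trace.
assert wall_ms > 900
assert cpu_ms < wall_ms / 10
```

A large wall time with a tiny thread time, as in the numbers above, is the signature of waiting rather than computing.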

Note: If you need more details like the actual traces I have no problem handing them out if you ask.

Edit: More details! The most expensive calls in the trace are:

1 call to KeyInfoX509Data.ctor(X509Certificate, X509IncludeOption): 30014 ms
1 call to SignedXml.ComputeSignature: 15045 ms

Trace details

+2  A: 

To me, that discrepancy screams IO wait: disk or network, most likely, though CPU contention wouldn't surprise me either.

Since it seems to be specifically in reading the certificate, I'd look into whether there's another VM/service getting greedy on either disk or network. Perpetually downloading big files or a heavily accessed database might be the root cause.

To be sure, you'd have to look at the corresponding activity on all the VMs that share your hardware and possibly the network traces getting to your test box and from your test box to the outside. This is likely something that only the ISP can do (since it's essentially a cross-customer interaction issue).

Depending on the VM Server and the hardware, it may be a tweakable setting or it may not be. If it isn't, there may not be anything you can do about it.

At any rate, I agree with your theory: it's likely not a problem with your application, but rather an issue with your provider. If you've got any clout with the ISP, I'd kick it back to them to resolve, and/or investigate changing providers. The cost of hacking around it will likely dwarf the cost of getting them to dedicate you some hardware, or of going with a provider that can give you the service you need.

James
+1, agree with James. The two method calls are most likely compute bound, so it's a good idea to have your ISP check into the CPU utilisation of the server.
Jeremy McGee
Thanks for your answer. If it turns out to be either CPU or IO wait, I'll accept it. Until then, +1. ;) IO and CPU have always been my main suspects as well, but it's only a hunch since I can't profile the actual virtual server. I hope the hosting provider will discover a tweakable setting somewhere.
JohannesH
It turned out to have nothing to do with either the CPU or the disks. The problem was caused by an incorrect DNS registration on the server in question; I've answered the question below. For others reading this, though, the cause is more likely to be CPU or disk load.
JohannesH
A: 

So it turned out to be a security/network/DNS problem. One of the DNS registrations on the server was incorrect, which led to a wrong IP being returned when it tried to look up the AD server. That in turn led to timeouts when requesting AD information, which then caused some further problems. All of these issues only showed themselves as a long pause when requesting certain pages.
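A quick way to surface this kind of misconfiguration is to time name resolution from the affected machine: a lookup that returns an unexpected address (or hangs for seconds before failing) points at DNS rather than CPU or disk. A hedged Python sketch, with "localhost" as a stand-in for the real AD server's hostname:

```python
import socket
import time

def timed_lookup(hostname):
    """Resolve a hostname, returning (ip_or_None, elapsed_ms)."""
    start = time.perf_counter()
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        ip = None  # resolution failed outright
    elapsed_ms = (time.perf_counter() - start) * 1000
    return ip, elapsed_ms

# "localhost" is just a placeholder; on the affected box you would
# resolve the AD server's configured name and compare the result
# against the address it is actually supposed to have.
ip, ms = timed_lookup("localhost")
print(f"localhost -> {ip} in {ms:.1f} ms")
```

If the resolved address is wrong, or the elapsed time is in the tens of seconds, the slowdown is a name-resolution problem, no amount of CPU or disk tuning will fix it.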

JohannesH