When you know that your software (not a driver, not part of the OS, just an application) will run mostly in a virtualized environment, are there strategies to optimize your code and/or compiler settings? Or any guides for what you should and shouldn't do?

This is not about a 0.0x% performance gain, but maybe, just maybe, there are simple things that will improve performance drastically, or things that seem simple but are known to be disastrous in virtualized environments. For example, enabling CONFIG_PARAVIRT in a kernel build is easily done and can boost performance a lot. Now I'm looking for similar things for applications, if there are any.

In my case it will be C++ code and probably VMware, but I want to keep the question as language/product-agnostic as possible. I wonder if there even are such strategies or if this would be a complete waste of time. After all, the concept of virtualization is that you don't have to care too much about it.

+2  A: 

The only advice that I can give you is careful use of mlock() / mlockall() .. while looking out for buggy balloon drivers.

For instance, if a Xen guest is booted with 1 GB, then ballooned down to 512 MB, it's very typical that the privileged domain did NOT look at how much memory the paravirtualized kernel was actually promising to processes (i.e. Committed_AS). Actually, with Xen, unless this value is placed on Xenbus, the privileged host has no idea what such a balloon will do. I believe this also applies to KVM, depending upon how the kernel is configured .. but your question presumes that we know nothing about such things :)

So, protect stuff (be careful, but prudent) that simply cannot be paged out, and always account for the 'sky is falling' scenario.
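
To make that concrete, here is a minimal sketch of locking one small, critical structure rather than the whole process. It assumes Linux/POSIX; `struct job_state`, `job_state_create()` and `job_state_destroy()` are made-up names for illustration, not anything from Xen or a library. Note that mlock() can fail (for example when RLIMIT_MEMLOCK is too low), so the sketch reports that instead of treating it as fatal.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical "job" structure that most of the program needs at all times. */
struct job_state {
    int  id;
    char name[64];
};

/* Page size, with a conservative fallback if sysconf() fails. */
static size_t page_size(void)
{
    long page = sysconf(_SC_PAGESIZE);
    return page > 0 ? (size_t)page : 4096;
}

/*
 * Allocate one zeroed, page-aligned page holding a job_state and try to
 * pin it in RAM. Returns NULL only if the allocation itself fails.
 * *locked is set to 1 if mlock() succeeded and 0 otherwise; the caller
 * decides whether an unlocked structure is acceptable.
 */
struct job_state *job_state_create(int *locked)
{
    size_t page = page_size();
    void *p = NULL;

    if (posix_memalign(&p, page, page) != 0)
        return NULL;
    memset(p, 0, page);

    *locked = (mlock(p, page) == 0);
    if (!*locked)
        fprintf(stderr, "mlock failed: %s\n", strerror(errno));
    return p;
}

/* Unlock (if it was locked) and free the structure. */
void job_state_destroy(struct job_state *js, int locked)
{
    if (js == NULL)
        return;
    if (locked)
        munlock(js, page_size());
    free(js);
}
```

Locking a single page like this keeps the footprint tiny; mlockall() would pin everything the process maps, which is exactly the "my program is more important than anything else" behaviour to avoid unless you really need it.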

Likewise, use of posix_fadvise() / posix_madvise() to tell the PV kernel just how much you do or do not need buffering is always a good idea.

Beyond that, there's very little that you can do .. since you're talking only to the paravirtualized kernel, which is designed to make processes oblivious to virtualization in the first place.

I don't use KVM much (yet), though I plan to explore it more in the future. However, 90% of the stuff that I have been writing lately is specifically designed to run on paravirtualized Xen guests. Sorry to be a little Xen / C centric, but that's where my experience is and pv_ops is in mainline (soon also xen-0 ops) :)

Good question, btw :)

Edit:

When I said 'careful but prudent', I meant one step above conservative. If your program allocates some job structure that most functions need, lock it. If you're allocating buffers to read huge files, don't lock them .. and be sure to call posix_fadvise() to let the kernel know you only intend to access the file once (if that's the case). Also, be sure to advise the kernel on your use of memory mapped files, especially if they serve to organize concurrency.

In short, help your host kernel manage memory, don't let critical allocated blocks get thrown into dirty paging, don't assume your program is more important than anything else by locking everything it allocates :)

Sorry for the ambiguity. The best phrase I could come up with was 'careful, but prudent'.

Tim Post
+1, but, "(be careful, but prudent)"? What do you mean? those are practically synonymous...
Assaf Lavie
Edited to explain :)
Tim Post
+1.addition: <http://insights.oetiker.ch/linux/fadvise.html> "As of this writing (2.6.21) Linux does not remember POSIX_FADV_DONTNEED advice for an open file. It acts when the advice is given, and when it can not comply it forgets the advice. So it is up to you to make sure Linux can comply."
VolkerK
I wish I could rate a comment +1, many documentation sources do not reflect that.
Tim Post
+1  A: 

My only advice is keep your memory and IO use low if possible.

IO in a VM is pretty slow compared to physical hardware. If you can avoid doing it then avoid it.

Adam Hawes
+1  A: 

The things that are slow on real hardware are even slower when the system is virtualized. How much slower depends on the virtualization technology being used.

Especially avoid anything that requires I/O with the world outside the virtual environment. Depending on how things are set up, this includes drawing on the screen, swapping, and disk and network I/O. That's roughly in decreasing order of importance.

If possible, pretend you're writing for a ten-year-old computer. If your application would work on a 1999 desktop PC, or laptop, it should do OK.

Lars Wirzenius
+4  A: 

I've found it to be all about I/O.

VMs typically suck incredibly badly at IO. This makes various things much worse than they would be on real tin.

Swapping is an especially bad killer: even a little of it completely wrecks VM performance, because IO is so slow.

Most implementations have a large amount of IO contention between VMs; I've not seen one that avoids this or handles it sensibly.

So use a ramdisc for temporary files if you can, but don't cause it to swap, because that would be even worse.
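
One way to hedge that bet in code is to prefer a RAM-backed directory only when it reports enough free space, and fall back to disk otherwise. The sketch below assumes Linux/POSIX; `pick_tmp_dir()` is a made-up helper, and which directory is actually RAM-backed (often /dev/shm on Linux) is something you have to know or configure. Free space on the filesystem is only a rough proxy: on Linux, tmpfs pages count against memory and can themselves be swapped out.

```c
#include <sys/statvfs.h>

/*
 * Pick a directory for a temporary file of roughly `need` bytes.
 * Returns ram_dir if it exists and reports at least `need` bytes available
 * to unprivileged users; otherwise returns disk_dir.
 */
const char *pick_tmp_dir(const char *ram_dir, const char *disk_dir,
                         unsigned long long need)
{
    struct statvfs vfs;

    if (statvfs(ram_dir, &vfs) == 0) {
        unsigned long long avail =
            (unsigned long long)vfs.f_bavail * vfs.f_frsize;
        if (avail >= need)
            return ram_dir;
    }
    return disk_dir;
}
```

Typical use would be something like `pick_tmp_dir("/dev/shm", "/tmp", expected_size)`, keeping `expected_size` well below what the guest can hold in RAM.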

MarkR
IO is a non-issue when a program is (1) coded in a way that cooperates with the kernel and (2) running on a kernel with the appropriate PV drivers. This question actually brings to light how well one understands the host kernel, virtualization or not. Please discuss I/O specifics, beyond 'don't do it'.
Tim Post