views: 4203

answers: 12

The IT department where I work is trying to move to 100% virtualized servers, with all the data stored on a SAN. They haven't done it yet, but the plan eventually calls for moving the existing physical SQL Server machines to virtual servers as well.

A few months ago I attended the Heroes Happen Here launch event, and in one of the SQL Server sessions the speaker mentioned in passing that this is not a good idea for production systems.

So I'm looking for a few things:

  1. What are the specific reasons why this is or is not a good idea? I need references, or don't bother responding. I could come up with a vague "I/O bound" response on my own via Google.
  2. My recollection of the HHH speaker alone probably won't convince our IT department to change their minds. Can anyone point me directly to something more authoritative? And by "directly", I mean something more specific than a vague Books Online comment. Please narrow it down a little.
A: 

The biggest concern for me when virtualising software is usually licensing.

Here's an article on it for MS SQL Server. I'm not sure about your situation, so I can't pick out any salient points.

http://www.microsoft.com/sql/howtobuy/virtualization.mspx

RB
A: 

SQL Server is supported in a virtual environment. In fact, I would recommend it, given that one of the licensing options is per socket. This means you can put as many SQL Server instances on a virtualized host (e.g. Windows Server 2008 Datacenter) as you like and pay only for the processor sockets the machine has.

It's better than that, because Datacenter itself is also licensed per socket and comes with unlimited virtual machine licenses.

I'd recommend clustering your Hyper-V hosts across two machines, however, so that if one fails the other can pick up the slack.

Mike Brown
+6  A: 

SAN and clustering: of course. But regarding virtualization, you will take a performance hit (which may or may not be worth it to you):

http://blogs.technet.com/andrew/archive/2008/05/07/virtualized-sql-server.aspx

http://sswug.org has had some notes about it in their daily newsletter lately

Cade Roux
+2  A: 

Here's some VMware testing on it: http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf

Granted, they do not compare it to physical machines. But you could probably do similar testing for your environment with the tools they used.

We currently run SQL Server 2005 in a VMware environment. But it is a very lightly loaded database, and it runs great with no problems.

As most have pointed out, it will depend on your database load.

Maybe you can convince the IT department to do some proper testing before blindly implementing.
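
For what it's worth, a timing harness for that kind of test doesn't have to be elaborate. Here's a minimal Python sketch, assuming the pyodbc package; the connection string and query are placeholders for your environment. Run it against the physical box and the VM candidate and compare:

    import time
    import pyodbc  # assumes pyodbc and a SQL Server ODBC driver are installed

    # Placeholders - point these at the server under test and a representative query.
    CONN_STR = "DRIVER={SQL Server};SERVER=your_server;DATABASE=your_db;Trusted_Connection=yes"
    QUERY = "SELECT COUNT(*) FROM your_big_table"

    def time_query(conn_str, query, runs=5):
        """Run the query several times; report best and average wall-clock time."""
        conn = pyodbc.connect(conn_str)
        try:
            cursor = conn.cursor()
            timings = []
            for _ in range(runs):
                start = time.perf_counter()
                cursor.execute(query).fetchall()
                timings.append(time.perf_counter() - start)
        finally:
            conn.close()
        return min(timings), sum(timings) / len(timings)

    best, avg = time_query(CONN_STR, QUERY)
    print(f"best {best:.3f}s, avg {avg:.3f}s")

Comparing the best-of-N numbers between the two environments gives you something concrete to put in front of the IT department.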

Brian
You don't know our IT dept.
Joel Coehoorn
+1  A: 

No, I can't point to any specific tests or anything like that, but I can say from experience that putting your production database server on a virtual machine is a bad idea, especially if it has a large load.

It's fine for development, and possibly even for testing (on the theory that if it runs fine under load on a virtual box, it's going to run fine in production), but not for production.

It's common sense, really. Do you want your hardware running two operating systems and your SQL Server, or one operating system and SQL Server?

Edit: My experience biased my response. I have worked with large databases under heavy constant load. If you have a smaller database under light load, virtualization may work fine for you.

Kevin
Edited my question to reflect the 'production' issue. Lots of other systems work under virtualization. If by adding an OS (one of which is nearly always idle) you can make backup/restore better and improve clustering and uptime, why not?
Joel Coehoorn
+18  A: 

I can say this from personal experience, because I am dealing with this very problem as we speak. The place where I am currently working as a contractor has this type of environment for its SQL Server development systems. I am trying to develop a fairly modest B.I. system in this environment and am really struggling with the performance issues.

TLB misses and emulated I/O are very slow on a naive virtual machine. If your O/S has paravirtualisation support (which is still not a mature technology on Windows), you can use paravirtualised I/O (essentially a device driver that hooks into an API in the VM). Recent versions of the Opteron support nested page tables, which removes the need to emulate the MMU in software (which is really slow).

Thus, applications that run over large data sets and do lots of I/O, like (say) ETL processes, trip over the Achilles heel of virtualisation. If you have anything like a data warehouse system that might be hard on memory or disk I/O, you should consider something else. For a simple transactional application, VMs are probably O.K.

To put this in perspective: the systems I am using run on blades (an IBM server) against a SAN with 4x 2Gbit F/C links. This is a mid-range SAN. The VM has 4GB of RAM IIRC, and now two virtual CPUs. At its best (when the SAN is quiet) it is still only half the speed of my XW9300, which has 5 SCSI disks (system, tempdb, logs, data, data) on one U320 bus and 4GB of RAM.

Your mileage may vary, but I'd recommend workstation systems like the one I described for developing anything I/O-heavy, in preference to virtual servers on a SAN. Unless your resource requirements are beyond this sort of kit (in which case they are well beyond a virtual server anyway), this is a much better solution. The hardware is not that expensive - certainly much cheaper than a SAN, blade chassis and VMware licensing. SQL Server Developer Edition comes with V.S. Pro and above.

This also has the benefit that your development team is forced to deal with deployment right from the word go - you have to come up with an architecture that's easy to 'one-click' deploy. This is not as hard as it sounds. Redgate SQL Compare Pro is your friend here. Your developers also get a basic working knowledge of database administration.
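
As a hedged illustration (the folder layout, instance name, and database name below are hypothetical): "one-click" can be as simple as replaying a folder of numbered schema scripts with sqlcmd. SQL Compare can generate the scripts; a sketch like this applies them:

    import glob
    import subprocess

    SERVER = r".\SQLEXPRESS"   # hypothetical developer instance
    DATABASE = "MyAppDb"       # hypothetical target database

    # Apply every script in deploy/ in filename order
    # (e.g. 001_tables.sql, 002_views.sql, 003_data.sql).
    for script in sorted(glob.glob("deploy/*.sql")):
        print(f"applying {script}")
        subprocess.run(
            ["sqlcmd", "-S", SERVER, "-d", DATABASE, "-E", "-i", script],
            check=True,  # stop at the first script that fails
        )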

A quick trip to HP's website got me a list price of around $4,600 for an XW8600 (their current Xeon-based model) with a quad-core Xeon chip, 4GB of RAM and 1x146GB and 4x73GB 15k SAS hard disks. Street price will probably be somewhat less. Compare this to the price of a SAN, blade chassis and VMware licensing, plus the cost of backup for that setup. For backup, you can provide a backed-up network share where people can drop compressed DB backup files as necessary.

EDIT: This whitepaper on AMD's website discusses some benchmarks on a VM. From the benchmarks in the back, heavy I/O and MMU workloads really clobber VM performance. Their benchmark (to be taken with a grain of salt, as it is a vendor-supplied statistic) suggests a 3.5x speed penalty on an OLTP benchmark. While this is vendor-supplied, one should bear in mind:

  • It benchmarks naive virtualisation and compares it to a para-virtualised solution, not bare-metal performance.

  • An OLTP benchmark has a more random-access I/O workload and spends more of its time waiting for disk seeks, which masks some of the virtualisation overhead. A more sequential disk access pattern (characteristic of data warehouse queries) will suffer a higher penalty, and a memory-heavy operation (SSAS, for example, is a biblical memory hog) that incurs a large number of TLB misses will attract additional penalties as well. This means that the slowdowns on this type of processing would probably be more pronounced than the OLTP penalty cited in the whitepaper.

What we have seen here is that TLB misses and I/O are very expensive on a VM. A good architecture with paravirtualised drivers and hardware support in the MMU will mitigate some or all of this. However, I believe that Windows Server 2003 does not support paravirtualisation at all, and I'm not sure what level of support is delivered in Windows Server 2008. It has certainly been my experience that a VM will radically slow down a server working on an ETL process or SSAS cube builds, compared to relatively modestly specced bare-metal hardware.
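
If you want to see the sequential-versus-random distinction for yourself, a crude micro-benchmark along these lines can be run on both the bare-metal box and the VM. The file name and sizes here are arbitrary, and the OS page cache will flatter the numbers unless the test file is much larger than RAM:

    import os
    import random
    import time

    FILE = "io_test.bin"        # hypothetical scratch file
    SIZE = 256 * 1024 * 1024    # 256 MB; use something larger than RAM for honest numbers
    BLOCK = 8192                # 8 KB, the SQL Server page size

    # Build the test file once.
    if not os.path.exists(FILE):
        with open(FILE, "wb") as f:
            for _ in range(SIZE // (1024 * 1024)):
                f.write(os.urandom(1024 * 1024))

    def sequential_read():
        """Stream the whole file front to back, like a table scan."""
        with open(FILE, "rb") as f:
            while f.read(BLOCK):
                pass

    def random_read(n=10000):
        """Read n pages at random offsets, like OLTP access."""
        with open(FILE, "rb") as f:
            for _ in range(n):
                f.seek(random.randrange(0, SIZE - BLOCK))
                f.read(BLOCK)

    for name, fn in (("sequential", sequential_read), ("random", random_read)):
        start = time.perf_counter()
        fn()
        print(f"{name}: {time.perf_counter() - start:.2f}s")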

ConcernedOfTunbridgeWells
+3  A: 

We are running a payroll system for 900+ people on VMware with no problems. It has been in production for 10 months. It's a medium-sized load as far as the DB goes, and we pre-allocated drive space in the VM to prevent I/O issues. You have to defrag both the VM host and the VM slice on a regular basis to maintain acceptable performance.

David Robbins
A: 

I would think that the possibility of something bad happening to the data would be too great.

As a dead simple example, let's say you ran a SQL Server box in Virtual Server 2005 R2 with undo disks turned on (so the main "disk" file stays the same and all changes are written to a separate file which can be purged or merged later). Then something happens (usually you hit the 128GB limit, or whatever the size is) and some middle-of-the-night clueless admin has to reboot, and figures out he can't do so until he removes the undo disks. You're screwed - even if he keeps the undo disk files for later analysis, the odds of merging the data back together are pretty slim.

So, echoing the other posts in this thread: for development it's great, but for production it's not a good idea. Your code can be rebuilt and redeployed (that's another thing - VMs for source control aren't a good idea either), but your live production data is far more important.

Schnapple
Wouldn't that also occur on a physical drive? What would the difference be from a full hard drive that you reboot?
David Robbins
I've been in situations where the VM will not boot unless you drop the undo disk. It's different from a hard drive crash; it's a corrupt file on a drive. It's true that physical hard drives aren't perfect, but a VM is an added layer of headache on something so important. YMMV.
Schnapple
I was under the impression that in general VMs are considered MORE reliable, not less.
Joel Coehoorn
+1  A: 

There is some information concerning this in Conor Cunningham's blog article Database Virtualization - The Dirty Little Secret Nobody is Talking About.... To quote:

Within the server itself, there is surprisingly little knowledge of a lot of things in this area that are important to performance. SQL Server's core engine assumes things like:

  1. all CPUs are equally powerful.
  2. all CPUs process instructions at about the same rate.
  3. a flush to disk should probably happen in a bounded amount of time.

The post goes on to elaborate on these issues further. I think it's a good read, given the scarcity of information available on this issue in general.
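
The third assumption is the easiest to probe yourself. A rough sketch (the scratch file name is arbitrary): time a series of small write-and-flush cycles on both the VM and a bare-metal box and compare the spread, since with emulated I/O the worst case tends to drift much further from the average:

    import os
    import time

    PATH = "flush_probe.bin"   # hypothetical scratch file
    samples = []
    with open(PATH, "wb") as f:
        for _ in range(200):
            f.write(os.urandom(8192))     # one 8 KB page, like a log block
            start = time.perf_counter()
            f.flush()
            os.fsync(f.fileno())          # force the write through to stable storage
            samples.append(time.perf_counter() - start)
    os.remove(PATH)
    print(f"min {min(samples) * 1e3:.2f} ms, "
          f"max {max(samples) * 1e3:.2f} ms, "
          f"avg {sum(samples) / len(samples) * 1e3:.2f} ms")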

+1  A: 

Note that there are some specialty virtualization products out there made for databases that might be worth looking into, instead of a general product like VMware.

Our company (over 200 SQL servers) is currently in the process of deploying HP Polyserve on some of our servers:

HP PolyServe Software for Microsoft SQL Server enables multiple Microsoft SQL Server instances to be consolidated onto substantially fewer servers and centralized SAN storage. HP PolyServe's unique "shared data" architecture delivers enterprise class availability and virtualization-like flexibility in a utility platform.

Our primary reason for deploying it is to make hardware replacement easier: add the new box to the "matrix", shuffle around where each SQL instance resides (seamlessly), then remove the old box. Transparent to the application teams, because the SQL instance names don't change.

BradC
A: 

The security issues that virtualization can introduce should also be considered. Virtualization Security is a good article by PandaLabs that covers some of the concerns.

kristof
+3  A: 

I wanted to add this series of articles by Brent Ozar:

It's not exactly authoritative in the sense I was hoping for (coming from the team that builds the server, or an official manual of some kind), but Brent Ozar is pretty well respected and I think he does a great job covering all the issues here.

Joel Coehoorn