While there are MANY enterprise customers who are virtualizing mission critical apps, some who are virtualizing clients in all shapes and forms, some who have functioning internal self-service ITaaS portals, and some who are leveraging external public cloud models – for the most part, we’re still in relatively early days.
Paul Maritz at PEX talked about how today, while there are many advanced examples, as a whole most customers are just finishing virtualizing the “craplications” (aka “Phase I of the VMware journey”), and are starting to virtualize things that carry heavy SLAs. In my experience, the existence of SLAs is the defining element of a “mission critical application” vs. a “craplication”.
FWIW, obviously I’m being a little facetious when using the word “craplication”. These apps, generally “owned and operated” by IT, are very important, and virtualizing them is a no-brainer. What they don’t have is high “risk” or SLAs associated with them.
When you ask about a given application: 1) “what are the specific performance requirements?”; 2) “what are the disaster recovery and retention SLAs?”; 3) “does the application or its data require compliance with security policies, and is it audited?” you get two very distinct responses.
- Craplication = “huh? Look, we have a lot of these running on 1U/2U servers, one NIC, and local storage. I have no idea about its backup, performance, or security situation.”
- Mission Critical Apps = “yes, it does have those SLAs, and here they are”.
Now, for the craplications, the “huh” translates to “don’t worry, let’s just virtualize it, full steam ahead. We’ll deal with stuff as it comes up”. And that process generally works really well – it has resulted in massive TCO savings over the last few years, and that’s all been very good.
For mission critical apps – unless you can answer with confidence that you can not only meet those SLAs but improve them, they tend to stay “stuck” on older, less efficient stuff, because the risk of change is pronounced (or at least perceived to be).
I think that 2011 will be a year of a LOT of progress on this front.
For most enterprises, there’s no application more mission critical than their core business apps, which generally means either Oracle or SAP. While it’s been a long time coming, there’s been a lot of thawing around Oracle and VMware on many fronts. I talked about that over the last year, and you can see the latest here.
Within EMC, there’s an interesting forcing function – Joe Tucci and Paul Maritz are emphatic that both IT organizations must be ahead of our own stuff, acting as an early alpha/beta not only of the technology, but of the whole idea. Every quarter we have a business review, and one of the many topics is what our respective IT orgs are doing to push the envelope.
We’re currently pretty far along, with 75% of our workloads running as VMs, including the vast majority of our mission critical apps, and a broadly rolled-out View 4.5 deployment. It’s still early days for self-service portals, but we are using various flavors of ITaaS.
One thing that is VERY cool is that this is all being done very publicly, and with LOTS of detail (you can get details on all the EMC IT projects at www.emc.com/emcIT).
A killer example was one I mentioned at PEX – the results of replatforming our Oracle 11i/10g RAC deployment from Solaris to Linux and from Sparc to x86, and, for the 11i app tier, virtualizing it on vSphere 4.1.
For perspective – this is one of the largest Oracle 11i Apps deployments in the world. It’s also the beating heart of EMC’s business – supporting our quoting, inventory, huge parts of the CRM system and more.
So – what was the result?
- 10-20x performance improvements.
- $5M cost savings.
- 90% greater productivity
If that doesn’t get you moving, I think you might have a serious issue.
If you’re one of the millions of enterprises with rusting old legacy big-endian systems (Power, Sparc, etc.) and considering the next evolution of the critical heart of your most mission critical systems – you owe it to yourself to read this whitepaper. Fluffy marketing it ain’t.
ADDED Feb 24th - NOTE: I’m not saying that by definition big-endian is BAD and little-endian is good. What I am saying is that the speed of refresh and the commoditization forces that have applied more strongly in little-endian land mean that a unit of compute power for a unit cost seems to have swung quite firmly to x86 (little-endian) land. You simply DON’T see Google/Facebook/Rackspace/Terremark (or most enterprises) building a LOT of new large-scale big-endian deployments. On this one, the market growth of little-endian relative to Itanium, Power, and Sparc (which are still used in a TON of places that are very important) speaks more volumes than I ever could. But you see almost every enterprise with a lot of really old big-endian systems out there rusting. That was certainly the case here.
As an example, we describe the critical lessons learned along the way. In some cases, the new platform was literally so much faster that it had 2nd and 3rd order effects we didn’t consider at the outset. We also found important VMware, EMC, and Cisco bugs through the process.
So – what’s next? Beyond building one of the largest and most energy-efficient (BTW, 100% virtualized) datacenters in the world (a crazy cool story on its own), we’ve been working hand in hand with VMware on “future vSphere releases”. One of the main goals of that next release is to handily virtualize the DB tier (currently still running on the UCS hardware as physicals). Stay tuned – as cool as the current stuff is (really, hats off to Sanjay’s crew), we’re on an awesome journey…
Would love to hear YOUR “Virtualizing Mission Critical Apps” story….
Hi Chad
Interesting document. The first table on page 12 describes the Cisco UCS blades as having 18*8GB DIMMs for a total of 96GB RAM, which is incorrect, as 18*8 is 144.
Also it would have been interesting to see some detail on the EMC vSphere setup for the solution, even at a high level.
Cheers
David
Posted by: David | February 24, 2011 at 06:28 AM
@David
They could be using online sparing so 12*8GB online and 6*8GB as spares.
Posted by: Andrew Fidel | February 24, 2011 at 02:37 PM
The WP Link appears to be broken, can you send it to me directly?
Posted by: Don Sullivan | February 24, 2011 at 03:40 PM
How did you go with Oracle licensing in this environment? Was that factored into the cost savings (inc or ex)? I have seen people license by cluster and then run multiple VMs of Oracle DB and Apps side by side, which saves a ton of money but is not Oracle best practice. VMware doesn't appear to have any hard guidance here. Seems the balance here is to license as many ESX hosts as you can afford, so you can use the most features (DRS, HA, SRM, FT, etc.). The less you spend, the more you need to turn off to comply with Oracle. SRM is first to go, then DRS, then FT and HA, until you end up with a small cluster. At which point you've got feature parity with OVM (sort of - it doesn't offer much). Curious how EMC dealt with this one. Is there a magic cost/benefit crossover point? - Erik.
Posted by: Erik H. | February 25, 2011 at 04:20 AM
@David
That is a typo. The memory config on our B200's is 12 x 8 and not 18 x 8.
- Ken
Posted by: Kenpaul | February 25, 2011 at 12:35 PM
Interesting to see this at large scale. I had something similar running on a smaller scale a few years ago: 3 RAC nodes (2 standalone & 1 virtual) + 9 virtual JDE servers (logic, development & web).
We ran into performance issues on the DB side. Our primary DB was about 4TB & the development/test DBs were about 2TB each. The problem with the virtual RAC server was shared CPU with other VM guests, thus the VM was only used as a failover, & not as the primary node.
Each of our physical RAC nodes had 16GB & 1 CPU (2 cores) (to keep the per CPU licensing cost down)
I'd be interested to know how many RAC nodes there actually are. (The document only says 2 are shown.)
(everything was connected up to a cx4-240 w/ FC disks using ASM)
Posted by: twitter.com/needcaffeine | February 25, 2011 at 02:27 PM
I'd be curious to learn how you got on with Oracle licenses also.
I find it rather appalling that Oracle will allow limited license purchases when you carve up a Sun system with Containers (zones), which by all accounts are a very fluid and dynamically resizable object, and yet they don't view a VM under ESX, which in most cases is a fixed CPU/memory-count object (without a reboot), in the same way.
MS was for a long time the same when it came to SQL Server, but at least they've got with the times and adjusted their model to fit virtualised environments. As Erik mentioned, it seems the only way to license Oracle on ESX/VMs is to fully license the hosts, and it just gets messy when you want DRS, SRM, FT and HA in the picture.
Posted by: Andrew | February 28, 2011 at 05:34 PM
Erik H, needcaffeine, Andrew: In my Oracle vSphere environments, we tend to license a couple of vSphere hosts (yes, the entire host) for Oracle and then host all the Oracle DBs on those hosts. Same with other processor-licensed Oracle products like WebLogic or Internet Application Server. This can either be a dedicated cluster of hosts (the only real option pre vSphere 4.1) or just dedicated hosts inside your overall vSphere cluster (using the 4.1 DRS VM-Host affinity / anti-affinity rules). We then use DRS, HA, and FT in that cluster to ensure the best performance. Using things like resource shares and SIOC, we're able to ensure our Production VMs on those hosts get the resources they need while still allowing non-critical systems to get resources.
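For anyone who wants to see the shape of that, here's a rough pyVmomi sketch of the "must run on" VM-to-host rule approach. The cluster, host, and VM names are made up for illustration, and certificate/error handling is omitted; treat it as a sketch, not a drop-in script.

```python
# Illustrative sketch: pin "Oracle" VMs to the subset of hosts that carry
# Oracle processor licenses, via a DRS VM-to-host affinity rule (vSphere 4.1+).
# All inventory names below are hypothetical.
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='admin', pwd='...')
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

cluster = find_by_name(vim.ClusterComputeResource, 'Prod-Cluster')
licensed_hosts = [find_by_name(vim.HostSystem, n) for n in ('esx01', 'esx02')]
oracle_vms = [find_by_name(vim.VirtualMachine, n) for n in ('ora-db-01', 'ora-db-02')]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation='add',
            info=vim.cluster.HostGroup(name='oracle-licensed-hosts', host=licensed_hosts)),
        vim.cluster.GroupSpec(operation='add',
            info=vim.cluster.VmGroup(name='oracle-vms', vm=oracle_vms)),
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(operation='add',
            info=vim.cluster.VmHostRuleInfo(
                name='oracle-vms-on-licensed-hosts',
                enabled=True,
                mandatory=True,   # "must run on" keeps the license boundary hard
                vmGroupName='oracle-vms',
                affineHostGroupName='oracle-licensed-hosts')),
    ])

# Apply the group and rule changes to the existing cluster configuration.
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```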
Yes, you do fully license at least two vSphere hosts to get the best of both worlds. One thing I've done before is turn off half the sockets in each host via the BIOS, so I only need one physical host's worth of Oracle CPU licenses but still get the advantages of DRS, HA, and FT across two physical hosts. CPUs are *way* cheaper than Oracle licenses. Of course this means none of the VMs can use more than that halved number of sockets, but that isn't a true limitation in many of the environments I've seen.
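And to Erik's "crossover point" question, the arithmetic itself is simple. Here's a back-of-the-envelope sketch; the core factor, license price, support rate, and server cost are illustrative placeholders, so plug in your own numbers and the current Oracle core factor table.

```python
# Back-of-the-envelope Oracle-on-vSphere licensing arithmetic.
# Every constant here is an illustrative placeholder - substitute your own
# negotiated pricing and the current Oracle core factor table.

CORE_FACTOR = 0.5          # x86 core factor (check Oracle's current table)
LICENSE_PER_PROC = 47500   # illustrative per-processor license price
SUPPORT_RATE = 0.22        # annual support as a fraction of license cost
HOST_COST = 30000          # illustrative cost of one 2-socket ESX host

def oracle_processor_licenses(hosts, sockets_per_host, cores_per_socket):
    """Processor licenses required = licensed cores x core factor."""
    return hosts * sockets_per_host * cores_per_socket * CORE_FACTOR

def three_year_cost(hosts, sockets_per_host, cores_per_socket=6, years=3):
    licenses = oracle_processor_licenses(hosts, sockets_per_host, cores_per_socket)
    license_cost = licenses * LICENSE_PER_PROC
    support_cost = license_cost * SUPPORT_RATE * years
    return hosts * HOST_COST + license_cost + support_cost

# Scenario A: one fully populated 2-socket host carries all the Oracle VMs.
# Scenario B: two hosts, each with only one socket's worth of CPUs in use
#             (the halved-socket approach above), so the same license count
#             buys DRS/HA/FT across two hosts.
print(three_year_cost(hosts=1, sockets_per_host=2))   # ~503,100
print(three_year_cost(hosts=2, sockets_per_host=1))   # ~533,100

# Same 6 processor licenses either way (2 sockets x 6 cores x 0.5); the delta
# is one extra ~30K server against ~285K of licenses, which is why CPUs are
# *way* cheaper than Oracle licenses.
```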
Many of the high CPU utilization Oracle systems I've seen greatly benefited from using Oracle tuning features like SQL Profiles to reduce the CPU (and Disk I/O) utilization. This in turn eliminated the need to buy more processor licenses.
Posted by: Jay Weinshenker | March 04, 2011 at 10:18 PM
@Eric
Hate to tell you that turning off CPUs in the BIOS does not relieve you of your contractual obligation to license them.
I'm not defending Oracle's licensing policy here, just describing it: their license policy states clearly that you must license a product for all the processors physically installed in the server on which that product runs, unless some of those processors are explicitly made unavailable to the product via an approved "hard partitioning" method.
As of today, disabling CPUs in the BIOS is not an approved hard partitioning method. Physically removing a processor from the server is.
Steve
Posted by: Steve Lewis | August 25, 2011 at 08:23 AM