
June 16, 2008

Comments


Gabrie van Zanten

Hi

Very good article!!! My compliments :-)

I linked to your article from my blog, hope you don't mind:
http://www.gabesvirtualworld.com/?p=70

Gabrie

Stu

Excellent work Chad, I'll have to stop blogging soon if you keep posting this level of quality content so frequently ;-)

Chad Sakac

Stu - you're not being fair to yourself... give yourself a plug, I frequent your blog often! Ok, I'll plug it - http://www.vinternals.com

Your SRM post is a good one - http://www.vinternals.com/2008/05/what-vmware-site-recovery-manager-isnt.html.

We have an EMC saying: Backup and DR are DIFFERENT. You need to do regular system backup for application recovery. The other thing is that (in general, since each array replication technology is different) SRM is based on the idea of a "real time" remote copy (even if "real-time" is time-shifted with async replicas). The point is ONE COPY at the remote site. There's the persistent issue of errors being replicated to the remote site. Each of our replication technologies has an option to avoid this (snaps of the target for MV, SRDF, and Celerra Replicator - or, in RecoverPoint's case, continuous data protection).

EMC's answers for local backup (not the only ones) are:

1) Replication Manager 5.1.2 (trying to get them to rename it to "Replication Manager for VMware") - this handles application-integrated backup, VMFS-level integration, and array snapshot mechanisms.

2) Avamar Virtual Edition (and regular) - source based dedupe, very cool, very popular with VMware. For Networker customers - Avamar can be integrated also.

Chad Sakac

Gabrie - thank you! You can absolutely link of course!.... Still trying to figure out how this trackback thing works. It's always fun to learn new things!

Jon

Good post! I wrote about it two months ago: http://kurrin.blogspot.com/2008/04/como-funciona-vmware-ha.html
(sorry, it's in Spanish).

A couple of things, one related to 14-ESX clusters:
1.- Sometimes people prefer not to exceed 8 ESX nodes per cluster because VMFS performs badly when more than 8 nodes access the same VMFS-formatted LUNs. It's a storage/performance best practice.

2.- The other thing is: How can we calculate the number of VMs that we can power on in a certain Cluster? Adding the slots? In your example:
VC2.0 -> The smaller of [(8 slotmem + 12 slotmem) OR (6 slot CPU + 8 slot CPU)] = 14 Virtual Machines Max?
VC2.5 -> The smaller of [(8 slotmem + 8 slotmem) OR (1 slot CPU + 2 slot CPU)] = 3 Virtual Machines Max?

Am I right?

3.- My personal recommendation is that it's better to use Shares in a DRS cluster (because of its dynamic behaviour) instead of using Reservations.

Thnx!
Jon

Chad Sakac

Thanks Jon - wish I could read Spanish, you would have saved me a lot of time!

Quick comments,
1) I haven't seen that VMFS issue (bad VMFS performance with more than 8 hosts) - do you mind providing the source?

It's certainly not an EMC storage best practice - I want to make sure it's not an old thing. Most VMFS "limits" are mythology - not saying this is or it isn't but would like to make sure.

2) You got it right.

3) Interesting idea - you can certainly use shares (which are relative to one another and don't work into the math). My preference is to use shares, but there are some VMs where you absolutely need reservations - use them. It's not the reservations that really hammer the math, it's the vCPU multiplier.
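
To make the arithmetic from Jon's point 2 concrete, here is a minimal sketch of the formula as he wrote it: add up the memory slots and the CPU slots across the hosts, then take the smaller total. The function name and the tuple layout are illustrative assumptions, not anything from VirtualCenter, and the sketch does not subtract any capacity held back for failover.

```python
def max_powered_on_vms(hosts):
    """Return the smaller of total memory slots and total CPU slots.

    hosts: list of (memory_slots, cpu_slots) tuples, one per ESX host.
    This mirrors the formula in Jon's comment only; it is not HA's
    actual admission-control code.
    """
    total_mem_slots = sum(mem for mem, _ in hosts)
    total_cpu_slots = sum(cpu for _, cpu in hosts)
    return min(total_mem_slots, total_cpu_slots)


# Slot counts quoted in Jon's comment for the two hosts in the example.
vc20_hosts = [(8, 6), (12, 8)]   # VC 2.0: 20 memory slots, 14 CPU slots
vc25_hosts = [(8, 1), (8, 2)]    # VC 2.5: 16 memory slots, 3 CPU slots

print(max_powered_on_vms(vc20_hosts))  # 14
print(max_powered_on_vms(vc25_hosts))  # 3
```

With the VC 2.5 numbers the CPU slots are the limiting dimension, which fits the remark above about the vCPU multiplier.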

Jon

Thanks for your response Chad, sorry for the 3 identical comments, I think it was a typepad.com problem (after the CAPTCHA page, it stopped and never gave me the OK, so I insisted 3 times... ;)

Some comments/questions:

1.- The VMFS issue: I think it's a Best Practice, I've heard about it several times, the last time here: http://vmetc.com/2008/06/10/vmfs-storage-sizing-for-maximum-performance/
I personally think that it VARIES from one environment to another: sometimes we have very heavy VMs and other times not.
I think that the DRS infrastructure is like building a wall with irregular bricks.
In most cases a 12-host cluster is appropriate because there are no (or only a few) CPU/Mem killer VMs. In other cases two 6-host clusters are better in terms of storage and in terms of DRS maximization.
Also, we shouldn't forget that in most cases we use HA with DRS, so 16 ESX hosts is the limit (I think) (http://communities.vmware.com/thread/97465)

2.- OK, so the algorithm is really pretty conservative. I hope VMware will change this (if they can...).
We can always check the "start machines even if they violate availability constraints" option in the event of a host failure (once a year? once every two years?...)

3.- Completely agree. Only use Reservations if you REALLY need them. And if the VM needs high reservations, think about whether it has to be virtual or not, or whether it is worth putting it on another ESX host.

Thank you!
I'm going to press Post button, we'll see If I post another 3 times... ;-)
Jon

Alex

A very useful post! I want to ask a quick question regarding the math involved in the calculation. We calculate the host memory slots based on the host memory divided by memory slot size. So in the examples below it should be 32/2 = 16 slots.
ESX Server 2 has 32GB of RAM and 8 CPUs running at 2GHz - with VC 2.0.x = 12 memory slots & 8 CPU slots; with VC 2.5 = 8 memory slots & 2 CPU slots
ESX Server 3 has 32GB of RAM and 8 CPUs running at 3GHz - with VC 2.0.x = 12 memory slots & 12 CPU slots; with VC 2.5 = 8 memory slots & 3 CPU slots

Is this correct or did I miss something in the article?

Alex

Daniel Eason

Great post, very informative. I wouldn't mind seeing how you manage to get 54 Outlook items open!!!!

Chad Sakac

Jon - you're correct, and I mistyped that. thanks for the correction!

Hugo Peeters

Great post! Thanks a million!

For everyone that has difficulties calculating this for their own specific environment, I have created a script that can do the hard work for you.

Download it here:
http://www.peetersonline.nl/index.php/vmware/helpful-script-of-the-day-ha-calculations/

alex trip

What if you have, say, 4 VMs (with 1 vCPU each) in a resource pool with 8000 MHz assigned to the resource pool... how does that affect the HA math?

Dave Convery

Chad -
I’m just trying to clarify slot size calculations and how to roll them up to figure out the number of required servers in an HA cluster.

Let's say the largest VM is 4CPU and 16GB RAM (with a 16GB reservation). The ESX servers are all 2 socket, quad core 3GHz with 32GB RAM.

With overhead (about 650), my RAM slot size would end up being around 17GB, which gives me less than two slots per ESX server. Is this correct?

Now, if I have a VM with only 1 CPU and 2GB RAM, it will still take up a slot. If I do not change the default slot size settings, the slot is roughly 75% wasted. Is this correct?

In this scenario, if I do not “tweak” the slot sizes, do I only get one VM per node since it works out to about 1.7 slots per node?

Dave

Duncan

Dave, for more recent info:
http://www.yellow-bricks.com/vmware-high-availability-deepdiv/

And, as we discussed, your assumptions are correct.
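
Dave's memory-slot arithmetic can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the "about 650" overhead is taken to be in MB, the slot size is taken as the largest reservation plus that overhead, and host-side overhead (service console, VMkernel) is ignored. The function names are hypothetical and do not correspond to any VMware API.

```python
import math


def memory_slot_size_gb(largest_reservation_gb, overhead_mb):
    """Slot size = largest memory reservation plus per-VM overhead.

    Assumption: overhead is given in MB and converted to GB here.
    """
    return largest_reservation_gb + overhead_mb / 1024


def whole_slots(host_capacity_gb, slot_size_gb):
    """Only complete slots count, so round down."""
    return math.floor(host_capacity_gb / slot_size_gb)


# Dave's scenario: 16GB reservation, ~650 (assumed MB) overhead, 32GB hosts.
slot_gb = memory_slot_size_gb(16, 650)   # a bit under 17GB
print(whole_slots(32, slot_gb))          # 1 whole memory slot per host
```

Because a fractional slot cannot hold a VM, anything under two slots per host rounds down to one, which is why a single large reservation can cut the cluster's usable slot count so sharply.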

amriendra

it's really amazing :)

Küchen Freiburg

Thanks for the tutorial, it's pretty helpful.

Küchen Freiburg

Greetings from Freiburg

khan

Hi,
Please let me know how and when HA will recalculate slots after VMs are added to an HA cluster.

Thanks,
Khan

Ben @ geekswing

Great post... and the comment "satisfying the reservation takes more than the end-user expects" made me laugh, because we had this error and it took me quite a bit of research and talking with VMware help to get us going. (And by the way, while I like VMware support and they are very talented, I think I got someone who wasn't too good - they just kept blowing us off, so I had to go through the manual.)

We were barely using 20% CPU and 30% RAM and couldn't power on another VM with "Host failures cluster tolerates" set to 1 (of 5 servers). When I changed to percentage, I was able to take it to 75% reservation and still power on VMs. (WHAT?) We fixed a lot of our VM reservation issues so we are good now, but I couldn't even imagine why the "Host failures cluster tolerates" setting wasn't working. So anyway, long story short (short story long?), this post about calculating the slot size, and how conservative that calculation is, was quite enlightening!! Thanks!!

My adventures in HA and Host Failure Cluster Tolerates here:
http://geekswing.com/geek/vmware-cpu-and-ram-reservations-fixing-insufficient-resources-to-satisfy-configured-failover-level-for-ha/


Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog, it is not a Dell Technologies blog.