Boy, this is a topic that just never stops giving :-) One of my absolute favorite blogs (and bloggers) out there, http://blog.scottlowe.org/, did a great post on the topic of VM HA behavior: VMware HA Configuration Notes. He and I started talking (first publicly in the comments, then via email), and we both came to the conclusion that we needed more info.
So, in the last 24 hrs, I've been pulling together a more complete picture. Read on for the curious and the brave :-)
Ok - first of all, if you want to save yourself some time, take a look at this KB here (most importantly, the pdf at the bottom), which covers a lot of this.
Ok - the first thing that's important: diagrams like the above (while accurate) IMHO cause some of the confusion - they over-emphasize the role of the VC host in VM HA.
Here's how I think about it:
- There is a heartbeat network (which is the service console network)
- There are TWO heartbeats:
- the inter-node heartbeats and synchronization that occurs BETWEEN ESX nodes in a cluster (by default every 5 seconds)
- the node-to-isolation-address heartbeats that are used to determine if the node is isolated from the rest of the cluster (by default every 15 seconds). Note that this is the thing that triggers the "isolation response" (in the screenshot below, the "power off/leave powered on" toggle).
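The two heartbeats can be sketched as a tiny decision function. This is a toy model for illustration only - the function name, and treating each check as a simple boolean, are my own simplifications, not VMware's implementation:

```python
# Toy model of the two heartbeat checks - NOT VMware's actual code.
# Intervals are the defaults mentioned above.

INTERNODE_HEARTBEAT_INTERVAL_S = 5   # ESX-to-ESX heartbeats within the cluster
ISOLATION_CHECK_INTERVAL_S = 15      # node pings the isolation address

def node_state(peers_reachable: bool, isolation_addr_reachable: bool) -> str:
    """How a single ESX node classifies its own situation."""
    if peers_reachable:
        return "in-cluster"          # normal: internode heartbeats are flowing
    if isolation_addr_reachable:
        return "alive-but-cut-off"   # can't see peers, but the network is up
    return "isolated"                # this is what fires the isolation response
```

Note that only the last case - peers gone AND isolation address unreachable - fires the "power off/leave powered on" response.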
Next - a core idea: think of the ESX servers as all having a synchronized "view" of the cluster state as well as the VM HA cluster-wide configuration (i.e. the stuff you set up in this tab):
Ok - so where is this state stored, and how? Well, it's stored in the HA agent config files on every ESX node and on VC. Now, saying the nodes are all synchronized oversimplifies things - some of the nodes are "primary" nodes (maintain synchronized state) and some are "secondary" nodes (managed by primary nodes), but for the sake of understanding, this can be considered "internal" (it doesn't have a material effect on the logic).
Ok - now, understanding that - the VM HA behavior is easier to understand.
- You can understand why VC is required for setup, but not for VM HA operation.
- You can understand why the process of making a major change often involves disabling HA, then re-enabling it and "reconfiguring for HA" - this refreshes the VM HA config and forces synchronization.
So - let's look at what happens in a couple of scenarios (this is NOT an exhaustive list, but it's useful to clarify thinking):
Scenario 1 - an ESX server node (let's call it ESX1) has a network link failure on the service console network, but the isolation address is specified explicitly on another network.
- Internode heartbeats to ESX1 fail. The remaining ESX servers coordinate which of the primary nodes will be in charge. Let's call it ESX3.
- In the meantime, ESX1 also can't talk to the rest of the cluster, so it checks the isolation address - and reaches it successfully. It considers itself alive, and doesn't trigger the isolation response.
- ESX3 determines which of the remaining nodes has the most slots (see the post "So, how EXACTLY does VM HA's admittance algorithm work?"), and attempts to start the VM.
- ESX1 still has a file lock on the VM, so ESX3's restart attempt fails; it realizes that somewhere out there ESX1 is still alive and well, and backs off.
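Scenario 1's restart logic can be sketched like this - host selection by free slots, plus the VMFS file-lock back-off. All names here are illustrative (exactly how the lock check surfaces in the real agent is internal to VMware):

```python
# Illustrative sketch of scenario 1's failover decision - not VMware code.

def pick_restart_host(free_slots: dict) -> str:
    """The node in charge picks the surviving host with the most free slots."""
    return max(free_slots, key=free_slots.get)

def attempt_restart(lock_still_held_by_original_host: bool) -> str:
    """The VMFS file lock acts as the tiebreaker: if the 'failed' host
    still holds it, it must still be running the VM - so back off."""
    if lock_still_held_by_original_host:
        return "back-off"    # ESX1 is alive; don't split-brain the VM
    return "restarted"       # lock is free; safe to power the VM on here
```

So in scenario 1, `pick_restart_host({"ESX2": 3, "ESX3": 5})` would choose ESX3, and `attempt_restart(True)` is the "back-off" branch - the VM never gets double-started.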
Scenario 2 - same as above, but the isolation address is the default (the gateway of the service console network).
- Internode heartbeats to ESX1 fail. The remaining ESX servers coordinate which of the primary nodes will be in charge. Let's call it ESX3.
- In the meantime, ESX1 also can't talk to the rest of the cluster, so it checks the isolation address - and fails to reach it. It decides it's isolated, and starts shutting down all its VMs (hard shutdown).
- ESX3 determines which of the remaining nodes has the most slots, and attempts to start the VM.
- There are no file locks, and the VM is restarted.
Scenario 3 - an ESX server node (let's call it ESX1) has a complete hardware failure
- Internode heartbeats to ESX1 fail. The remaining ESX servers coordinate which of the primary nodes will be in charge. Let's call it ESX3.
- Isolation response is irrelevant here - ESX1 is dead.
- ESX3 determines which of the remaining nodes has the most slots (see the post "So, how EXACTLY does VM HA's admittance algorithm work?"), and attempts to start the VM.
- There are no file locks, and the VM is restarted.
Scenario 4 - The isolation address is left as the default and becomes unreachable from the cluster
- Internode heartbeats between the ESX servers are happy.
- Isolation response doesn't kick in - the rule is that isolation response kicks in only when an ESX server can't see the other ESX servers AND the isolation IP is unreachable from that individual ESX server.
- All VMs stay running.
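All four scenarios boil down to that one AND rule. A minimal recap (the booleans and function name are my shorthand, not VMware's):

```python
# The single rule behind scenarios 1-4: the isolation response fires only
# when a node can see NEITHER its cluster peers NOR the isolation address.

def isolation_response_fires(peers_ok: bool, iso_addr_ok: bool) -> bool:
    return (not peers_ok) and (not iso_addr_ok)

# Scenario 1: SC link down, explicit iso address still reachable -> no response
# Scenario 2: SC link down, default-gateway iso address unreachable -> response
# Scenario 3: host is dead - there's no node left to run any response
# Scenario 4: only the iso address is unreachable; peers are fine -> no response
```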
To this recipe for "understanding VM HA" you need to add one very specific behavior, and then you have every ingredient needed to extrapolate all the critical best practices: the service console will always try to route via the first route on a given subnet, even if that path is dead - http://theether.net/kb/100041.
OK - I HATE memorizing best practices; they all seem so arbitrary. BUT if you can understand the core mechanisms, you can extrapolate all the best practices as you need them, and you can actually troubleshoot problems.
It's like trig - the identities seem like total gobbledygook, but when you do identity proofs, you really get it, and from then on, they aren't arbitrary - they are logical. Now - I may be weird, but this is how I learn, and how it seems many people learn.
So....
Next step - read this article: http://www.yellow-bricks.com/2008/01/14/service-console-redundancy/ Duncan - I (like most EMC folks) hate losing (though it happens, and the most important thing is to learn from the experience), but man, I'm glad VMware got you, and hope the work/life balance is good :-) If you dig good technical blogs, check out Duncan's. If you read my post, and understand Duncan's (and BTW - I TOTALLY agree with his "pick option 3" conclusion) - then you have all the know-how to succeed.
- Use good sense - don't hardcode DNS (good to see that called out explicitly in the doc).
- Having the service console (heartbeat network) on the same interface as your VM network traffic means that a single interface failure will cause BOTH the HA agent's internode heartbeats AND the isolation address check to fail, triggering the isolation response and total VM shutdown. But that's not bad, that's OK - the VMs will restart on another node.
- NIC teaming to two switches can provide some redundancy; two NICs on two vSwitches and two isolation addresses are better.
- Service console redundancy needs to be on different subnets to work.
- A single isolation address can work just fine, but it doesn't hurt to have two (in case your gateway isn't robust or you run into a spanning tree problem). The other major reason this is a good idea? By definition, if your service consoles are on different subnets but you only have one isolation address, you're routing to reach at least one of them. I don't know about you - but that just adds one extra layer of possible problems. Keep your isolation addresses on the service console subnets.
- The isolation address being on the same network as the service console network is far less useful than having it on the production network - after all, it represents getting a quiet echo when the ESX server says "HELLO WORLD!?" after it realizes its buddies in the cluster have gone quiet.
- If you're having problems, avoid the temptation to switch to "Leave VMs Powered On" - if (and anyone in our business knows it's less "if", more "when") you have problems, "leave VMs powered on" will just draw out the bad behavior. Isolation response, set up right, is GOOD.
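If you do go with two isolation addresses on the service console subnets, they're configured via the cluster's HA advanced options. A sketch from memory of the VI3-era das.* settings - verify the option names against your version's docs, and the IP addresses are purely examples:

```
das.isolationaddress            10.5.188.1   # pingable IP on SC subnet #1
das.isolationaddress2           10.5.189.1   # pingable IP on SC subnet #2
das.usedefaultisolationaddress  false        # skip the default-gateway check
```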
Quod Erat Demonstrandum
Thanks go out to Marc Sevigny at VMware on the VM HA team and Anand Pillai and Anand Paladugu on the EMC Autostart team for the help!!!
Great article again! Thanks for the link!
Posted by: Duncan | July 15, 2008 at 06:50 AM
Great article. Still working through some kinks with the plan for my HA deployment. One of the systems only has 3 nics and it is causing a warning message to appear for non-redundant SC networking. I don't like to place too much on one nic.
Question though - In your plan one nic would be on say 10.5.188.x and the second on 10.5.189.x for the best reliability? Do you still give static ips to both adapters? How do you handle the naming convention in DNS?
Posted by: John | July 22, 2008 at 11:30 AM
Chad -- great article - i have a good practical perspective now of how HA actually works
Now I have a long-pending question - I'm hoping you can answer it for me.
Let's say a host gets isolated, and we have configured the isolation response to not power off machines - even in that case, will other hosts in the cluster try to power on the VMs from the isolated host? I think it does, which beats logic to me. Can we somehow configure HA so that if the isolation response is "leave VM powered on", it does nothing?
Thanks!
Balu
Posted by: balu | May 07, 2009 at 04:50 AM