Folks – vSphere 5.0 update 1 is out! Download it here…
Read the ESXi 5.0 update 1 release notes here.
Read the vCenter 5.0 update 1 release notes here.
There are tons of things in there to note. I’ll highlight a couple, but I would really recommend scanning the release notes.
- “New coolness”: too much to list (it’s in the release notes), but here’s one example. Space reclaim continues to evolve, and to resolve some issues with UNMAP on some array targets, update 1 disables automatic UNMAP by default (I’ve noted the reason, and the workarounds before update 1, here). The only case I know of personally where there have been issues is in some cases during Storage vMotion, as I have said in the past. But they have also updated vmkfstools so that you can run an UNMAP across a datastore, even specifying the percentage of zeroed blocks to reclaim. Check that out here. BTW, this isn’t stopping here: you can expect more work from the array vendors and in future vSphere releases to continue improving this efficiency use case.
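To make the percentage concrete, here’s a rough back-of-the-envelope sketch in Python. It assumes (my assumption, not something stated in the release notes) that the reclaim works by temporarily allocating the chosen percentage of the datastore’s free space and then unmapping it, so a higher percentage reclaims more per pass but leaves less headroom while the pass runs. `reclaim_plan` is a hypothetical helper, not a VMware tool, and the 1 to 99 bound is also an assumption.

```python
def reclaim_plan(capacity_gb: float, used_gb: float, percent: int) -> tuple:
    """Hypothetical helper: estimate one datastore-wide UNMAP pass.

    Assumes the pass temporarily allocates `percent` of the *free* space and
    then unmaps it. Returns (gb_reclaimed_this_pass, gb_free_while_pass_runs).
    """
    # Assumed bound: never consume all of the free space during the pass.
    if not 1 <= percent <= 99:
        raise ValueError("percent must be between 1 and 99")
    free_gb = capacity_gb - used_gb
    if free_gb < 0:
        raise ValueError("used space cannot exceed capacity")
    reclaim_gb = free_gb * percent / 100
    return reclaim_gb, free_gb - reclaim_gb


# A 2 TB datastore with 1.5 TB in use: reclaiming 60% of the 512 GB free
# targets about 307 GB and leaves about 205 GB free while the pass runs.
reclaimed, headroom = reclaim_plan(2048, 1536, 60)
print(f"reclaim ~{reclaimed:.0f} GB, headroom ~{headroom:.0f} GB")
```

The trade-off is the whole point: on a nearly full datastore, an aggressive percentage could starve running VMs of space for the duration of the pass.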
- “errata”: the release notes do a great job of highlighting not only fixes, but also open errata. For example (I’ve run into this one), when you try to unmount an NFS datastore while Storage I/O Control (or Storage DRS) is in use, you can’t: the unmount fails with an error message that the resource is in use. I’m guessing this is because there is an open file handle. The workaround: disable SIOC/Storage DRS, then unmount.
One thing in particular that I’ve been waiting for in update 1 is the change in how VM HA handles storage device failures.
In vSphere 5, the “All Paths Down” (APD) condition (ESX can’t reach a device via any path) got a new “friend”: the “Permanent Device Loss” (PDL) state (when a target communicates “hey, this device is gone, and don’t expect it back anytime soon”, e.g. when the device has been intentionally removed from the host, or the target is in a partitioned state). I’ve discussed this here.
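Here’s a simplified Python sketch of how I think about the difference. APD means no path gets any answer at all; PDL means the target *did* answer, with sense data saying the device is gone. The sense code below (sense key 0x5 ILLEGAL REQUEST, ASC/ASCQ 0x25/0x00 LOGICAL UNIT NOT SUPPORTED) is one commonly cited PDL indicator, but treat the set as an illustrative assumption rather than ESXi’s complete list; the types and names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple

SenseData = Tuple[int, int, int]  # (sense key, ASC, ASCQ)

# Assumed PDL indicator: ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED.
# Illustrative only, not ESXi's complete list of PDL sense codes.
PDL_SENSE_CODES = {(0x5, 0x25, 0x00)}


class DeviceState(Enum):
    OK = "ok"
    APD = "all paths down"
    PDL = "permanent device loss"


@dataclass(frozen=True)
class PathResult:
    """Outcome of an I/O attempt on one path (hypothetical model)."""
    responded: bool
    sense: Optional[SenseData] = None  # only meaningful if the target responded


def classify(paths: Sequence[PathResult]) -> DeviceState:
    # PDL: the target answered, and told us the device is gone.
    if any(p.responded and p.sense in PDL_SENSE_CODES for p in paths):
        return DeviceState.PDL
    # APD: no path got any answer, so we can't tell whether the device will return.
    if not any(p.responded for p in paths):
        return DeviceState.APD
    return DeviceState.OK


print(classify([PathResult(False), PathResult(False)]).value)
print(classify([PathResult(True, (0x5, 0x25, 0x00)), PathResult(False)]).value)
```

The key design point is that PDL requires a live target to say something; silence can only ever be APD.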
For a bit of context: the use case for “Stretched vSphere Clusters” is turning out to be much more popular than many folks (certainly including me) expected. I know that my respected colleague Scott Lowe is asked about this almost daily. Lee Dilworth and I did a very popular session on this topic at VMworld 2011; check it out here.
For the last 1-2 years, VMware, EMC and others in the industry have been really thinking through, planning and engineering the solution stack around this use case. A stretched cluster is not the same as a regular cluster: there are other failure conditions that need to be planned for. This has resulted in the “vSphere Metro Storage Cluster” HCL category, which incorporates testing for these failure conditions (read more on that here). Beyond testing, we continue to enhance each part of the solution so that stretched clustering works better and better: more simply, in a more integrated and, frankly, invisible fashion (this is what Lee and I were talking about at the close of that VMworld session when we discussed some of the “futures”).
In vSphere 5.0 update 1, the other “shoe drops”. EMC VPLEX uses PDL codes when “partitioned” (when all connectivity between the 2 sites in a VPLEX cluster fails). This means that the VPLEX cluster nodes at the non-preferred site for a device (this is a “per device” setting that declares in advance which site “stops IO to that device”, to avoid split brain at the storage level) say “hey, the IO to this device on this target is stopping, and you shouldn’t expect it to come back momentarily”.
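The per-device rule is easy to get muddled, so here’s a tiny Python model of it. It is a sketch under stated assumptions: the names (`DistributedDevice`, `preferred_site`, "site-a"/"site-b", the volume names) are hypothetical, and this is not VPLEX’s actual configuration model or syntax. What it shows is that during a partition exactly one site keeps serving IO for each device, which is what avoids split brain, while different devices can prefer different sites.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class IoState(Enum):
    ACTIVE = "continues IO"
    SUSPENDED = "stops IO and reports PDL"


@dataclass(frozen=True)
class DistributedDevice:
    name: str
    preferred_site: str  # declared in advance, per device


def io_state_on_partition(device: DistributedDevice, site: str) -> IoState:
    """During an inter-site partition, only the device's preferred site keeps serving IO."""
    return IoState.ACTIVE if site == device.preferred_site else IoState.SUSPENDED


def partition_report(devices: Iterable[DistributedDevice], sites: Iterable[str]) -> dict:
    """Map (device name, site) to the IO state that site presents during a partition."""
    site_list = list(sites)  # materialise once; it is iterated per device
    return {(d.name, s): io_state_on_partition(d, s) for d in devices for s in site_list}


devices = [DistributedDevice("vol-finance", "site-a"), DistributedDevice("vol-web", "site-b")]
report = partition_report(devices, ["site-a", "site-b"])
for (name, site), state in report.items():
    print(f"{name} @ {site}: {state.value}")
```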
What’s changed? Up until now, the loss of a storage device didn’t by definition trigger a VM HA response. This is an example of what Lee and I were talking about in our session. People over-simplify when thinking about stretched clusters, and just assume that VM HA will work “like SRM” (often because storage vendors tell them it will). VM HA wasn’t originally designed for this use case.
In vSphere 5.0 update 1, a PDL response from the array can trigger a VM HA response, if you set an additional VM HA parameter. Sweet! Duncan Epping also noted this change on Yellow Bricks (always awesome) here.
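Putting those pieces together, here’s a minimal Python sketch of the decision as I understand it in update 1: a PDL can lead to HA restarting the VM, but only with the extra parameter set, while an APD still gets no HA response. `pdl_response_enabled` is a hypothetical stand-in for that additional parameter (it is not its real name), and the model deliberately ignores everything else HA weighs, such as admission control and which hosts can still see the device.

```python
from enum import Enum


class StorageCondition(Enum):
    HEALTHY = "healthy"
    APD = "all paths down"
    PDL = "permanent device loss"


def ha_restarts_vm(condition: StorageCondition, pdl_response_enabled: bool) -> bool:
    """Simplified model of VM HA's reaction to a storage condition in vSphere 5.0 update 1."""
    if condition is StorageCondition.PDL:
        # Only with the additional HA parameter set does a PDL lead to a restart.
        return pdl_response_enabled
    # APD (the device might come back) and healthy storage: no HA restart.
    return False


for condition in StorageCondition:
    for enabled in (False, True):
        print(condition.value, "| parameter set:", enabled, "| HA restart:", ha_restarts_vm(condition, enabled))
```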
BUT this isn’t the end: the plan is to continue, with each minor/major VMware release, to increasingly treat these geographically dispersed clusters, and the new category of geographically dispersed active-active storage models, as a design center.
VMware – thanks for the continued coolness – and I know it’s going to keep on coming!

Sweet! We recently tried cutting the links between two sites in a stretched cluster while preparing demos for a trade fair, and had huge discussions about why it wouldn't work... since then we've been eagerly waiting for the update :-)
Posted by: Felix | March 21, 2012 at 09:01 AM
Hi Chad,
This is great news.
Are there any other HA stretched cluster issues remaining?
I was told that a complete failure of a VPLEX cluster (very unlikely) would result in an APD and HA would not kick in.
Do you have a time-scale for when the remaining issues will be resolved?
Would you now recommend that customers deploy VPLEX/VMware HA stretched clusters? I know that in the past you felt vSphere was not quite ready, and that most of the time customers would be better off with SRM.
Many thanks
Mark
Posted by: Mark Burgess | March 21, 2012 at 02:28 PM
I did post some comments but they've been lost in space :(
Anyway, to answer the last comment: if your two sites are within synchronous replication distance (which they need to be for most stretched setups), then usually some basic questions can help you figure out which solution might work for you:
- are the two datacenters/sites a few metres apart? Miles apart? On the same campus?
- if they are *really* close (metres/campus) and we put the DR solution in place, would we still fail a DR audit because the recovery location is too close?
- do we have a flat network between the sites/datacenters?
- do we need the ability to test failover scenarios non-disruptively?
- do we need a solution that provides non-disruptive mobility?
- can we tolerate planned outages for the times we need to move workloads?
- do we need a solution that can support different address spaces at either site?
- do we have the knowledge and skills to understand the effects of partitions, site-down events and the various failure scenarios associated with a stretched setup? And do we have the capacity to POC these properly?
- is our network set up to handle the stretched configuration? Do we understand traffic "trombone", and can we correctly handle network partitions that might affect the cluster?
If you are considering a stretched setup for any of the following reasons, those reasons usually ring loud warning bells for me in customer meetings and prompt me to ask more questions about the customer's understanding of what they are getting into. So, do any of these sound familiar?
- we want to be able to vmotion between sites
- the stretch solution is cheaper
- the DR solution is more complex
- the stretch solution does not require the same ongoing monitoring and maintenance as the DR solution so it seems easier
- what are the chances of losing both links between the sites?
- HA is just like an instant failover, like Fault Tolerance, right? (yes, I've heard that one)
- if the site goes down, everything just vMotions over (yes, I've heard that recently as well)
- sequencing of VM restarts isn't that important; if applications don't start up, or services fall over because dependent VMs weren't "up" yet, we'll just log in and start them, or script that... (if so, make sure you do script it: doing this by hand for 10 VMs might be fine... but 100? 500? 1000?)
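If you do script it, the core of the problem is a dependency ordering. Here’s a minimal Python sketch using the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to turn "this VM needs those VMs up first" into restart waves. The VM names are hypothetical, and a real script would also have to wait until each wave is genuinely up (services started, not just powered on) before starting the next. This is the kind of runbook logic that HA restart priorities alone don’t give you.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def restart_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group VMs into restart waves; a VM starts only after everything it depends on.

    `depends_on` maps a VM to the VMs that must be "up" before it starts.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every VM whose dependencies are done
        waves.append(ready)
        # A real script would wait here until these VMs (and their services) are up.
        sorter.done(*ready)
    return waves


# Hypothetical three-tier app plus a directory server.
deps = {"web01": {"app01"}, "web02": {"app01"}, "app01": {"db01", "ad01"}}
for number, wave in enumerate(restart_waves(deps), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Ten entries in `deps` are easy to keep right; the hard part at 1000 VMs is keeping that map accurate as the estate changes.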
The point is that a stretched setup, or as one of my peers on the EMC VPLEX team calls it (Olly, I'm officially stealing your term), a "federated HA" setup, is VERY different from a DR solution based on SAN replication, where you rely on something like SRM to provide the orchestrated replay during a failover.
Federated HA solutions, or "vSphere Metro Storage Clusters" as we now call them at VMware, are NOT just about the storage layer; they are just as much about the network layer and the vSphere layer, and the workings of vSphere HA are VERY important to understand.
At VMware we are currently working on some papers to help customers who are suited to a stretched setup better understand how to manage it correctly, and how to design the objects in vCenter (datastores/datastore clusters/naming conventions) so that the inventory is more logical for a stretched setup, site locality is simpler to configure and enforce, and, once configured, it is easier to see "what is where" at any one time. It's not until you build one of these environments that these nuances become apparent. Trust me: we did it recently with a small 30-VM setup and it got confusing... imagine if there were 10,000 VMs!!!
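As an illustration of the "what is where" problem, here’s a small Python sketch that derives per-site VM groups from a naming convention and flags VMs whose home site doesn’t match their datastore’s site, or that don’t follow the convention at all. The "SA-"/"SB-" prefix convention and all names are hypothetical; in practice the output would feed whatever maintains your per-site DRS VM groups, and it is exactly this kind of logic that breaks quietly when naming conventions drift.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical convention: names start with the home site code, e.g. "SA-web01" on "SA-DS01".
SITE_PREFIX = re.compile(r"^(S[AB])-")


@dataclass(frozen=True)
class VmPlacement:
    vm: str
    datastore: str


def site_of(name: str) -> Optional[str]:
    match = SITE_PREFIX.match(name)
    return match.group(1) if match else None


def locality_report(placements: List[VmPlacement]) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Return (per-site VM groups, VMs on the other site's datastore, VMs breaking the convention)."""
    groups: Dict[str, List[str]] = {}
    misplaced: List[str] = []
    nonconforming: List[str] = []
    for p in placements:
        vm_site, ds_site = site_of(p.vm), site_of(p.datastore)
        if vm_site is None or ds_site is None:
            nonconforming.append(p.vm)  # VM or datastore name has no site prefix
        elif vm_site != ds_site:
            misplaced.append(p.vm)  # runs from storage at the "wrong" site
        else:
            groups.setdefault(vm_site, []).append(p.vm)
    return groups, misplaced, nonconforming


placements = [
    VmPlacement("SA-web01", "SA-DS01"),
    VmPlacement("SB-db01", "SB-DS02"),
    VmPlacement("SA-app01", "SB-DS02"),  # home site A, storage at site B
    VmPlacement("test-vm", "SA-DS01"),   # naming convention not followed
]
groups, misplaced, nonconforming = locality_report(placements)
print(groups, misplaced, nonconforming)
```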
So what else needs to be managed to ensure your cluster behaves correctly and has some kind of even site bias/balance:
- DRS affinity groups need to be set up and maintained as you add/remove VMs to/from the estate (for large setups you really need to automate this into the provisioning process for both hosts and VMs)
- datastore clusters are useful to help with locality
- heartbeat datastores should be increased from 2 to 4, with 2 selected per site
- HA restart priority settings should be configured and maintained on an ongoing basis IF you want to maintain any kind of control over restart sequencing. Remember, you do not have a recovery-plan-style runbook as you would in SRM, so you need to realise that HA restarts are not that organised: the restarts could be, and usually will be, different every time, because which VMs get restarted depends on what failed. This might be an issue if your apps have specific startup dependencies and you're dealing with 100s or 1000s of VMs.
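For the heartbeat datastore item in the list above, here’s a minimal Python sketch of picking 2 datastores per site (4 in total for two sites). The site mapping and datastore names are hypothetical, and the helper simply takes the first names alphabetically per site; a real selection would also weigh things like which hosts can see each datastore.

```python
from collections import defaultdict
from typing import Dict, List


def pick_heartbeat_datastores(datastore_sites: Dict[str, str], per_site: int = 2) -> List[str]:
    """Hypothetical helper: choose `per_site` heartbeat datastores from every site.

    `datastore_sites` maps datastore name -> site name.
    """
    by_site = defaultdict(list)
    for datastore, site in datastore_sites.items():
        by_site[site].append(datastore)
    chosen = []
    for site in sorted(by_site):
        candidates = sorted(by_site[site])
        if len(candidates) < per_site:
            raise ValueError(f"site {site} has only {len(candidates)} datastore(s)")
        chosen.extend(candidates[:per_site])  # balanced: same count from each site
    return chosen


datastores = {"SA-DS01": "A", "SA-DS02": "A", "SA-DS03": "A", "SB-DS01": "B", "SB-DS02": "B"}
print(pick_heartbeat_datastores(datastores))
```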
Bottom line: stretched setups are not new, and vSphere 5.0 includes features that improve day-to-day use. Make sure you understand the differences between the two approaches and choose what's right for you and, most importantly of all, what's acceptable to the team running the system. The success of the solution, and its effectiveness in protecting your virtual infrastructure, will be directly affected by its ease of use and by the "buy-in" of the ops team. Without that, trust me, it'll fall apart in days. I always think back to when I used to write scripts for DR. The scripts worked great on day 1; the customer was happy and nodded enthusiastically when I pointed out the naming conventions etc. that the scripts looked for during recovery... on day 2, the naming conventions went out of the window, the scripts stopped being useful and were broken, and recovery was now massively at risk. This can happen in exactly the same way with HA solutions and DR solutions. Only the customer truly knows which one they feel they can honestly cope with and commit to.
Posted by: Lee Dilworth | March 28, 2012 at 06:17 AM