You know, VM HA is the thing that I get the most frustrated with in VMware. Which is kinda ironic - it's an area where EMC and VMware have a long trail of cooperation (check out the name of the service during startup, or the firewall rule for the VM HA agent if you want to decode what I mean). DNS is super important, make sure you always have that right.
To be fair: it's WAY easier and IMHO much more flexible than the other ways HA is done in VMware's competitors - compared with WSFC (aka MSCS) it's a walk in the park, and offers much more granular object failover (but less granular failure detection). I'm just saying a better job could be done documenting some of these things (something I know others have championed).
I had configured the das.isolation change once before, but then needed to reconfigure it when I was rebuilding my clusters at home, and just couldn't get it to work...
Here's how you do it...
Ok - after adding "das.isolationaddress1" in the VM HA "advanced options" window, I kept getting this screen:
Argh - I know you can't reach the gateway!!! My poor little Linksys router can't understand anything except a class-c /24 subnet - that's why I specified das.isolationaddress#!
WHY DO I KEEP GETTING THE ERROR!
Ah, phew - finally found the original KB article here (definitely bookmarkable). You have to also specify: das.usedefaultisolationaddress=false
So, here's where you make the changes (note how LAN and iSCSI are on separate subnets, and I've got two das.isolation addresses - one on each - you should also have other vmkernel traffic like vmotion on a different subnet):
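For anyone who prefers to see the key/value pairs spelled out, here is a minimal sketch in Python of the options described above. It only builds and prints the pairs you would type into the "advanced options" window; it does not talk to VirtualCenter. The function name `build_isolation_options` and the two example IP addresses are made up for illustration, and the numbered `das.isolationaddress1`, `das.isolationaddress2` naming is assumed from the option used in this post.

```python
import ipaddress


def build_isolation_options(addresses, use_default_gateway=False):
    """Return HA advanced options as a dict of option name -> string value.

    Hypothetical helper: it only assembles the key/value pairs, one
    das.isolationaddressN entry per address, plus the flag that tells
    HA whether to keep using the default gateway as an isolation address.
    """
    if not addresses:
        raise ValueError("at least one isolation address is required")

    options = {}
    for index, address in enumerate(addresses, start=1):
        # ip_address() raises ValueError if the string is not a valid IP.
        ipaddress.ip_address(address)
        options[f"das.isolationaddress{index}"] = address

    # Without this set to false, HA keeps checking the default gateway too.
    options["das.usedefaultisolationaddress"] = (
        "true" if use_default_gateway else "false"
    )
    return options


if __name__ == "__main__":
    # Example addresses are placeholders: one on the LAN subnet,
    # one on the iSCSI subnet.
    pairs = build_isolation_options(["192.168.1.1", "10.10.10.1"])
    for name, value in pairs.items():
        print(f"{name} = {value}")
```

The point of the sketch is simply that the isolation addresses and das.usedefaultisolationaddress=false travel together: setting the addresses alone is what kept producing the error.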
Mental note to self..... Someone should really post all the non-documented advanced options.... Has anyone seen this anywhere? Otherwise, I'll ping my VMware compatriots.
Ok - now that the cluster is happy again, I can focus on something fun, not something so pedantic. I'm going to do a series on "HOWTO _____ using the Celerra Simulator"....
Here are some:
das.failuredetectiontime milliseconds (60 seconds = 60000; timeout before isolation response actions)
das.isolationaddress IPAddress (address used by the host to verify isolation status)
das.isolationaddress2 IPAddress (backup address for the isolation status)
das.poweroffonisolation Boolean (true or false; false ensures all VMs remain powered on)
das.vmMemoryMinMB value (higher values will reserve more space for failovers)
das.vmCpuMinMHz value (higher values will reserve more space for failovers)
das.defaultfailoverhost hostname (first choice of host to which VMs will fail over)
Source, which is a great Blog btw:
http://ictfreak.wordpress.com/2008/02/19/vmware-undocumented-parameters-for-advanced-features-of-ha/
And his source:
http://www.vmug.nl/modules.php?name=Forums&file=viewtopic&p=13315#13315
Posted by: ThVuy | June 27, 2008 at 04:22 PM
Hi Chad,
here is the VMware document link to the Advanced HA options:
http://pubs.vmware.com/vi3i_i35/resmgmt/wwhelp/wwhimpl/common/html/wwhelp.htm?context=resmgmt&file=vc_cluster_das.10.9.html
However, there are a lot of advanced settings I've not been able to get hold of easily, e.g. Advanced DRS options.
I'm currently trying to put a wiki together to start collecting all advanced options. I'll let you know when I make it visible online.
Cheers, Forbes.
Posted by: Forbes Guthrie | June 27, 2008 at 04:34 PM
I had the same issue a while back in a classroom setup. I just changed the gateway to temporarily fix it. But this is a better solution! Thanks,
Posted by: Duncan | June 27, 2008 at 06:26 PM
Thanks all so much for all the comments with more advanced options - more fun to play with!
Posted by: Chad Sakac | July 03, 2008 at 10:13 AM
Thanks for the article and information.
Has anyone given you feedback that the screenshots you posted are not easily readable? Even when I click on an image, it stays the same size.
Posted by: vmzare | July 22, 2008 at 08:20 AM