After we killed the datacenter interlink in the last episode, today we will be pulling the power from all hosts in one of the twin datacenters and see what happens. As nothing here impacts the actual metro storage solution, this will be a purely VMware play: the cluster will have to recover on its own.
Failing ALL ESX hosts in one of the datacenters
We won’t be rebooting or shutting down the servers, nope! We will actually be powering them down using the Dell iDRAC out-of-band management cards inside the PowerEdge servers. As there will be nothing impacting the PowerStore storage solution, we will be focusing today on the vSphere layer.
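The power-down itself can be scripted against the iDRACs. Here is a minimal sketch of how I would do that; the iDRAC hostnames and the use of ssh are assumptions for illustration, but `racadm serveraction powerdown` is the actual iDRAC command for a hard power-off:

```python
import subprocess

# Hypothetical iDRAC addresses for the DCB hosts -- adjust to your own lab.
DCB_IDRACS = ["idrac-esx-b1.lab.local", "idrac-esx-b2.lab.local"]

def build_powerdown_commands(idracs, user="root"):
    """Build the ssh/racadm command line for each iDRAC.

    'racadm serveraction powerdown' performs a hard power-off,
    equivalent to pulling the power -- no graceful OS shutdown.
    """
    return [
        ["ssh", f"{user}@{idrac}", "racadm", "serveraction", "powerdown"]
        for idrac in idracs
    ]

def powerdown_all(idracs, dry_run=True):
    """Print (or, with dry_run=False, execute) the power-down per host."""
    for cmd in build_powerdown_commands(idracs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    powerdown_all(DCB_IDRACS)  # dry run: only shows the commands
```

The dry-run default is deliberate: flipping `dry_run=False` is the "break stuff" switch.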
The vSphere layer will see half of its hosts disappear at once. This should (and WILL) trigger an HA response, and the VMs in the failing datacenter should have no trouble restarting in the remaining datacenter.
Will a metro witness have anything to add here? Not in this case. As the storage solution remains untouched, nothing ever calls upon the witness, so for this test I have actually removed the witness functionality from the lab.
The outcome, as expected: when I power off all hosts in DCB, all VMs in DCB stop instantly. VMware High Availability (HA) kicks in, and the failed VMs from DCB get restarted in DCA. There is no issue starting them, as the PowerStore metro solution underneath makes the volumes appear stretched across sites; so even though the failed VMs get restarted from a DIFFERENT PowerStore, they never notice the difference: all metro volumes appear local on both sites at the same time anyway.
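To make the failover behaviour concrete, here is a toy model of the end state. The host and VM names are invented, and this is emphatically not vSphere HA's real placement logic (which involves admission control, restart priorities, and DRS); it only illustrates that every VM from the dead site ends up running on a surviving host:

```python
# Toy model of an HA response to losing all hosts in one datacenter.
# Names are invented; this shows the end state, not HA's actual algorithm.
vms_by_host = {
    "esx-a1": ["vm1", "vm2"],   # DCA
    "esx-a2": ["vm3"],          # DCA
    "esx-b1": ["vm4", "vm5"],   # DCB
    "esx-b2": ["vm6"],          # DCB
}

def fail_site(vms_by_host, failed_hosts):
    """All VMs on failed hosts stop; 'HA' restarts them on survivors."""
    survivors = [h for h in vms_by_host if h not in failed_hosts]
    orphaned = [vm for h in failed_hosts for vm in vms_by_host[h]]
    new_layout = {h: list(vms_by_host[h]) for h in survivors}
    # Simple round-robin restart across the surviving hosts.
    for i, vm in enumerate(orphaned):
        new_layout[survivors[i % len(survivors)]].append(vm)
    return new_layout

after = fail_site(vms_by_host, failed_hosts={"esx-b1", "esx-b2"})
print(after)  # every VM now runs on a DCA host
```

The key point the model glosses over is exactly what the metro volumes provide: the surviving hosts can power the VMs on immediately because the same datastores are already presented locally in DCA.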