Hi, I am hoping someone will be able to help me.
For the second time in as many weeks, the AOA cluster has failed completely. Within 5 seconds, all the nodes lose contact with each other, the cluster loses quorum and shuts down. 7 to 8 minutes later everything comes back up. Last week I am pretty sure it was because one of our server admins was doing work on the DNS servers that required a reboot, and there were event log entries that supported this. This week I cannot find any DNS-related issues, but the error is exactly the same as last week:
Cluster node {same on all nodes} was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
And:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
I have run the network section of the cluster validation wizard. It reported no issues other than HostRecordTTL, RegisterAllProvidersIP, two of the four adapters not being connected, and the fact that there appears to be only one network adapter, although it is actually a teamed adapter.
Any help would be greatly appreciated, as I'm not sure whether this is a cluster / AOA config issue or a network / DNS issue.
Thanks,
Steven.