SQL Server 2016 running on Windows Server 2012, participating in a three-replica availability group. All three nodes use a gMSA for the SQL Server services. The gMSA has CONNECT on the instance and on the endpoint. The system has been working smoothly for over a year.
Following Windows patching and a reboot, two of the three nodes are behaving normally.
The third node can no longer connect to the endpoints on the other nodes, which report:
Login failed for user 'domain\machine$'. Reason: Token-based server access validation failed with an infrastructure error. Login lacks Connect SQL permission. [CLIENT: <ip address>].
As I said, the SQL Server services run under a group managed service account, and the AGs have behaved normally over the last year, including through routine OS patching and restarts.
The computer object in the domain is not disabled nor is the SQL gMSA. The VCO is enabled.
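Since the error names the computer account rather than the gMSA, it may be worth confirming what each instance actually reports. The queries below are a minimal sketch using standard catalog views and are not taken from the original setup. The first is meant for the failing node and the second for the replicas it is trying to reach.

```sql
-- On the failing node: which account does each SQL Server service report?
SELECT servicename, service_account, status_desc
FROM sys.dm_server_services;

-- On each target replica: who holds permissions on the mirroring (HADR) endpoint?
SELECT e.name        AS endpoint_name,
       p.name        AS grantee,
       sp.permission_name,
       sp.state_desc
FROM sys.server_permissions AS sp
JOIN sys.endpoints          AS e ON sp.major_id = e.endpoint_id
JOIN sys.server_principals  AS p ON sp.grantee_principal_id = p.principal_id
WHERE sp.class_desc = 'ENDPOINT'
  AND e.type_desc   = 'DATABASE_MIRRORING';
```

If the first query still shows the gMSA while the other replicas see the machine account, the service is configured correctly but is authenticating as the computer identity, which would point at the gMSA credential or Kerberos rather than at SQL Server permissions. That reading is an assumption, not a confirmed diagnosis.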
The obvious remedy is to grant the machine account CONNECT (instance and endpoint) on each node, but that doesn't explain why this happened or what's going on.
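For reference, that workaround would look roughly like the sketch below, run on each replica the failing node connects to. `DOMAIN\MACHINE$` is a placeholder for the failing node's computer account, and `Hadr_endpoint` is an assumed endpoint name. The real name can be read from `sys.database_mirroring_endpoints`.

```sql
USE master;

-- Look up the actual endpoint name first; Hadr_endpoint below is an assumption.
SELECT name FROM sys.database_mirroring_endpoints;

-- Placeholder computer account of the failing node.
CREATE LOGIN [DOMAIN\MACHINE$] FROM WINDOWS;

-- Instance-level connect, then connect on the availability group endpoint.
GRANT CONNECT SQL TO [DOMAIN\MACHINE$];
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DOMAIN\MACHINE$];
```

This only treats the symptom: the node would still be authenticating as the machine account instead of the gMSA.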
I've restarted the SQL Server service several times and even rebooted. The service comes back up just fine, but I can't get the primary replica of this AG to come up, since it can't connect to the secondaries. Nor can I create new AGs, for the same reason. I have deleted this particular AG and restored the databases with recovery to get the application back online. If I can't figure this out, I'm left with building a new cluster node.
I'm not entirely sure I understand the error, but it seems as though the machine can't access the service account. Anyone have any suggestions?