Dear all,
I have a big issue with a new availability group installation/configuration. Creating the group fails with an error, and the group is not created.
It seems that the group goes online and is then killed by the failover cluster, but I don't see why. I have searched the web about my issue and have already tried everything that was proposed:
1. Granted privileges to NT AUTHORITY\SYSTEM (CONNECT SQL, VIEW SERVER STATE, ALTER ANY AVAILABILITY GROUP)
2. Made the Agent/Engine service account a local admin on Windows and an admin on the SQL database
3. Deleted my cluster and recreated it
4. Tried creating the group without the listener
5. Checked that the hardware configuration is exactly the same (HDD / RAM / CPU)
Here is the log from SQL Server (from SSMS):
08/13/2019 09:07:01,spid55,Unknown,Always On: WSFC AG integrity check failed for AG 'AG-SQLIPSN-DEV' with error 41044<c/> severity 16<c/> state 1.
08/13/2019 09:07:01,spid55,Unknown,Error: 19435<c/> Severity: 16<c/> State: 1.
08/13/2019 09:07:01,spid55,Unknown,The state of the local availability replica in availability group 'AG-SQLIPSN-DEV' has changed from 'RESOLVING_NORMAL' to 'NOT_AVAILABLE'. The state changed because either the associated availability group has been deleted<c/> or the local availability replica has been removed from another SQL Server instance. For more information<c/> see the SQL Server error log<c/> Windows Server Failover Clustering (WSFC) management console<c/> or WSFC log.
08/13/2019 09:06:01,spid55,Unknown,The state of the local availability replica in availability group 'AG-SQLIPSN-DEV' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The state changed because the local availability replica is joining the availability group. For more information<c/> see the SQL Server error log<c/> Windows Server Failover Clustering (WSFC) management console<c/> or WSFC log.
08/13/2019 09:04:39,spid15s,Unknown,Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.
08/13/2019 09:04:39,spid15s,Unknown,Always On Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.
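For anyone reading the excerpt above: it is listed newest entry first, and the `<c/>` markers appear to be the export's stand-in for a literal comma (that is my assumption, based on where they occur). A minimal Python sketch for putting the lines in chronological order and restoring the commas could look like this. The function names `parse_line` and `oldest_first` are hypothetical helpers, and the date format is assumed to be month/day/year as in the lines above.

```python
from datetime import datetime

COMMA_PLACEHOLDER = "<c/>"  # assumed to stand for a literal comma in the export


def parse_line(line):
    """Split one exported line into (timestamp, source, level, message)."""
    # Only the first three commas separate fields; the rest belongs to the message.
    date_text, source, level, message = line.strip().split(",", 3)
    timestamp = datetime.strptime(date_text, "%m/%d/%Y %H:%M:%S")
    return timestamp, source, level, message.replace(COMMA_PLACEHOLDER, ",")


def oldest_first(lines):
    """Return parsed entries in chronological order."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    # The excerpt lists the newest entry first, so reverse it before the
    # stable sort; entries sharing a timestamp then keep their reversed order.
    return sorted(reversed(parsed), key=lambda entry: entry[0])


if __name__ == "__main__":
    sample = [
        "08/13/2019 09:07:01,spid55,Unknown,Error: 19435<c/> Severity: 16<c/> State: 1.",
        "08/13/2019 09:04:39,spid15s,Unknown,Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.",
    ]
    for timestamp, source, _level, message in oldest_first(sample):
        print(timestamp, source, message)
```

Read oldest first, the sequence is: the replica goes from NOT_AVAILABLE to RESOLVING_NORMAL at 09:06:01, then back to NOT_AVAILABLE at 09:07:01, followed by error 19435 and the failed WSFC integrity check.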
Here are the logs from the cluster:
EVENT ID : 1254 Error - Clustered role 'AG-SQLIPSN-DEV' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
EVENT ID : 1205 Error - The Cluster service failed to bring clustered role 'AG-SQLIPSN-DEV' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
EVENT ID : 1069 Error - Cluster resource 'AG-SQLIPSN-DEV' of type 'SQL Server Availability Group' in clustered role 'AG-SQLIPSN-DEV' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
I am totally lost as to why it doesn't work. My previous Always On configuration went fine without any issue, and we did the same thing for this one...
The only thing I can think of is to begin the whole process again (deleting everything: DNS records, AD records, quorum share, cluster) and start over, but I am not sure it would work...
Hope someone can help,
Best Regards,
Jon