Hello Experts,
I have an availability group (AG) with two nodes, Server01 and Server02.
The primary replica had always been on Server02. One day, I manually failed it over to Server01. After that, the error log showed many lease expired errors, and warnings about I/O requests taking too long always appeared alongside them.
Errors like the following occurred every day. After 5 days, I manually failed the AG back over to Server02, and the lease expired errors disappeared.
My questions:
- Do the long I/O warnings trigger the lease expiration?
- If not, what triggers the lease expiration?
- The disk configuration is the same on both servers. Why does the lease expire only on Server01?
Any ideas? Thanks in advance.
Error log example:
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 323 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdev1.ndf] in database id 2. The OS file handle is 0x0000000000001154. The offset of the latest long I/O is: 0x00000106210000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 353 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdev4.ndf] in database id 2. The OS file handle is 0x0000000000001134. The offset of the latest long I/O is: 0x0000013f210000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 375 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdb.mdf] in database id 2. The OS file handle is 0x000000000000109C. The offset of the latest long I/O is: 0x0000012e7c0000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 360 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdb_msssql_2.ndf] in database id 2. The OS file handle is 0x0000000000001068. The offset of the latest long I/O is: 0x00000112cc0000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 335 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdev6.ndf] in database id 2. The OS file handle is 0x00000000000011A8. The offset of the latest long I/O is: 0x0000013e290000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 359 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdev5.ndf] in database id 2. The OS file handle is 0x00000000000011A4. The offset of the latest long I/O is: 0x000001259b0000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 370 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdev2.ndf] in database id 2. The OS file handle is 0x0000000000001170. The offset of the latest long I/O is: 0x00000119d30000
2019-10-23 21:10:51.07 spid28s SQL Server has encountered 344 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\tempdev3.ndf] in database id 2. The OS file handle is 0x0000000000001140. The offset of the latest long I/O is: 0x00000136100000
2019-10-23 21:14:56.42 Server SQL Server hosting availability group 'SQLAG01' did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period.
2019-10-23 21:14:56.42 Server The lease between availability group 'SQLAG01' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.
2019-10-23 21:14:56.42 Server Always On: The local replica of availability group 'SQLAG01' is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required.
2019-10-23 21:14:56.42 Server The state of the local availability replica in availability group 'RIS-SQLAGPRD01' has changed from 'PRIMARY_NORMAL' to 'RESOLVING_NORMAL'. The state changed because the lease between the local availability replica and Windows Server Failover Clustering (WSFC) has expired. For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.
2019-10-23 21:14:56.66 spid51 Always On: The local replica of availability group 'SQLAG01' is preparing to transition to the resolving role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required.
2019-10-23 21:14:57.44 spid51 Always On: The local replica of availability group 'SQLAG01' is preparing to transition to the primary role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required.
2019-10-23 21:14:58.03 spid51 The state of the local availability replica in availability group 'SQLAG01' has changed from 'RESOLVING_NORMAL' to 'PRIMARY_PENDING'. The state changed because the availability group is coming online. For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.
2019-10-23 21:14:58.07 Server The lease worker of availability group 'SQLAG01' is now sleeping the excess lease time (164110 ms) supplied during online. This is an informational message only. No user action is required.
2019-10-23 21:14:58.16 Server The state of the local availability replica in availability group 'SQLAG01' has changed from 'PRIMARY_PENDING' to 'PRIMARY_NORMAL'. The state changed because the local replica has completed processing Online command from Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.
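
To check how closely the long I/O warnings and the lease expirations line up over several days, the error log can be scanned with a small script. The following is only a minimal sketch in Python: the function name `correlate` and the two patterns are my own assumptions, based on the message text in the excerpt above. It measures the time gap between the two kinds of message and does not prove that one causes the other.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. "2019-10-23 21:10:51.07".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s")
# Long I/O warning, matched on the wording seen in the excerpt.
LONG_IO = re.compile(
    r"occurrence\(s\) of I/O requests taking longer than 15 seconds"
)
# Lease expiration message; captures the availability group name.
LEASE = re.compile(
    r"The lease between availability group '([^']+)' and the "
    r"Windows Server Failover Cluster has expired"
)


def correlate(lines):
    """Return one tuple per lease expiration:
    (expiry_time, ag_name, seconds_since_last_long_io, warnings_since_last_expiry).
    seconds_since_last_long_io is None if no warning was seen before the expiry.
    """
    last_io = None   # timestamp of the most recent long I/O warning
    io_count = 0     # warnings seen since the previous lease expiration
    results = []
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if LONG_IO.search(line):
            last_io = ts
            io_count += 1
            continue
        lease = LEASE.search(line)
        if lease:
            gap = (ts - last_io).total_seconds() if last_io else None
            results.append((ts, lease.group(1), gap, io_count))
            io_count = 0
    return results
```

Passing the lines of an error log file to `correlate` gives one row per lease expiration. The encoding to use when opening the file depends on how the log was saved, so it is left to the caller. If the gap is consistently a few minutes, as in the excerpt above, the two events are at least closely related in time.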