We have a weekly purge job that runs against a table on a SQL Server 2012 instance with AlwaysOn High Availability. While the purge runs, services connected to the database (using unrelated tables) see wait times of 30+ seconds, which causes them to time out and stop. All of these waits are of type HADR_SYNC_COMMIT.
Our purge process is a simple delete from table where date > certain time period, wrapped in a transaction with a commit. While it runs, we see spikes in I/O wait time (read and write), as well as a spike in log flushes/sec on both the primary and the secondary. Read latency on the primary server peaks at 1 ms, and we see no blocking during the purge. The storage on the servers is Fusion-io drives that are local to each machine and dedicated to the database files for the instance.
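For reference, the purge has roughly the shape below. This is only a sketch: the table name dbo.EventLog, the column EventDate, and the 90-day cutoff are placeholders rather than our real schema or retention rule.

```sql
-- Sketch only: dbo.EventLog, EventDate and the 90-day cutoff are
-- placeholder names and values, not the real schema or retention period.
DECLARE @cutoff datetime = DATEADD(DAY, -90, GETDATE());

BEGIN TRANSACTION;

-- Single delete of everything past the retention window.
-- The comparison direction depends on how the date column is defined.
DELETE FROM dbo.EventLog
WHERE EventDate < @cutoff;

COMMIT TRANSACTION;
```

Everything is deleted in one transaction, so all of the resulting log records are generated in a single burst.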
On the primary, we see the HADR_SYNC_COMMIT waits. On the secondary, we see almost zero wait time across the server, with the WRITELOG wait type peaking at 1 ms during the event.
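For anyone who wants to compare numbers, queries along these lines are one way to read these waits and the replica queues. This is a sketch that assumes the standard DMVs are acceptable for the comparison; the values in sys.dm_os_wait_stats are cumulative since the last restart (or since the stats were cleared), so it needs to be sampled before and after the purge and the difference taken.

```sql
-- Cumulative waits for the two wait types mentioned above.
-- Run on each replica before and after the purge and compare the deltas.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'HADR_SYNC_COMMIT', N'WRITELOG')
ORDER BY wait_time_ms DESC;

-- Send and redo queues per database replica (sizes in KB, rates in KB/sec).
SELECT database_id,
       replica_id,
       synchronization_state_desc,
       log_send_queue_size,
       log_send_rate,
       redo_queue_size,
       redo_rate
FROM sys.dm_hadr_database_replica_states;
```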
I have asked my infrastructure team to review the network during the purge window. What else should I be looking at? I haven't found much troubleshooting information beyond this, so I figured I would check with the experts on where to look next!
Any help would be greatly appreciated.
Thanks!