Failover of Distributed AG not happening after upgrade from SQL 2016 to SQL 2017

May 30, 2019, 6:06 am

Hi,

I am in a weird situation. I have a Distributed AG set up between London and Flint servers (2 in London and 2 in Flint). London is the global primary and Flint is the global secondary. I did a rolling upgrade on these servers. After the upgrade, the AG in Flint (on which the Distributed AG was created) does not fail over to the other node, while regular AGs on Flint fail over without any issue.

Moreover, the same Distributed AG on the London servers can fail over without any issue.

I have set the sync mode to SYNCHRONOUS COMMIT and the failover mode to MANUAL. As soon as I change the failover mode to AUTOMATIC, the DAG can fail over. The issue is only with the MANUAL failover mode.

The error it shows is odd: the databases are SYNCHRONIZED, yet the following error appears:

ERR [RES] SQL Server Availability Group <xxx>: [hadrag] Failed to execute SQL command to perform the requested operation
ERR [RES] SQL Server Availability Group <xxx>: [hadrag] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]The availability replica for availability group 'xxxx' on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group. If the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible
ERR [RES] SQL Server Availability Group <xxx>: [hadrag] data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online. (41142)
ERR [RES] SQL Server Availability Group <xxx>: [hadrag] Failed to execute availability group command stored procedure
INFO [RES] SQL Server Availability Group <xxx>: [hadrag] Disconnect from SQL Server

Please suggest what is wrong.

P.S. This behavior did not occur on SQL Server 2016. It started happening only after the upgrade to SQL Server 2017. Additionally, regular AGs on those servers are not causing any problem.

ravinder1483@gmail.com

↧

Alwayson Error/information in errorlog

November 17, 2015, 10:28 pm

Hi All,

I found lots of informational messages about AlwaysOn Availability Groups in the errorlog. There are no complaints from end users yet, and the messages are informational only, but I don't see why the database is trying to change roles so often.

2015-11-15 03:10:59.70 spid182s The availability group database "DatabaseA_SD019" is changing roles from "SECONDARY" to "SECONDARY" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

2015-11-15 03:10:59.70 spid171s State information for database 'DatabaseA_SD002' - Hardended Lsn: '(0:0:0)' Commit LSN: '(0:0:0)' Commit Time: 'Jan 1 1900 12:00AM'

2015-11-15 09:09:00.56 spid218s The availability group database "DatabaseA " is changing roles from "SECONDARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

2015-11-15 09:09:00.57 spid331s State information for database 'DatabaseA' - Hardended Lsn: '(24221:7322:1)' Commit LSN: '(24221:7320:2)' Commit Time: 'Nov 15 2015 2:08AM'

2015-11-15 09:10:07.07 spid130s The availability group database "DatabaseA " is changing roles from "RESOLVING" to "SECONDARY" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

2015-11-15 09:10:07.07 spid93s State information for database 'DatabaseA' - Hardended Lsn: '(24221:7322:1)' Commit LSN: '(24221:7320:2)' Commit Time: 'Nov 15 2015 2:08AM'

2015-11-15 09:10:11.12 spid190s AlwaysOn Availability Groups connection with primary database established for secondary database 'DatabaseA' on the availability replica 'SERVER-SSQL-1A\INSTANCE1' with Replica ID: {ae6f87ff-6e47-40e3-a239-7c395a571b16}. This is an informational message only. No user action is required.
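
To see how often each database actually changes role, the "changing roles" messages above can be tallied with a short script. This is only a minimal sketch, assuming the errorlog lines keep the format shown above; the function name `role_changes` and the sample constant are hypothetical, and the two sample lines are shortened copies of entries from this log.

```python
import re

# Matches the "changing roles" informational message as it appears above.
ROLE_CHANGE = re.compile(
    r'^(?P<ts>\S+ \S+) \S+\s+The availability group database "(?P<db>[^"]*)" '
    r'is changing roles from "(?P<old>\w+)" to "(?P<new>\w+)"'
)

# Shortened copies of two entries from the errorlog above.
SAMPLE = '''\
2015-11-15 09:09:00.56 spid218s The availability group database "DatabaseA " is changing roles from "SECONDARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization.
2015-11-15 09:10:07.07 spid130s The availability group database "DatabaseA " is changing roles from "RESOLVING" to "SECONDARY" because the mirroring session or availability group failed over due to role synchronization.
'''


def role_changes(errorlog_text):
    """Return (timestamp, database, old_role, new_role) for each role change."""
    changes = []
    for line in errorlog_text.splitlines():
        m = ROLE_CHANGE.match(line.strip())
        if m:
            # The database name in the log can carry a trailing space.
            changes.append((m["ts"], m["db"].strip(), m["old"], m["new"]))
    return changes


if __name__ == "__main__":
    for change in role_changes(SAMPLE):
        print(change)
```

Grouping the result by database and hour would show whether the transitions cluster around a particular time, such as the 09:09 to 09:10 window in the excerpt.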

I am running on Windows 2012 R2 and SQL Server as:

Microsoft SQL Server 2014 (SP1-CU3) (KB3094221) - 12.0.4427.24 (X64)
Oct 10 2015 17:18:26
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

↧

Manual failover between forwarder and secondary replica fails after upgrading all replicas to SQL 2017 and when all replicas are SYNCHRONIZED

May 30, 2019, 6:06 am

Hi,

I am in a weird situation. I have a Distributed AG set up between London and Flint servers (2 in London and 2 in Flint). London is the global primary and Flint is the global secondary. I did a rolling upgrade on these servers. After the upgrade, manual failover of the forwarder to its secondary (on which the Distributed AG was created) fails, while regular AGs do not show this behavior.

I have set the sync mode to SYNCHRONOUS COMMIT and the failover mode to MANUAL. As soon as I change the failover mode to AUTOMATIC, the DAG can fail over. The issue is only with the MANUAL failover mode.

The error it shows is odd: the databases are SYNCHRONIZED, yet the following error appears:

Please suggest what is wrong.

P.S. This behavior did not occur on SQL Server 2016. It started happening only after the upgrade to SQL Server 2017. Additionally, regular AGs on those servers are not causing any problem.

ravinder1483@gmail.com

↧

Windows 2012 R2/SQL 2016/Block Storage - Availability Options

January 19, 2017, 11:41 am

Hello,

I have been tasked with building a highly available SQL Server. This server will host the database for a Citrix DaaS environment. Always On is not necessarily required, but an extended outage is not acceptable.

I have 2 nodes, Node 1 and Node 2, and I have configured Microsoft failover clustering on both nodes.

I have 500 GB of block storage available, which has been presented to both nodes.

I have configured my cluster and gone through validation, but validation states: "No disks were found on which to perform cluster validation tests"

The disks are online in the Cluster management console.

My question: do I need a failover cluster if I am using block storage? My drive is a block storage volume that has its own life cycle management. Database files stored in block storage can easily be moved to another SQL Server in case of disaster.

Is there any advantage in combining the recoverability of block storage with the availability of Microsoft clustering?

Many Thanks,

↧

Does Distributed AG need windows clustering

May 31, 2019, 11:34 am

Hi, simple question: does a distributed AG need Windows clustering?

------

From https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/prereqs-restrictions-recommendations-always-on-availability?view=sql-server-2017

"SQL Server 2016 introduces distributed availability groups. In a distributed availability group two availability groups reside on different clusters."

---

So it looks like it does need Windows clustering, right?

---

The reason I ask is that we have a main server with clustering, on which we have set up an AG. There is also a stand-alone server (no Windows clustering) that is to be used for reporting. Since it doesn't have Windows clustering, we cannot set up an AG between the transactional SQL cluster and this stand-alone reporting SQL box, right?

---

Just a yes or no with an appropriate link will suffice, thanks.

↧

Where can you view if CREATE ANY DATABASE permission has been granted to Availability Group

May 31, 2019, 12:44 pm

We're attempting to create a validation step that ensures AutoSeeding will work as intended. I'm looking for the place I can query to determine if the below permission is in place for the Availability Group and can't seem to find it. It does not appear to be in the Server level permissions within SSMS nor in any of the AlwaysOn DMVs. Can anyone help?

ALTER AVAILABILITY GROUP [MY_AG] GRANT CREATE ANY DATABASE
GO

Andre Porter

↧

Read only routing

June 3, 2019, 6:48 am

Hi guys,

We have a 2-node SQL 2016 AG. I have configured read-only routing and set both SQL Servers as readable secondaries.

My understanding is that all SELECT statements should go to the secondary node, but when I run SQL Profiler I can see that all SELECT statements still go to the primary node.

The application that we use does not have a connection string option to set read-only or read-intent-only.

Could this be the reason we do not see any SELECT going to the secondary node?
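
For reference, read-only routing is only applied to connections that come in through the availability group listener and declare read intent; connections without it stay on the primary. Below is a minimal sketch of what a read-intent ODBC connection string looks like. The helper `build_connection_string`, the listener name `ag-listener` and the database name `SalesDb` are hypothetical placeholders, and the driver name assumes ODBC Driver 17 for SQL Server is installed.

```python
def build_connection_string(listener, database, read_intent=True):
    """Build an ODBC connection string for an availability group listener.

    listener and database are placeholders supplied by the caller; the
    database should be one that belongs to the availability group.
    """
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",  # assumed installed driver
        f"Server=tcp:{listener},1433",
        f"Database={database}",
        "Trusted_Connection=yes",
    ]
    if read_intent:
        # This keyword is what marks the connection as read-intent.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_connection_string("ag-listener", "SalesDb"))
```

If the application offers no way to append `ApplicationIntent=ReadOnly` to its connection string, its SELECT statements would be expected to keep running on the primary, which matches what Profiler shows.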

Thanks

Shahin

↧

Re-configuring network hardware in SQL Cluster and High Availability Cluster

May 2, 2019, 9:10 am

Hello All,

This is a backup/cluster/SQL resource question. My expertise is in general server administration and backups. I understand enough about SQL to give the appearance I know what I'm talking about, so please excuse any incorrect terminology.

Currently, our agent-based backup software calls the SQL Cluster IP (or AGL IP, if enabled) to initiate the backup stream. This means that all backup traffic traverses the 1Gbps NIC on the currently active node. This works, but it's slow and impacts prod. We acquired 10Gbps NICs and installed them in the physical cluster nodes. The plan was to leave prod traffic on the 1Gbps NIC and force backup traffic over the 10Gbps NIC.
Due to limitations of the backup software, there is no way to force the backups to use those 10Gbps NICs, because they're not associated with the Cluster IP, which is how the backup software initiates the stream.

General setup of our SQL Environments:
Standard Cluster (no AGL) - 2 physical nodes; Win2012R2; SQL2014; Cluster IP and node IPs(1Gbps NIC) are on the same subnet
AGL Clusters - a standard cluster (cluster IP) plus a SQL node at an offsite DR location (10Gbps WAN); the AGL IP resides in the same subnet as the Cluster IP and the cluster node IPs.
*Note - if these were stand-alone SQL servers, this wouldn't be an issue for the backup software. The clustering is what creates the issue.

After some internal discussion and discussions with the backup vendor, we're looking to simply take the IP from the 1Gbps NIC and assign it to the 10Gbps fiber NIC, then disable the 1Gbps NIC. The result will be all prod and backup traffic using the 10Gbps connection. Of course, we'd do this in a controlled manner: stopping SQL services, applying the change to passive nodes first, etc.

The questions/concerns: Is this a supported "upgrade?" What kind of trouble should I expect if I do this? Will SQL balk at this change?

Apologies if I've left this vague - I tried to keep it simple and to the point. Thank you for reading.

↧

Mssql Failover cluster instance is failing with error 'SQL Server crashed during startup'

June 3, 2019, 11:10 pm

I tried to set up an MSSQL failover cluster using the document below.

https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-shared-disk-cluster-configure?view=sql-server-2017

Before starting the cluster I was able to log in to the SQL Server instance; after forming the cluster it is no longer possible.

[root@node2 ~]# pcs status
Cluster name: cluster
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu May 30 17:08:30 2019
Last change: Thu May 30 17:03:57 2019 by root via cibadmin on node2

2 nodes configured
3 resources configured

Online: [ node2 node3 ]

Full list of resources:

Resource Group: NewLinFCIGrp
     nfs4       (ocf::heartbeat:Filesystem):    Started node3
     ipr        (ocf::heartbeat:IPaddr2):       Started node3
     FCIResource        (ocf::mssql:fci):       Stopped

Failed Actions:
* FCIResource_start_0 on node3 'unknown error' (1): call=41, status=complete, exitreason='SQL Server crashed during startup.',
    last-rc-change='Tue Jun 4 06:51:05 2019', queued=0ms, exec=22531ms
* FCIResource_start_0 on node2 'unknown error' (1): call=41, status=complete, exitreason='SQL Server crashed during startup.',
    last-rc-change='Thu May 30 17:03:58 2019', queued=0ms, exec=22392ms

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
----------------------------------------------------

Logs

17:03:57 node2 pengine[27910]: notice: * Start      FCIResource     (          node2 )
May 30 17:03:57 node2 pengine[27910]: notice: Calculated transition 25, saving inputs in /var/lib/pacemaker/pengine/pe-input-48.bz2
May 30 17:03:57 node2 crmd[27911]: notice: Initiating monitor operation FCIResource_monitor_0 on node3
May 30 17:03:57 node2 crmd[27911]: notice: Initiating monitor operation FCIResource_monitor_0 locally on node2
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: mssql_validate
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: Resource agent invoked with: monitor
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: mssql_monitor
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: FCIResource monitor : 7
May 30 17:03:58 node2 crmd[27911]: notice: Result of probe operation for FCIResource on node2: 7 (not running)
May 30 17:03:58 node2 crmd[27911]: notice: Initiating start operation FCIResource_start_0 locally on node2
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: mssql_validate
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: Resource agent invoked with: start
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: mssql_start
May 30 17:03:58 node2 su: (to mssql) root on none
May 30 17:03:58 node2 systemd: Created slice User Slice of mssql.
May 30 17:03:58 node2 systemd: Started Session c5 of user mssql.
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: SQL Server started. PID: 16164; user: mssql; command: /opt/mssql/bin/sqlservr
May 30 17:03:58 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 fci-helper invoked with hostname [localhost]; port [1433]; credentials-file [/var/opt/mssql/secrets/passwd]; application-name [monitor-FCIResource]; connection-timeout [20]; health-threshold [3]; action [start]
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 fci-helper invoked with virtual-server-name [FCIResource]
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 to connect to the instance at localhost:1433
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:00 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:00 Attempt 2 to connect to the instance at localhost:1433
May 30 17:04:00 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:00 Attempt 2 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:01 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:01 Attempt 3 to connect to the instance at localhost:1433
May 30 17:04:01 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:01 Attempt 3 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:02 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:02 Attempt 4 to connect to the instance at localhost:1433
May 30 17:04:02 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:02 Attempt 4 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:03 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:03 Attempt 5 to connect to the instance at localhost:1433
May 30 17:04:03 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:03 Attempt 5 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:03 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:04 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:04 Attempt 6 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:05 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:05 Attempt 7 to connect to the instance at localhost:1433
May 30 17:04:05 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:05 Attempt 7 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:06 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:06 Attempt 8 to connect to the instance at localhost:1433
May 30 17:04:06 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:06 Attempt 8 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:07 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:07 Attempt 9 to connect to the instance at localhost:1433
May 30 17:04:07 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:07 Attempt 9 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:08 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:08 Attempt 10 to connect to the instance at localhost:1433
May 30 17:04:08 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:08 Attempt 10 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:08 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:09 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:09 Attempt 11 to connect to the instance at localhost:1433
May 30 17:04:09 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:09 Attempt 11 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:10 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:10 Attempt 12 to connect to the instance at localhost:1433
May 30 17:04:10 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:10 Attempt 12 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:11 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:11 Attempt 13 to connect to the instance at localhost:1433
May 30 17:04:11 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:11 Attempt 13 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:12 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:12 Attempt 14 to connect to the instance at localhost:1433
May 30 17:04:12 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:12 Attempt 14 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:13 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:13 Attempt 15 to connect to the instance at localhost:1433
May 30 17:04:13 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:13 Attempt 15 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:13 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:14 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:14 Attempt 16 to connect to the instance at localhost:1433
May 30 17:04:14 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:14 Attempt 16 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:15 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:15 Attempt 17 to connect to the instance at localhost:1433
May 30 17:04:15 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:15 Attempt 17 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:16 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:16 Attempt 18 to connect to the instance at localhost:1433
May 30 17:04:16 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:16 Attempt 18 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:17 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:17 Attempt 19 to connect to the instance at localhost:1433
May 30 17:04:17 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:17 Attempt 19 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:18 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:18 Attempt 20 to connect to the instance at localhost:1433
May 30 17:04:18 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:18 Attempt 20 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:18 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:19 node2 fci(FCIResource)[16110]: INFO: start: ERROR: 2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3
May 30 17:04:19 node2 fci(FCIResource)[16110]: ERROR: 2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3
May 30 17:04:20 node2 fci(FCIResource)[16110]: ERROR: SQL Server crashed during startup.
May 30 17:04:20 node2 fci(FCIResource)[16110]: INFO: FCIResource start : 1
May 30 17:04:20 node2 lrmd[27908]: notice: FCIResource_start_0:16110:stderr [ ocf-exit-reason:2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3 ]
May 30 17:04:20 node2 lrmd[27908]: notice: FCIResource_start_0:16110:stderr [ ocf-exit-reason:SQL Server crashed during startup. ]
May 30 17:04:20 node2 crmd[27911]: notice: Result of start operation for FCIResource on node2: 1 (unknown error)
May 30 17:04:20 node2 crmd[27911]: notice: node2-FCIResource_start_0:41 [ ocf-exit-reason:2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3\nocf-exit-reason:SQL Server crashed during startup.\n ]
May 30 17:04:20 node2 crmd[27911]: warning: Action 9 (FCIResource_start_0) on node2 failed (target: 0 vs. rc: 1): Error
May 30 17:04:20 node2 crmd[27911]: notice: Transition aborted by operation FCIResource_start_0 'modify' on node2: Event failed
May 30 17:04:20 node2 crmd[27911]: notice: Transition 25 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-48.bz2): Complete
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: notice: * Recover    FCIResource     (          node2 )
May 30 17:04:20 node2 pengine[27910]: notice: Calculated transition 26, saving inputs in /var/lib/pacemaker/pengine/pe-input-49.bz2
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Forcing FCIResource away from node2 after 1000000 failures (max=1000000)
May 30 17:04:20 node2 pengine[27910]: notice: * Move       nfs4            ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]: notice: * Move       ipr             ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]: notice: * Recover    FCIResource     ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]: notice: Calculated transition 27, saving inputs in /var/lib/pacemaker/pengine/pe-input-50.bz2
May 30 17:04:20 node2 crmd[27911]: notice: Initiating stop operation FCIResource_stop_0 locally on node2
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: mssql_validate
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: Resource agent invoked with: stop
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: mssql_stop
May 30 17:04:20 node2 fci(FCIResource)[16703]: ERROR: SQL Server is not running.
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: FCIResource stop : 0
May 30 17:04:20 node2 lrmd[27908]: notice: FCIResource_stop_0:16703:stderr [ ocf-exit-reason:SQL Server is not running. ]
May 30 17:04:20 node2 crmd[27911]: notice: Result of stop operation for FCIResource on node2: 0 (ok)
May 30 17:04:20 node2 crmd[27911]: notice: Initiating stop operation ipr_stop_0 locally on node2
May 30 17:04:20 node2 IPaddr2(ipr)[16751]: INFO: IP status = ok, IP_CIP=
May 30 17:04:20 node2 avahi-daemon[3000]: Withdrawing address record for 10.170.90.37 on enp0s3.
May 30 17:04:20 node2 crmd[27911]: notice: Result of stop operation for ipr on node2: 0 (ok)
May 30 17:04:20 node2 crmd[27911]: notice: Initiating stop operation nfs4_stop_0 locally on node2
May 30 17:04:20 node2 Filesystem(nfs4)[16805]: INFO: Running stop for 10.170.90.37:/var/nfs/fci1 on /var/opt/mssql/data
May 30 17:04:20 node2 Filesystem(nfs4)[16805]: INFO: Trying to unmount /var/opt/mssql/data
May 30 17:04:26 node2 kernel: nfs: server 10.170.90.37 not responding, timed out
May 30 17:04:26 node2 Filesystem(nfs4)[16805]: INFO: unmounted /var/opt/mssql/data successfully
May 30 17:04:26 node2 crmd[27911]: notice: Result of stop operation for nfs4 on node2: 0 (ok)
May 30 17:04:26 node2 crmd[27911]: notice: Initiating start operation nfs4_start_0 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating monitor operation nfs4_monitor_20000 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating start operation ipr_start_0 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating monitor operation ipr_monitor_10000 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating start operation FCIResource_start_0 on node3
May 30 17:04:49 node2 crmd[27911]: warning: Action 10 (FCIResource_start_0) on node3 failed (target: 0 vs. rc: 1): Error
May 30 17:04:49 node2 crmd[27911]: notice: Transition aborted by operation FCIResource_start_0 'modify' on node3: Event failed
May 30 17:04:49 node2 crmd[27911]: notice: Transition 27 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-50.bz2): Complete
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node3: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node3: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Forcing FCIResource away from node2 after 1000000 failures (max=1000000)
May 30 17:04:49 node2 pengine[27910]: warning: Forcing FCIResource away from node3 after 1000000 failures (max=1000000)
May 30 17:04:49 node2 pengine[27910]: notice: * Stop       FCIResource     (          node3 )   due to node availability
May 30 17:04:49 node2 pengine[27910]: notice: Calculated transition 28, saving inputs in /var/lib/pacemaker/pengine/pe-input-51.bz2
May 30 17:04:49 node2 crmd[27911]: notice: Initiating stop operation FCIResource_stop_0 on node3
May 30 17:04:50 node2 crmd[27911]: notice: Transition 28 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-51.bz2): Complete
May 30 17:04:50 node2 crmd[27911]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
May 30 17:09:19 node2 chronyd[3032]: Source 51.89.151.183 replaced with 129.250.35.251
May 30 17:10:01 node2 systemd: Started Session 51 of user root.
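
Because the log is long, a short script can separate the repeated connection attempts from the other messages. This is only a sketch under the assumption that the syslog lines look like the ones above; the function name `summarize` and the counter keys are hypothetical, and the three sample lines are copied from this log. The kernel NFS messages may be worth a closer look, since /var/opt/mssql/data is mounted from the NFS server in this setup.

```python
import re
from collections import Counter

# Three lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
May 30 17:03:58 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:20 node2 fci(FCIResource)[16110]: ERROR: SQL Server crashed during startup.
"""


def summarize(log_text):
    """Count NFS kernel messages, refused connections and agent ERROR lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if "kernel: NFS:" in line:
            counts["nfs_kernel_messages"] += 1
        elif "connection refused" in line:
            counts["refused_connections"] += 1
        elif re.search(r"\bERROR:", line):
            counts["agent_errors"] += 1
    return dict(counts)


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```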

Please help...

↧

Failover when cpu usage is greater than 60%

June 4, 2019, 10:30 pm

Is there a way to force a failover when the CPU usage of the primary database server is more than, say, 60% or some other value?

↧

SQL Server 2017 Fail-over Cluster License

December 6, 2018, 1:15 am

Good day!

I have a SQL Server failover cluster setup (2 nodes). When I installed SQL Server 2017, I used the Enterprise Evaluation edition first and then clustered the SQL Servers. My problem now is that I want to use my SQL Server license for Standard edition, so I am trying to upgrade my SQL Servers from Evaluation to Standard edition, but I encountered this error:

Rule "SQL Server 2017 Failover Cluster Edition Downgrade" failed.

The edition of the selected SQL Server instance is not supported in this SQL Server Edition downgrade scenario. The source Evaluation edition and the target Standard edition is not supported path.

My question is: what does this error mean? Is a Standard edition license compatible with a SQL Server failover cluster?

I hope someone could help me with my problem.

↧

Always on add database to always on group failed with database encrypted

June 5, 2019, 4:41 am

≫ Next: Availability groups or failover instance for small company

≪ Previous: SQL Server 2017 Fail-over Cluster License

Hi Expert

I am adding a database to an Always On availability group and ended up with the following error. The database has certificate information as well. Any expert ideas on how to add it? The SQL version is 2016 SP1 CU7. The wizard does not allow the password to be entered.

↧

Availability groups or failover instance for small company

June 5, 2019, 7:52 am

≫ Next: Want to Build SQL Server 2016 with Active /Active Mode using Always On Feature

≪ Previous: Always on add database to always on group failed with database encripted

We have one SQL 2016 Standard server running on one physical server. The database is crucial for our business and any downtime is unwanted. That’s why we are planning to add an additional server for redundancy. The question is which approach to choose: basic availability groups or a failover cluster instance? The main concern is availability; performance is secondary.

Any advice would be appreciated.

↧

Want to Build SQL Server 2016 with Active /Active Mode using Always On Feature

June 5, 2019, 11:34 am

≫ Next: Alwayson - Flow Control and Synchronous replica

≪ Previous: Availability groups or failover instance for small company

Hi All,

Good day!

We wanted some clarification and reference links about SQL Server 2016 HA with Active/Active model using Always On or any other mode.

Recently we built SQL Server 2014 in an AG, but with this setup the client can only work in an Active/Passive model.

In a similar way, the customer wants to set up SQL Server 2016 as Active/Active, so that both SQL Servers can be used for client access at any point in time.

Example: with an Exchange DAG, we can mount a few databases on one Mailbox server and a few on the second Mailbox server as active copies. In this case both Mailbox servers share the load.

In a similar way, is there any possibility to configure SQL AG databases so that a few databases are active (primary copy) on the first server and a few are active on the second server, while each server also holds a passive copy of the databases that are primary on the other?
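The layout described above might be sketched in T-SQL roughly as follows. This is an illustration only: the availability group names `AG1` and `AG2` are hypothetical, and it assumes both groups already exist with a synchronized, synchronous-commit replica on each of the two servers.

```sql
-- Hypothetical names: AG1 and AG2 are assumed to already exist, each with a
-- synchronized, synchronous-commit replica on both servers.

-- Run on the second server while it is a synchronized secondary for AG2:
-- a planned manual failover that makes this server the primary for AG2.
-- AG1 keeps its primary on the first server.
ALTER AVAILABILITY GROUP [AG2] FAILOVER;

-- Run on either server to see which role the local replica holds in each group.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ars.role_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
WHERE ars.is_local = 1
ORDER BY ag.name;
```

With such a layout each server would report PRIMARY for one group and SECONDARY for the other.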

Any guidance or reference links would be very much appreciated.

Thanks in advance,

Regards, Kesavan K M. Please remember to mark the replies as answers if they help.

↧

Alwayson - Flow Control and Synchronous replica

June 6, 2019, 3:11 am

≫ Next: Unable to delete,truncate,alter,rename table

≪ Previous: Want to Build SQL Server 2016 with Active /Active Mode using Always On Feature

Please let me know: if I have a primary and a secondary replica in synchronous-commit mode and the secondary goes into flow control because of resource overuse, what is the impact on ongoing transactions? Will they wait, or will they continue processing while the send queue builds up on the primary replica?
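One way to watch the send queue in this situation might be a query like the sketch below. It only reads the documented DMV `sys.dm_hadr_database_replica_states`; it does not by itself show whether transactions are waiting.

```sql
-- Run on the primary replica. Shows, per database and replica, the
-- synchronization state and how much log is waiting to be sent or redone.
SELECT DB_NAME(drs.database_id) AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,  -- KB of log not yet sent to the secondary
       drs.redo_queue_size       -- KB of log not yet redone on the secondary
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```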

Thank you

↧

Unable to delete,truncate,alter,rename table

June 6, 2019, 10:00 am

≫ Next: AlwaysOn with SQL Standard Licence data integrity?

≪ Previous: Alwayson - Flow Control and Synchronous replica

Hi ,

I am unable to delete, truncate, alter, or rename the table, although DML changes are working fine.

It seems only DML statements are working, whereas DDL is not.

↧

AlwaysOn with SQL Standard Licence data integrity?

January 21, 2019, 10:56 am

≫ Next: How to remove secondary IP address from Listener

≪ Previous: Unable to delete,truncate,alter,rename table

Hi everyone, I was reading the limitations of creating a basic always on availability group with SQL Standard Edition, and there is one that got me thinking.

No integrity checks on secondary replicas.

Does this mean that if I fail over to the replica, the data could be corrupted?

Also

Support for one availability database.

I suppose I can create multiple basic availability groups on the same server, each containing one database?
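A minimal sketch of that idea, assuming two databases and two servers. All names here (`BAG_Sales`, `BAG_HR`, `Sales`, `HR`, `SQLNODE1`, `SQLNODE2`) and the endpoint URLs are hypothetical placeholders, and the mirroring endpoints are assumed to already exist:

```sql
-- Hypothetical names throughout. Each basic availability group holds
-- exactly one database and has two replicas.
CREATE AVAILABILITY GROUP [BAG_Sales]
WITH (BASIC)
FOR DATABASE [Sales]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.example.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC
    ),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.example.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC
    );

-- A second basic group on the same pair of servers for the second database.
CREATE AVAILABILITY GROUP [BAG_HR]
WITH (BASIC)
FOR DATABASE [HR]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.example.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC
    ),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.example.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC
    );
```

Each group would still need to be joined on the secondary server, and each fails over independently of the other.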

Thanks a lot,

Ivan Mckenzie

↧

How to remove secondary IP address from Listener

October 10, 2016, 2:18 pm

≫ Next: Add a database in two availability groups

≪ Previous: AlwaysOn with SQL Standard Licence data integrity?

We are preparing a DR test scenario and tried to remove an IP address in the DR subnet from the listener, but the add/delete buttons are greyed out. We tried with SA, the service account, and DBA accounts, with no luck. We saw a post that said to delete the IP from Cluster Manager for the listener, and from the registry. Is that really the only way?

John M. Couch

↧

Add a database in two availability groups

June 10, 2019, 12:58 am

≫ Next: Always on SPN

≪ Previous: How to remove secondary IP address from Listener

Can we add the same database to two availability groups?

↧

Always on SPN

June 9, 2019, 8:46 pm

≫ Next: Always on : Primary database relocation

≪ Previous: Add a database in two availability groups

Which FQDN should be registered for SQL Always On?

The computer name + instance name,

or the Always On group virtual name?

↧