Channel: SQL Server High Availability and Disaster Recovery forum

MSSQL failover cluster instance is failing with error 'SQL Server crashed during startup'

I tried to set up an MSSQL failover cluster instance by following this document:

https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-shared-disk-cluster-configure?view=sql-server-2017

Before starting the cluster, I was able to log in to SQL Server. After the cluster was formed, I can no longer log in.
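To check the "can no longer log in" symptom outside Pacemaker, you can probe TCP port 1433 directly, which is roughly what the fci-helper does in the logs further down. This is a minimal sketch that assumes Python 3.8+ on the node and the default port 1433. `probe_sql_port` and its parameters are hypothetical; they only imitate the one-attempt-per-second pattern visible in the log. The probe shows whether anything is listening and does not attempt a SQL login.

```python
import socket
import time


def probe_sql_port(host: str = "localhost", port: int = 1433,
                   attempts: int = 20, delay: float = 1.0,
                   timeout: float = 1.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds.

    Hypothetical helper: the retry count and delay are assumptions chosen
    to resemble the fci-helper log, not values read from the resource agent.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                print(f"attempt {attempt}: {host}:{port} is accepting connections")
                return True
        except OSError as exc:
            print(f"attempt {attempt}: {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

For example, `probe_sql_port("localhost", 1433)` run on the node that currently holds the resource group returns False while the log shows "connection refused", and returns True once sqlservr is listening.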

[root@node2 ~]# pcs status
Cluster name: cluster
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu May 30 17:08:30 2019
Last change: Thu May 30 17:03:57 2019 by root via cibadmin on node2

2 nodes configured
3 resources configured

Online: [ node2 node3 ]

Full list of resources:

 Resource Group: NewLinFCIGrp
     nfs4       (ocf::heartbeat:Filesystem):    Started node3
     ipr        (ocf::heartbeat:IPaddr2):       Started node3
     FCIResource        (ocf::mssql:fci):       Stopped

Failed Actions:
* FCIResource_start_0 on node3 'unknown error' (1): call=41, status=complete, exitreason='SQL Server crashed during startup.',
    last-rc-change='Tue Jun  4 06:51:05 2019', queued=0ms, exec=22531ms
* FCIResource_start_0 on node2 'unknown error' (1): call=41, status=complete, exitreason='SQL Server crashed during startup.',
    last-rc-change='Thu May 30 17:03:58 2019', queued=0ms, exec=22392ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
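When comparing failures across nodes or attempts, it can help to extract the "Failed Actions" entries from `pcs status` programmatically. This is a minimal sketch. It assumes the one-line `* <resource>_<op>_<interval> on <node> '<error>' (<rc>): ... exitreason='...'` layout shown above, which may vary between Pacemaker versions. `parse_failed_actions` and `FailedAction` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class FailedAction(NamedTuple):
    resource: str
    operation: str
    node: str
    rc: int
    exit_reason: str


# Fitted to the "Failed Actions" lines above; indented continuation lines
# (last-rc-change=...) deliberately do not match.
FAILED_ACTION_RE = re.compile(
    r"^\*\s+(?P<resource>\S+)_(?P<op>[a-z]+)_\d+\s+on\s+(?P<node>\S+)\s+"
    r"'[^']*'\s+\((?P<rc>\d+)\):.*?exitreason='(?P<reason>[^']*)'"
)


def parse_failed_actions(pcs_status: str) -> List[FailedAction]:
    """Return one FailedAction per matching line of `pcs status` output."""
    actions = []
    for line in pcs_status.splitlines():
        m = FAILED_ACTION_RE.match(line.strip())
        if m:
            actions.append(FailedAction(m["resource"], m["op"], m["node"],
                                        int(m["rc"]), m["reason"]))
    return actions
```

Run on the "Failed Actions" block above, this yields one `start` failure each for node3 and node2, both with rc 1 and exit reason `SQL Server crashed during startup.`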
----------------------------------------------------

Logs

May 30 17:03:57 node2 pengine[27910]:  notice:  * Start      FCIResource     (          node2 )
May 30 17:03:57 node2 pengine[27910]:  notice: Calculated transition 25, saving inputs in /var/lib/pacemaker/pengine/pe-input-48.bz2
May 30 17:03:57 node2 crmd[27911]:  notice: Initiating monitor operation FCIResource_monitor_0 on node3
May 30 17:03:57 node2 crmd[27911]:  notice: Initiating monitor operation FCIResource_monitor_0 locally on node2
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: mssql_validate
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: Resource agent invoked with: monitor
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: mssql_monitor
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: FCIResource monitor : 7
May 30 17:03:58 node2 crmd[27911]:  notice: Result of probe operation for FCIResource on node2: 7 (not running)
May 30 17:03:58 node2 crmd[27911]:  notice: Initiating start operation FCIResource_start_0 locally on node2
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: mssql_validate
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: Resource agent invoked with: start
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: mssql_start
May 30 17:03:58 node2 su: (to mssql) root on none
May 30 17:03:58 node2 systemd: Created slice User Slice of mssql.
May 30 17:03:58 node2 systemd: Started Session c5 of user mssql.
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: SQL Server started. PID: 16164; user: mssql; command: /opt/mssql/bin/sqlservr
May 30 17:03:58 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 fci-helper invoked with hostname [localhost]; port [1433]; credentials-file [/var/opt/mssql/secrets/passwd]; application-name [monitor-FCIResource]; connection-timeout [20]; health-threshold [3]; action [start]
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 fci-helper invoked with virtual-server-name [FCIResource]
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 to connect to the instance at localhost:1433
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:00 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:00 Attempt 2 to connect to the instance at localhost:1433
May 30 17:04:00 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:00 Attempt 2 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:01 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:01 Attempt 3 to connect to the instance at localhost:1433
May 30 17:04:01 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:01 Attempt 3 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:02 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:02 Attempt 4 to connect to the instance at localhost:1433
May 30 17:04:02 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:02 Attempt 4 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:03 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:03 Attempt 5 to connect to the instance at localhost:1433
May 30 17:04:03 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:03 Attempt 5 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:03 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:04 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:04 Attempt 6 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:05 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:05 Attempt 7 to connect to the instance at localhost:1433
May 30 17:04:05 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:05 Attempt 7 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:06 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:06 Attempt 8 to connect to the instance at localhost:1433
May 30 17:04:06 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:06 Attempt 8 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:07 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:07 Attempt 9 to connect to the instance at localhost:1433
May 30 17:04:07 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:07 Attempt 9 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:08 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:08 Attempt 10 to connect to the instance at localhost:1433
May 30 17:04:08 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:08 Attempt 10 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:08 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:09 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:09 Attempt 11 to connect to the instance at localhost:1433
May 30 17:04:09 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:09 Attempt 11 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:10 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:10 Attempt 12 to connect to the instance at localhost:1433
May 30 17:04:10 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:10 Attempt 12 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:11 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:11 Attempt 13 to connect to the instance at localhost:1433
May 30 17:04:11 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:11 Attempt 13 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:12 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:12 Attempt 14 to connect to the instance at localhost:1433
May 30 17:04:12 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:12 Attempt 14 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:13 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:13 Attempt 15 to connect to the instance at localhost:1433
May 30 17:04:13 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:13 Attempt 15 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:13 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:14 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:14 Attempt 16 to connect to the instance at localhost:1433
May 30 17:04:14 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:14 Attempt 16 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:15 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:15 Attempt 17 to connect to the instance at localhost:1433
May 30 17:04:15 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:15 Attempt 17 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:16 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:16 Attempt 18 to connect to the instance at localhost:1433
May 30 17:04:16 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:16 Attempt 18 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:17 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:17 Attempt 19 to connect to the instance at localhost:1433
May 30 17:04:17 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:17 Attempt 19 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:18 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:18 Attempt 20 to connect to the instance at localhost:1433
May 30 17:04:18 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:18 Attempt 20 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:18 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:19 node2 fci(FCIResource)[16110]: INFO: start: ERROR: 2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3
May 30 17:04:19 node2 fci(FCIResource)[16110]: ERROR: 2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3
May 30 17:04:20 node2 fci(FCIResource)[16110]: ERROR: SQL Server crashed during startup.
May 30 17:04:20 node2 fci(FCIResource)[16110]: INFO: FCIResource start : 1
May 30 17:04:20 node2 lrmd[27908]:  notice: FCIResource_start_0:16110:stderr [ ocf-exit-reason:2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3 ]
May 30 17:04:20 node2 lrmd[27908]:  notice: FCIResource_start_0:16110:stderr [ ocf-exit-reason:SQL Server crashed during startup. ]
May 30 17:04:20 node2 crmd[27911]:  notice: Result of start operation for FCIResource on node2: 1 (unknown error)
May 30 17:04:20 node2 crmd[27911]:  notice: node2-FCIResource_start_0:41 [ ocf-exit-reason:2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3\nocf-exit-reason:SQL Server crashed during startup.\n ]
May 30 17:04:20 node2 crmd[27911]: warning: Action 9 (FCIResource_start_0) on node2 failed (target: 0 vs. rc: 1): Error
May 30 17:04:20 node2 crmd[27911]:  notice: Transition aborted by operation FCIResource_start_0 'modify' on node2: Event failed
May 30 17:04:20 node2 crmd[27911]:  notice: Transition 25 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-48.bz2): Complete
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]:  notice:  * Recover    FCIResource     (          node2 )
May 30 17:04:20 node2 pengine[27910]:  notice: Calculated transition 26, saving inputs in /var/lib/pacemaker/pengine/pe-input-49.bz2
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Forcing FCIResource away from node2 after 1000000 failures (max=1000000)
May 30 17:04:20 node2 pengine[27910]:  notice:  * Move       nfs4            ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]:  notice:  * Move       ipr             ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]:  notice:  * Recover    FCIResource     ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]:  notice: Calculated transition 27, saving inputs in /var/lib/pacemaker/pengine/pe-input-50.bz2
May 30 17:04:20 node2 crmd[27911]:  notice: Initiating stop operation FCIResource_stop_0 locally on node2
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: mssql_validate
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: Resource agent invoked with: stop
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: mssql_stop
May 30 17:04:20 node2 fci(FCIResource)[16703]: ERROR: SQL Server is not running.
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: FCIResource stop : 0
May 30 17:04:20 node2 lrmd[27908]:  notice: FCIResource_stop_0:16703:stderr [ ocf-exit-reason:SQL Server is not running. ]
May 30 17:04:20 node2 crmd[27911]:  notice: Result of stop operation for FCIResource on node2: 0 (ok)
May 30 17:04:20 node2 crmd[27911]:  notice: Initiating stop operation ipr_stop_0 locally on node2
May 30 17:04:20 node2 IPaddr2(ipr)[16751]: INFO: IP status = ok, IP_CIP=
May 30 17:04:20 node2 avahi-daemon[3000]: Withdrawing address record for 10.170.90.37 on enp0s3.
May 30 17:04:20 node2 crmd[27911]:  notice: Result of stop operation for ipr on node2: 0 (ok)
May 30 17:04:20 node2 crmd[27911]:  notice: Initiating stop operation nfs4_stop_0 locally on node2
May 30 17:04:20 node2 Filesystem(nfs4)[16805]: INFO: Running stop for 10.170.90.37:/var/nfs/fci1 on /var/opt/mssql/data
May 30 17:04:20 node2 Filesystem(nfs4)[16805]: INFO: Trying to unmount /var/opt/mssql/data
May 30 17:04:26 node2 kernel: nfs: server 10.170.90.37 not responding, timed out
May 30 17:04:26 node2 Filesystem(nfs4)[16805]: INFO: unmounted /var/opt/mssql/data successfully
May 30 17:04:26 node2 crmd[27911]:  notice: Result of stop operation for nfs4 on node2: 0 (ok)
May 30 17:04:26 node2 crmd[27911]:  notice: Initiating start operation nfs4_start_0 on node3
May 30 17:04:27 node2 crmd[27911]:  notice: Initiating monitor operation nfs4_monitor_20000 on node3
May 30 17:04:27 node2 crmd[27911]:  notice: Initiating start operation ipr_start_0 on node3
May 30 17:04:27 node2 crmd[27911]:  notice: Initiating monitor operation ipr_monitor_10000 on node3
May 30 17:04:27 node2 crmd[27911]:  notice: Initiating start operation FCIResource_start_0 on node3
May 30 17:04:49 node2 crmd[27911]: warning: Action 10 (FCIResource_start_0) on node3 failed (target: 0 vs. rc: 1): Error
May 30 17:04:49 node2 crmd[27911]:  notice: Transition aborted by operation FCIResource_start_0 'modify' on node3: Event failed
May 30 17:04:49 node2 crmd[27911]:  notice: Transition 27 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-50.bz2): Complete
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node3: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node3: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Forcing FCIResource away from node2 after 1000000 failures (max=1000000)
May 30 17:04:49 node2 pengine[27910]: warning: Forcing FCIResource away from node3 after 1000000 failures (max=1000000)
May 30 17:04:49 node2 pengine[27910]:  notice:  * Stop       FCIResource     (          node3 )   due to node availability
May 30 17:04:49 node2 pengine[27910]:  notice: Calculated transition 28, saving inputs in /var/lib/pacemaker/pengine/pe-input-51.bz2
May 30 17:04:49 node2 crmd[27911]:  notice: Initiating stop operation FCIResource_stop_0 on node3
May 30 17:04:50 node2 crmd[27911]:  notice: Transition 28 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-51.bz2): Complete
May 30 17:04:50 node2 crmd[27911]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
May 30 17:09:19 node2 chronyd[3032]: Source 51.89.151.183 replaced with 129.250.35.251
May 30 17:10:01 node2 systemd: Started Session 51 of user root.
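Two details in the log above may be worth checking. The first is the repeated kernel message `check lease failed on NFSv4 server 10.170.90.37 with error 13`. The second is that `10.170.90.37` is the NFS server used by the `nfs4` Filesystem resource and also, judging by the avahi-daemon line logged when `ipr` was stopped, an address that was present on `enp0s3`. The sketch below scans log text for both. It assumes the kernel's error number is a Linux errno value (13 is `EACCES`, "Permission denied"). `scan_log` and its regular expressions are hypothetical and are fitted only to the lines in this post.

```python
import errno
import os
import re

# Patterns fitted to the log lines in this post (hypothetical, not an official format).
LEASE_RE = re.compile(
    r"check lease failed on NFSv4 server (?P<server>\S+) with error (?P<code>\d+)")
NFS_MOUNT_RE = re.compile(
    r"Running (?:start|stop) for (?P<server>[\d.]+):(?P<export>\S+) on (?P<mount>\S+)")
WITHDRAW_RE = re.compile(
    r"Withdrawing address record for (?P<addr>[\d.]+) on (?P<iface>\S+)")


def scan_log(text: str) -> dict:
    """Summarise NFS lease errors and addresses shared by the NFS server and a local interface."""
    lease_errors = {}    # NFS server -> set of "ERRNO_NAME: message" strings
    nfs_servers = set()  # servers named by the Filesystem resource agent
    withdrawn = set()    # addresses removed from a local interface (avahi-daemon lines)
    for line in text.splitlines():
        if m := LEASE_RE.search(line):
            code = int(m["code"])
            # Assumption: the kernel reports a positive Linux errno value here.
            name = errno.errorcode.get(code, f"UNKNOWN({code})")
            lease_errors.setdefault(m["server"], set()).add(f"{name}: {os.strerror(code)}")
        if m := NFS_MOUNT_RE.search(line):
            nfs_servers.add(m["server"])
        if m := WITHDRAW_RE.search(line):
            withdrawn.add(m["addr"])
    return {
        "lease_errors": lease_errors,
        # Non-empty means the NFS server address was also present on a local interface.
        "shared_addresses": nfs_servers & withdrawn,
    }
```

If these lines come from `/var/log/messages`, `scan_log(open("/var/log/messages").read())` gives a quick summary. A non-empty `shared_addresses` would suggest the floating IP and the NFS server address are the same. That possibility may be worth ruling out before investigating SQL Server itself.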

Please help...