I tried to set up a SQL Server (mssql) failover cluster instance on Linux by following the document below.
https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-shared-disk-cluster-configure?view=sql-server-2017
Before forming the cluster I was able to log in to SQL Server; after the cluster was formed, login is no longer possible. The FCIResource resource fails to start on both nodes:
[root@node2 ~]# pcs status
Cluster name: cluster
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Thu May 30 17:08:30 2019
Last change: Thu May 30 17:03:57 2019 by root via cibadmin on node2
2 nodes configured
3 resources configured
Online: [ node2 node3 ]
Full list of resources:
Resource Group: NewLinFCIGrp
nfs4 (ocf::heartbeat:Filesystem): Started node3
ipr (ocf::heartbeat:IPaddr2): Started node3
FCIResource (ocf::mssql:fci): Stopped
Failed Actions:
* FCIResource_start_0 on node3 'unknown error' (1): call=41, status=complete, exitreason='SQL Server crashed during startup.',
last-rc-change='Tue Jun 4 06:51:05 2019', queued=0ms, exec=22531ms
* FCIResource_start_0 on node2 'unknown error' (1): call=41, status=complete, exitreason='SQL Server crashed during startup.',
last-rc-change='Thu May 30 17:03:58 2019', queued=0ms, exec=22392ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
----------------------------------------------------
Logs
May 30 17:03:57 node2 pengine[27910]: notice: * Start FCIResource ( node2 )
May 30 17:03:57 node2 pengine[27910]: notice: Calculated transition 25, saving inputs in /var/lib/pacemaker/pengine/pe-input-48.bz2
May 30 17:03:57 node2 crmd[27911]: notice: Initiating monitor operation FCIResource_monitor_0 on node3
May 30 17:03:57 node2 crmd[27911]: notice: Initiating monitor operation FCIResource_monitor_0 locally on node2
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: mssql_validate
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: Resource agent invoked with: monitor
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: mssql_monitor
May 30 17:03:57 node2 fci(FCIResource)[16071]: INFO: FCIResource monitor : 7
May 30 17:03:58 node2 crmd[27911]: notice: Result of probe operation for FCIResource on node2: 7 (not running)
May 30 17:03:58 node2 crmd[27911]: notice: Initiating start operation FCIResource_start_0 locally on node2
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: mssql_validate
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: Resource agent invoked with: start
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: mssql_start
May 30 17:03:58 node2 su: (to mssql) root on none
May 30 17:03:58 node2 systemd: Created slice User Slice of mssql.
May 30 17:03:58 node2 systemd: Started Session c5 of user mssql.
May 30 17:03:58 node2 fci(FCIResource)[16110]: INFO: SQL Server started. PID: 16164; user: mssql; command: /opt/mssql/bin/sqlservr
May 30 17:03:58 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 fci-helper invoked with hostname [localhost]; port [1433]; credentials-file [/var/opt/mssql/secrets/passwd]; application-name [monitor-FCIResource]; connection-timeout [20]; health-threshold [3]; action [start]
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 fci-helper invoked with virtual-server-name [FCIResource]
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 to connect to the instance at localhost:1433
May 30 17:03:59 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:03:59 Attempt 1 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:00 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:00 Attempt 2 to connect to the instance at localhost:1433
May 30 17:04:00 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:00 Attempt 2 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:01 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:01 Attempt 3 to connect to the instance at localhost:1433
May 30 17:04:01 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:01 Attempt 3 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:02 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:02 Attempt 4 to connect to the instance at localhost:1433
May 30 17:04:02 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:02 Attempt 4 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:03 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:03 Attempt 5 to connect to the instance at localhost:1433
May 30 17:04:03 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:03 Attempt 5 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:03 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:04 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:04 Attempt 6 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:05 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:05 Attempt 7 to connect to the instance at localhost:1433
May 30 17:04:05 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:05 Attempt 7 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:06 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:06 Attempt 8 to connect to the instance at localhost:1433
May 30 17:04:06 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:06 Attempt 8 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:07 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:07 Attempt 9 to connect to the instance at localhost:1433
May 30 17:04:07 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:07 Attempt 9 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:08 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:08 Attempt 10 to connect to the instance at localhost:1433
May 30 17:04:08 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:08 Attempt 10 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:08 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:09 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:09 Attempt 11 to connect to the instance at localhost:1433
May 30 17:04:09 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:09 Attempt 11 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:10 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:10 Attempt 12 to connect to the instance at localhost:1433
May 30 17:04:10 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:10 Attempt 12 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:11 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:11 Attempt 13 to connect to the instance at localhost:1433
May 30 17:04:11 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:11 Attempt 13 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:12 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:12 Attempt 14 to connect to the instance at localhost:1433
May 30 17:04:12 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:12 Attempt 14 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:13 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:13 Attempt 15 to connect to the instance at localhost:1433
May 30 17:04:13 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:13 Attempt 15 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:13 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:14 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:14 Attempt 16 to connect to the instance at localhost:1433
May 30 17:04:14 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:14 Attempt 16 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:15 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:15 Attempt 17 to connect to the instance at localhost:1433
May 30 17:04:15 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:15 Attempt 17 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:16 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:16 Attempt 18 to connect to the instance at localhost:1433
May 30 17:04:16 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:16 Attempt 18 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:17 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:17 Attempt 19 to connect to the instance at localhost:1433
May 30 17:04:17 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:17 Attempt 19 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:18 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:18 Attempt 20 to connect to the instance at localhost:1433
May 30 17:04:18 node2 fci(FCIResource)[16110]: INFO: start: 2019/05/30 17:04:18 Attempt 20 returned error: Unresponsive or down Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsockopt: connection refused
May 30 17:04:18 node2 kernel: NFS: state manager: check lease failed on NFSv4 server 10.170.90.37 with error 13
May 30 17:04:19 node2 fci(FCIResource)[16110]: INFO: start: ERROR: 2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3
May 30 17:04:19 node2 fci(FCIResource)[16110]: ERROR: 2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3
May 30 17:04:20 node2 fci(FCIResource)[16110]: ERROR: SQL Server crashed during startup.
May 30 17:04:20 node2 fci(FCIResource)[16110]: INFO: FCIResource start : 1
May 30 17:04:20 node2 lrmd[27908]: notice: FCIResource_start_0:16110:stderr [ ocf-exit-reason:2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3 ]
May 30 17:04:20 node2 lrmd[27908]: notice: FCIResource_start_0:16110:stderr [ ocf-exit-reason:SQL Server crashed during startup. ]
May 30 17:04:20 node2 crmd[27911]: notice: Result of start operation for FCIResource on node2: 1 (unknown error)
May 30 17:04:20 node2 crmd[27911]: notice: node2-FCIResource_start_0:41 [ ocf-exit-reason:2019/05/30 17:04:19 Instance health status 1 is at or below the threshold value of 3\nocf-exit-reason:SQL Server crashed during startup.\n ]
May 30 17:04:20 node2 crmd[27911]: warning: Action 9 (FCIResource_start_0) on node2 failed (target: 0 vs. rc: 1): Error
May 30 17:04:20 node2 crmd[27911]: notice: Transition aborted by operation FCIResource_start_0 'modify' on node2: Event failed
May 30 17:04:20 node2 crmd[27911]: notice: Transition 25 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-48.bz2): Complete
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: notice: * Recover FCIResource ( node2 )
May 30 17:04:20 node2 pengine[27910]: notice: Calculated transition 26, saving inputs in /var/lib/pacemaker/pengine/pe-input-49.bz2
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:20 node2 pengine[27910]: warning: Forcing FCIResource away from node2 after 1000000 failures (max=1000000)
May 30 17:04:20 node2 pengine[27910]: notice: * Move nfs4 ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]: notice: * Move ipr ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]: notice: * Recover FCIResource ( node2 -> node3 )
May 30 17:04:20 node2 pengine[27910]: notice: Calculated transition 27, saving inputs in /var/lib/pacemaker/pengine/pe-input-50.bz2
May 30 17:04:20 node2 crmd[27911]: notice: Initiating stop operation FCIResource_stop_0 locally on node2
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: mssql_validate
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: Resource agent invoked with: stop
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: mssql_stop
May 30 17:04:20 node2 fci(FCIResource)[16703]: ERROR: SQL Server is not running.
May 30 17:04:20 node2 fci(FCIResource)[16703]: INFO: FCIResource stop : 0
May 30 17:04:20 node2 lrmd[27908]: notice: FCIResource_stop_0:16703:stderr [ ocf-exit-reason:SQL Server is not running. ]
May 30 17:04:20 node2 crmd[27911]: notice: Result of stop operation for FCIResource on node2: 0 (ok)
May 30 17:04:20 node2 crmd[27911]: notice: Initiating stop operation ipr_stop_0 locally on node2
May 30 17:04:20 node2 IPaddr2(ipr)[16751]: INFO: IP status = ok, IP_CIP=
May 30 17:04:20 node2 avahi-daemon[3000]: Withdrawing address record for 10.170.90.37 on enp0s3.
May 30 17:04:20 node2 crmd[27911]: notice: Result of stop operation for ipr on node2: 0 (ok)
May 30 17:04:20 node2 crmd[27911]: notice: Initiating stop operation nfs4_stop_0 locally on node2
May 30 17:04:20 node2 Filesystem(nfs4)[16805]: INFO: Running stop for 10.170.90.37:/var/nfs/fci1 on /var/opt/mssql/data
May 30 17:04:20 node2 Filesystem(nfs4)[16805]: INFO: Trying to unmount /var/opt/mssql/data
May 30 17:04:26 node2 kernel: nfs: server 10.170.90.37 not responding, timed out
May 30 17:04:26 node2 Filesystem(nfs4)[16805]: INFO: unmounted /var/opt/mssql/data successfully
May 30 17:04:26 node2 crmd[27911]: notice: Result of stop operation for nfs4 on node2: 0 (ok)
May 30 17:04:26 node2 crmd[27911]: notice: Initiating start operation nfs4_start_0 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating monitor operation nfs4_monitor_20000 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating start operation ipr_start_0 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating monitor operation ipr_monitor_10000 on node3
May 30 17:04:27 node2 crmd[27911]: notice: Initiating start operation FCIResource_start_0 on node3
May 30 17:04:49 node2 crmd[27911]: warning: Action 10 (FCIResource_start_0) on node3 failed (target: 0 vs. rc: 1): Error
May 30 17:04:49 node2 crmd[27911]: notice: Transition aborted by operation FCIResource_start_0 'modify' on node3: Event failed
May 30 17:04:49 node2 crmd[27911]: notice: Transition 27 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-50.bz2): Complete
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node3: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node3: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Processing failed start of FCIResource on node2: unknown error
May 30 17:04:49 node2 pengine[27910]: warning: Forcing FCIResource away from node2 after 1000000 failures (max=1000000)
May 30 17:04:49 node2 pengine[27910]: warning: Forcing FCIResource away from node3 after 1000000 failures (max=1000000)
May 30 17:04:49 node2 pengine[27910]: notice: * Stop FCIResource ( node3 ) due to node availability
May 30 17:04:49 node2 pengine[27910]: notice: Calculated transition 28, saving inputs in /var/lib/pacemaker/pengine/pe-input-51.bz2
May 30 17:04:49 node2 crmd[27911]: notice: Initiating stop operation FCIResource_stop_0 on node3
May 30 17:04:50 node2 crmd[27911]: notice: Transition 28 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-51.bz2): Complete
May 30 17:04:50 node2 crmd[27911]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
May 30 17:09:19 node2 chronyd[3032]: Source 51.89.151.183 replaced with 129.250.35.251
May 30 17:10:01 node2 systemd: Started Session 51 of user root.
Can anyone help me work out why SQL Server fails to start once it is managed by the cluster?