Hi everyone
I was running some tests on our SQL Server cluster and noticed a problem that causes the cluster log to report event IDs 1069 and 1205.
My cluster configuration is as follows:
3 HP DL380 G7 servers (cluster-01, cluster-02, cluster-03)
Windows Server 2008 R2 x64 on each of these servers
I created a SQL Server failover cluster with three instances:
instance 1: IST01, preferred owner node cluster-01, failover to nodes cluster-02, cluster-03
instance 2: IST02, preferred owner node cluster-02, failover to nodes cluster-03, cluster-01
instance 3: IST03, preferred owner node cluster-03, failover to nodes cluster-01, cluster-02
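To make the layout explicit, here is a minimal Python sketch of the owner lists described above. It is only an illustrative model typed in by hand, not something read from the cluster; the names OWNER_ORDER, can_host and failover_targets are made up for this example.

```python
# Hand-written model of the owner lists above (assumption: the first entry
# is the preferred owner, the remaining entries are the failover nodes in order).
OWNER_ORDER = {
    "IST01": ["cluster-01", "cluster-02", "cluster-03"],
    "IST02": ["cluster-02", "cluster-03", "cluster-01"],
    "IST03": ["cluster-03", "cluster-01", "cluster-02"],
}


def can_host(instance, node):
    """Return True if the node appears in the instance's owner list."""
    return node in OWNER_ORDER[instance]


def failover_targets(instance):
    """Return the failover nodes, i.e. every listed node except the preferred owner."""
    return OWNER_ORDER[instance][1:]


if __name__ == "__main__":
    # The move that fails in my tests: IST03 from cluster-03 to cluster-02.
    print(can_host("IST03", "cluster-02"))
    print(failover_targets("IST03"))
```

According to this model, cluster-02 is a listed failover node for IST03, so the configuration as I described it should allow the move.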
Each server has 72 GB of memory, and each SQL Server instance has its maximum memory limit set to 24 GB (so, in the worst case, I can have all three instances on a single node: 24 + 24 + 24 = 72 GB).
Every server uses iSCSI LUNs on our existing SAN.
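The worst-case memory arithmetic can be written out as a small Python sketch. The numbers are the ones from this post; the function name and the dictionary are hypothetical and only there for illustration.

```python
# Figures taken from the post; nothing here is queried from the servers.
NODE_MEMORY_GB = 72
MAX_MEMORY_LIMIT_GB = {"IST01": 24, "IST02": 24, "IST03": 24}


def worst_case_headroom_gb(node_memory_gb, limits_gb):
    """Memory left on one node if every instance runs there at its configured limit."""
    return node_memory_gb - sum(limits_gb.values())


if __name__ == "__main__":
    print(worst_case_headroom_gb(NODE_MEMORY_GB, MAX_MEMORY_LIMIT_GB))
```

A result of 0 means that, with all three instances on one node and each at its limit, the configured limits add up to the full physical memory of the node, with nothing left over for anything else running on it.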
As I said, I was running some tests, so I tried to move IST01 from node 1 to node 2 to simulate a failover. Everything was OK.
I then tried different combinations (IST03 from node 3 to node 1, IST01 from node 1 to node 2, etc.).
The problem arises when I try to move instance IST03 from node cluster-03 to cluster-02. I get two events in the cluster event log (event IDs 1069 and 1205), the instance goes down for a couple of seconds, and then it comes back online on node cluster-03.
EventID 1069 reports: "Cluster resource 'SQL Server (IST03)' in clustered service or application 'SQL Server (IST03)' failed."
EventID 1205 reports: "The Cluster service failed to bring clustered service or application 'SQL Server (IST03)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application."
I searched online, but I still haven't found a good explanation of these errors.
My cluster passes the cluster validation process with 100% success, and every update (both Windows and SQL Server) is installed.
Every server has the same software installed (I double-checked the BIOS and driver revisions of every peripheral).
Does anyone have a good explanation of these two event IDs, and possibly an idea of where I can start looking to solve this problem?
Thanks for any help