Channel: SQL Server High Availability and Disaster Recovery forum
Viewing all 4532 articles
Browse latest View live

Getting error while validating Cluster

I installed Windows Server 2012 Datacenter and SQL Server 2016 Enterprise edition,

but I am getting an error when validating the cluster:

Node cannot reach a writable domain controller. Please check connectivity of these nodes to the domain controllers.

Also, which IP address do I need to enter in the figure below?


SQL 2019 CTP: database doesn't show up in sync after automatic failover

SQL version: SQL 2019 CTP

3-node SQL Server cluster on Kubernetes

Microsoft guide followed: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-kubernetes-deploy?view=sqlallproducts-allversions

Expected behaviour: after an automatic failover (triggered by killing one of the pods), a new pod with a SQL Server instance should be created, SSMS should show the cluster as healthy, and all databases should be in sync.

Problem: after killing a pod, the new pod is created as expected, but the databases do not synchronize after the automatic failover. They are getting joined to the availability group. If I log in to the pod through SSMS and join the database manually, it fails at "update object explorer failed". This happens randomly on any of the secondary nodes.

I have checked the disk space, and /var/opt/mssql is only 17% used.

Changing the Session Timeout in an Availability Group.

Greetings. We have been having issues with our 2012 AG lately and typically see messages like the ones below:

Message
A connection timeout has occurred on a previously established connection to availability replica '' with id [].  Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.

Message
A connection timeout has occurred while attempting to establish a connection to availability replica '' with id []. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.

I didn't realize until researching this today that these are actually two different (but similar) error messages. At any rate, changing the Session Timeout setting in the AG seems to be a common recommendation once the network/firewall theories have been ruled out. My question is: what are the potential negative impacts of doing this? I've been researching this topic for a while today and can't find any downsides. If it's OK to make the timeout longer, why is a measly 10 seconds the default?

Ours are currently set to 30 seconds, but I'm wondering if they should be longer. My guess is it wouldn't hurt since our AG unfortunately houses a large Data Warehouse environment so there is a LOT of data getting transferred between these two nodes during heavy ETL times. 

Thanks!


Thanks in advance! ChrisRDBA
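
For reference, the session timeout discussed above is a per-replica setting that can be inspected and changed with T-SQL. The following is a minimal sketch, not a recommendation: it assumes an availability group named myAG and a replica named mySecondary (both placeholder names, not taken from the post) and is meant to be run on the primary replica.

```sql
-- Show the current session timeout (in seconds) for every replica.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.session_timeout
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag
  ON ag.group_id = ar.group_id;

-- Change the timeout for one replica.
-- myAG and mySecondary are hypothetical names; 30 matches the value in the post.
ALTER AVAILABILITY GROUP [myAG]
MODIFY REPLICA ON N'mySecondary'
WITH (SESSION_TIMEOUT = 30);
```

The setting is stored per replica, so the ALTER statement would need to be repeated for each replica. One trade-off to keep in mind is that a longer timeout also means a replica that has genuinely stopped responding is noticed later.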

Failover cluster instance and Availability Group configuration.

Hi,
I would like to build the following combination of a Windows Server Failover Cluster (WSFC) SQL Server instance and a SQL Server availability group.
Platforms: Windows Server 2016 Datacenter and SQL Server 2016.
Site A contains SQL node 1 of a stretched SQL Server WSFC instance.
Site B contains SQL node 2 of the SQL Server WSFC instance.
Site C contains an additional file share witness for the WSFC.
Windows Server 2016 storage replication will handle the shared disk / replication requirements. (Latency is below 5 ms between Site A and Site B; a stretched VLAN is also in place.)
Can I use a SQL Server availability group to replicate the database from the active SQL Server WSFC node to an additional SQL Server in Site C? (Asynchronous-commit mode with manual failover, due to the distance to Site C.)
Can this be done with a basic availability group on SQL Server 2016 Standard edition?
If it does require SQL Server Enterprise edition, does this configuration require more than one SQL Server 2016 Enterprise edition license for the number of cores in use, plus SA? (Assuming the Site C AG replica is not being used for backups, reporting, etc.)

Is the above a supported configuration?
TIA.

Extend on-prem AlwaysON AG to Azure

Hello all!

I am looking for some guidance on how to extend my on-prem Always On AG into Azure for the purpose of DR.

How do the steps usually go? I am using Azure ASR to fail over to Azure and fail back to on-prem.

My questions are:

1. When I want to "test failover", how can I check whether my on-prem app can connect to the Azure replica without affecting daily operations?

2. Will the copy on Azure be synchronous or asynchronous? Can we use a sync copy if we're failing over through ExpressRoute (low latency)?

3. Will the replica automatically start replicating to my primary AG once it is back up on-prem? Is this affected by whether the Azure copy is sync or async?

I only have a basic understanding of SQL, so any feedback is much appreciated!

Cheers


Sam (Please take a moment to "Vote as Helpful" and/or "Mark as Answer" wherever applicable. Thanks!) Blog:AnalyticOps Insights Twitter:Sameer Mhaisekar

SQL Agent state on AG secondaries- running or stopped

Noob question... What is the normal/default state of SQL Agent on an instance holding an AG secondary: stopped or started? If it is stopped, does it start automatically on a failover, when the secondary becomes primary? Is there any problem with leaving Agent started on secondary HA replicas, as long as there are no jobs that must run only on the primary? Thanks.

AlwaysOn AG :: SPN configuration

Hi, 

Per the Microsoft information below, do I need to set the SPN for the AG listener name or the availability group name?

For example:

AG name is LIFETIMEPROD-LIT007

AG Listener is LIFETIME

Which one is correct? 

setspn -A MSSQLSvc/LIFETIMEPROD-LIT007:1433 corp/svclogin2

setspn -A MSSQLSvc/LIFETIME:1433 corp/svclogin2

----------------------------------------------

Availability Group Listeners and Server Principal Names (SPNs)

A Server Principal Name (SPN) must be configured in Active Directory by a domain administrator for each availability group listener name in order to enable Kerberos for the client connection to the availability group listener. When registering the SPN, you must use the service account of the server instance that hosts the availability replica. For the SPN to work across all replicas, the same service account must be used for all instances in the WSFC cluster that hosts the availability group.

Use the setspn Windows command line tool to configure the SPN. For example to configure an SPN for an availability group named AG1listener.Adventure-Works.com hosted on a set of instances of SQL Server all configured to run under the domain account corp/svclogin2:

setspn -A MSSQLSvc/AG1listener.Adventure-Works.com:1433 corp/svclogin2  

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/listeners-client-connectivity-application-failover?view=sql-server-2017#SPNs

Frequent Availability Group disconnects.

Greetings. 

I have a 2-node AG on SQL Server 2012 SP3 CU10.

Synchronous commit, manual failover only.

I get this message several times a day in the Event Viewer:

"AlwaysOn Availability Groups connection with primary database terminated for secondary database 'foo' on the availability replica with Replica ID:"

Followed then by the message:

"A connection for availability group 'myAG' from availability replica 'myPrimary' with id  [C940E91B-4D84-4006-8829-F7084DAB29C6] to 'mySecondary' with id [ADBC3978-17C5-4A98-A75C-0BA8FC2B2C34] has been successfully established"

Sometimes the disconnect and reconnect even occur within the same second. The only time they typically cause issues is when a backup is running on the secondary. The job will fail and we'll get paged.

More fun facts:

  • Event ID for reconnections is 35202 – disconnects is 35267.
  • A disconnect can definitely occur when the CPU is very low.
  • There’s nothing useful in the Cluster Log for this.
  • This AG supports a large Data Warehouse environment (~ 20 TB). Mostly large batch jobs/ no OLTP. Yes, I realize an AG isn't ideal here, but it's what I've got.

Any ideas?


Thanks in advance! ChrisRDBA


SQL Server 2016 Availability Group DISCONNECTED replica

Hello,

I would like to ask you experts for help, because I have exhausted all the options I could remember or find.

We have 2 servers (each with 64 CPUs, 1 TB RAM, SSD disks) running SQL Server 2016 (13.0.2164) with an availability group deployed in asynchronous mode. The AG contains 8 databases (total DB size is about 1 TB). AG synchronization uses a private 10 Gbit network card.

The secondary replica irregularly changes state to DISCONNECTED. I tried to solve it by stopping and starting the endpoints and by dropping and re-creating them. Only removing the secondary replica and adding it again solves the problem (but I had to restore all databases again).

Error log message :

A connection timeout has occurred on a previously established connection to availability replica 'abx' with id [xxx-xxx-xxx].  Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.

Last connection error from sys.dm_hadr_availability_replica_states :

An error occurred while receiving data: '10054(An existing connection was forcibly closed by the remote host.)'
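
For readers who want to pull the same information, the last connection error quoted above can be read together with the current connection state of each replica. This is a small sketch using only the documented DMV and catalog view columns; it assumes nothing about the poster's environment and can be run on either replica.

```sql
-- Connection state and most recent connection error for each replica,
-- as seen from the instance where the query runs.
SELECT ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.last_connect_error_number,
       ars.last_connect_error_description,
       ars.last_connect_error_timestamp
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = ars.replica_id;
```

Comparing last_connect_error_timestamp on both instances against the error log entries may help narrow down which side closed the connection first.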

There are no other error messages or messages related to this issue in the Windows logs, etc.

The endpoints are created this way:

CREATE ENDPOINT [Hadr_endpoint]
    STATE=STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.20.10))
    FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE
, ENCRYPTION = REQUIRED ALGORITHM AES)
GO

I tried stopping/starting the endpoints, dropping/re-creating the endpoints, disabling endpoint encryption, Windows and certificate endpoint authentication, GRANT CONNECT on the endpoints to the logins, and ALTER AUTHORIZATION on the endpoints for the SQL Server service account (both instances use the same domain user account). Both servers are listening on port 5022, the firewall is disabled, and an optical cable connects both servers directly (without any network device between them). Both servers have enough worker threads (max worker threads is 1472; the primary replica uses 1200 all the time, the secondary 600).

I am really desperate because the DISCONNECTED state can occur at almost any time. Sometimes everything holds for 1-2 days; sometimes the secondary replica transitions to DISCONNECTED 2 hours after a manual remove/add/restore DB/join DB. While the secondary replica is DISCONNECTED, the transaction logs of all databases keep growing. The only thing left that I can do is remove the AG, remove the cluster, reinstall both Windows 2012 servers (plus drivers and firmware), and configure a new cluster, SQL Server instances, and AG.

If you have any idea what I can do to solve my problem, please let me know. All feedback is appreciated.

Thank you

David

Alwayson Error/information in errorlog

Hi All,

I found lots of AlwaysOn availability group messages in the errorlog. There are no complaints from end users yet, and the messages are informational only, but I don't see why the database is trying to change roles so often.

2015-11-15 03:10:59.70 spid182s    The availability group database "DatabaseA_SD019" is changing roles from "SECONDARY" to "SECONDARY" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

2015-11-15 03:10:59.70 spid171s    State information for database 'DatabaseA_SD002' - Hardended Lsn: '(0:0:0)'    Commit LSN: '(0:0:0)'    Commit Time: 'Jan  1 1900 12:00AM'

2015-11-15 09:09:00.56 spid218s    The availability group database "DatabaseA " is changing roles from "SECONDARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

2015-11-15 09:09:00.57 spid331s    State information for database 'DatabaseA' - Hardended Lsn: '(24221:7322:1)'    Commit LSN: '(24221:7320:2)'    Commit Time: 'Nov 15 2015  2:08AM'

2015-11-15 09:10:07.07 spid130s    The availability group database "DatabaseA " is changing roles from "RESOLVING" to "SECONDARY" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

2015-11-15 09:10:07.07 spid93s     State information for database 'DatabaseA' - Hardended Lsn: '(24221:7322:1)'    Commit LSN: '(24221:7320:2)'    Commit Time: 'Nov 15 2015  2:08AM'

2015-11-15 09:10:11.12 spid190s    AlwaysOn Availability Groups connection with primary database established for secondary database 'DatabaseA' on the availability replica 'SERVER-SSQL-1A\INSTANCE1' with Replica ID: {ae6f87ff-6e47-40e3-a239-7c395a571b16}. This is an informational message only. No user action is required.

I am running on Windows 2012 R2 and SQL Server as:

Microsoft SQL Server 2014 (SP1-CU3) (KB3094221) - 12.0.4427.24 (X64) 
Oct 10 2015 17:18:26 
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

WSFC --- Switching from shared drive quorum to File quorum

We are currently running a two-node WSFC 2012 with SQL Server Always On set up on it.

We need to replace the shared drive quorum with a file quorum.

Can we do this, and if so, what is the best way to do it?

Regards


k


DR Site Online

I am preparing a document on how to bring the DR server online in the event of a catastrophe.

Environment:

SQL Server 2016 Enterprise Edition

Always On configured with 3 replicas (1 primary, 1 secondary, and 1 DR (async))

In this case, how do we bring the DR server online if the primary site is completely down?

Please provide some steps.

We are planning to simulate this in a test environment.
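
As a rough illustration of the kind of step such a document would contain, the sketch below shows a forced failover to the asynchronous DR replica. It assumes an availability group named MyAg and a database named MyDb (both hypothetical), and it assumes the WSFC on the DR node already has quorum; after a total loss of the primary site, quorum usually has to be forced first, which is outside this sketch. Because the DR replica is asynchronous, a forced failover can lose transactions that had not yet reached it.

```sql
-- Run on the DR instance. MyAg and MyDb are hypothetical names.

-- 1. Check the local replica's role and health before forcing anything.
SELECT ar.replica_server_name,
       ars.role_desc,
       ars.operational_state_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = ars.replica_id
WHERE ars.is_local = 1;

-- 2. Forced failover: the DR replica becomes primary, possibly with data loss.
ALTER AVAILABILITY GROUP [MyAg] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- 3. Later, on each original replica once it is back online as a secondary:
--    data movement is suspended after a forced failover and has to be
--    resumed per database. Left commented out because it runs elsewhere.
-- ALTER DATABASE [MyDb] SET HADR RESUME;
```

Rehearsing these statements in the test environment mentioned above, including the resume step on the returning replicas, would show how much data loss and manual work to expect.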

SQL Server Multi-Site with Continuous Local Writes if Primary Fails

I wanted to set up Always On availability groups; however, I'm not sure that would work.

Scenario: We have 4 sites, with 1 site having less than ideal internet reliability. The powers that be are wanting a setup where Site1 will be the primary site for all 4. Should for instance Site3 go offline due to internet connectivity etc. they still want Site3 to be able to work. Is that scenario supported and possible? 

Details I'd like:

  • Is the scenario above possible to do with AlwaysOn? If not what's recommended?
  • What happens to data once all sites are able to talk to each other again?

SQL 2014 and SQL 2016 AG restore steps best option for local and DR solutions.

I want to restore a database on SQL 2014/2016.

Can you remove all user databases from an AG and still have a viable AG?

In other words, can we re-add a database to an empty AG?

I am also trying to figure out the best way to restore a database that is in an AG spanning three SQL nodes.

SQL server 1 = test db (primary)

SQL server 2 = test db (replica 1)   backups occurring on this replica

SQL server 3= test db (replica  2) 

If we want to do a restore, do we need to remove all three copies of test db from the AG before we restore them? Is the AG still viable once all the databases have been removed from it (no databases left in the AG)? Do we then restore the database on all three replicas and re-add it to the AG? Or would we restore on the primary (SQL server 1) and then re-add all three servers to the AG, and will SQL server 2 and SQL server 3 just synchronize?

Or do we remove test db (primary) from the AG while setting both replica 1 and replica 2 not to synchronize (is there an option not to synchronize?), then restore test db (primary), since it has been removed from the AG? Replica 1 and replica 2 would stay in the AG but be set not to synchronize (if that is an option). Once test db (primary) has been restored, it is added back to the AG and all three servers are set to synchronize.

Or can we remove test db (primary) from the AG, then remove test db (replica 1) and test db (replica 2) from the AG and delete them? We now have an empty AG (no servers). Then do a restore of test db (primary) and add it to the AG. Will SQL server 2 (replica 1) and SQL server 3 (replica 2) automatically recreate both these replicas if set to synchronize?

What is the best way to do the above? Please provide detailed steps.

And if we have an AG that stretches across the WAN (two different data centers), how do we do a restore? If data center A is the primary and data center B is the DR site, how do we do a restore? Replication between the data centers is asynchronous. Please provide detailed steps.


dsk


AG Switching to Sync Mode

I have an Always On cluster with 4 instances in ASYNC mode: 2 instances at Site A and 2 at Site B. When a large operation runs on the primary, the secondaries fall behind. Most of the time when they fall behind, I see a log send queue of 60 on the replicas. But when I switch to SYNC mode, the log send queue immediately jumps to a much larger value.

What is really happening when I change the mode from ASYNC to SYNC (or vice versa) while a secondary is behind? Is there any documentation that covers this?
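
To make the behaviour described above easier to observe, the log send queue can be sampled before and after the mode change. This is a minimal sketch, run on the primary; the availability group name MyAg and replica name MySecondary are hypothetical placeholders, while the DMV columns are the documented ones.

```sql
-- Per-database queue sizes and rates for every replica (run on the primary).
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,  -- KB of log not yet sent to the secondary
       drs.log_send_rate,        -- KB/sec
       drs.redo_queue_size,      -- KB of log not yet redone on the secondary
       drs.redo_rate             -- KB/sec
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id;

-- Switch one replica to synchronous commit.
-- MyAg and MySecondary are hypothetical names.
ALTER AVAILABILITY GROUP [MyAg]
MODIFY REPLICA ON N'MySecondary'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
```

Running the SELECT a few times on either side of the ALTER statement gives a before/after series for log_send_queue_size rather than a single reading.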


Best Regards, Arun http://whynotsql.blogspot.com/


Stored Procedure Migration with 100 Plus Applications

OK, so I feel for DBAs who do this regularly. It can be very overwhelming, to say the least. Here is my scenario and question.

I'm attempting to work out an ETL process for 12 databases. All are similar, such as LIMS, LIMS-1, LIMS-2, etc. Seven have an identical schema; four share an identical schema that differs from the first seven; and one has a schema different from the other 11.

Migrating schemas probably isn't the challenge; however, I've never done this before, so any tools that would help would be great to know about. The problem comes with the stored procedures. There are over 100 applications that would use this new database (hence the problem). If there are stored procedures that have the same name but operate on a different schema, what would be a way to handle this? I first thought about renaming the stored procs and then making changes to the applications. However, my manager doesn't want to go down that route. Thanks in advance.

Log shipping out of sync: tuf file missing

I no longer see the .tuf file for the transaction log shipping that was configured.
Also, one of the tlog backup files is corrupt owing to a file system difference. Let me know how to reinitialize log shipping. I am taking a backup of the primary database and have stopped the transaction log backup, copy, and restore jobs. Do I have to restore the backup like below?

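-- NORECOVERY and STANDBY cannot be combined in one RESTORE; STANDBY is kept
-- here because the question is about the .tuf (standby/undo) file.
RESTORE DATABASE PRISM FROM DISK = '\\PRISM_FULL.BAK'
WITH STANDBY = 'PRISM_200905210300.TUF'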

After this, what should be the course of action? Will log shipping come back into sync automatically? If so, how do the jobs recognize the new tlog files rather than looking for the old tlog backup files?

 


Suman

Failed trying to get the state of the cluster node: "MSDTC Network Name"

I have a two-node SQL Server 2008 R2 active-passive cluster with only one instance installed in my prod environment. Active node A holds the single SQL Server instance and the MSDTC service; the second node, B, is completely passive and doesn't hold any resources.
In my setup, everything is working fine.
But recently, while monitoring the Event Viewer log, I found the error below related to the SQL Server MSDTC service.
Error: "Failed trying to get the state of the cluster node: "MSDTC_Network_Name" The error code returned: 0x80070005"

I am confused: if everything is working fine, then what is this error?
Since it is my prod environment, I am worried about SQL Server, even though nothing seems to be affected.

Please help me on this.

Cluster Active-Passive Licensing - SQL Server Std

Hello,

We are not sure about licensing a SQL Server 2016 Standard edition active-passive 2-node cluster.

Is it appropriate to use 2 per-server licenses + CAL?

Thank you in advance...

How do I apply patches in active-passive mode on SQL Server 2016 Standard edition?

Looking for some best practices or general guidelines on doing this type of thing. Thanks in advance.