Sql server 2012 ha dr 24_hop_final
1. SQL Server 2012
High Availability and DR
Joey D’Antoni
SQL Saturday #118 Madison, WI
21 April 2011
2. About Me
• @jdanton on Twitter
• Principal Architect SQL Server, Comcast Cable
• Joedantoni.wordpress.com
• Videos and Blogs at SSWUG.org
• Vice President of the Philadelphia SQL Server User
Group
– SQL Saturday #121 Philadelphia—June 9th
3. Agenda
• SQL Server 2008 to 2012—What’s Changed in HA and
DR
• Geo-Clustering
• All about Availability Groups
4. Learning Objectives
• SQL Server HA and DR
• What’s involved in SQL Clustering
• How clustering and Availability Groups work
• What’s new in 2012 HA/DR
5. Licensing (What’s New)
• The Availability Group features will require the Enterprise
Edition of SQL Server
• The licensing model for SQL Enterprise Edition has
changed. Consult your friendly Microsoft sales
representative for more details
• AlwaysOn read-only replicas will need to be licensed
6. Windows Core Support
• No GUI version of Windows
• Allows for fewer patches
• Uses PowerShell and MMCs for support
8. High Availability (HA) and Disaster
Recovery (DR) Options in SQL 2008
• Backup and Recovery
• Failover Cluster Instances (FCI)
• Mirroring
• Log Shipping
• Replication
• SAN Replication*
• Virtualization*
9. High Availability (HA) and Disaster
Recovery (DR) Options in SQL Server 2012
• Backup and Recovery
• Failover Cluster Instances (FCI)
• Mirroring
• Availability Groups (2012)
• Log Shipping
• Replication
• SAN Replication*
• Virtualization*
10. What’s new in SQL Server 2012 HA/DR
• AlwaysOn Availability Groups
• SMB Support for Failover Cluster Instances
• Multi-subnet clustering is supported
• Flexible Failover
12. SQL Failover Clustering in 2008
• SQL Clustering required 1 subnet to be used across the
whole cluster
• Cluster failover is controlled by isAlive/looksAlive
processes, which check the SQL service and run
@@servername
13. SQL Failover Clustering in 2012
• Full support for geo-distributed clusters
• SMB Storage (File Shares) Supported for FCI
• Flexible failover model based on sp_server_diagnostics
• TempDB on Non-shared Disk Resource
– Makes PCI-based Solid State Drive an option
16. Understanding Quorum
• There are several slides on this topic: it is critical!
– In a nutshell, your cluster has to be able to talk to itself to keep the
cluster service up and running
– This applies to both SQL Server Failover Cluster Instances and
AlwaysOn Availability Groups
17. Quorum
• Quorum is critical—contains master copy of the cluster’s
configuration
• Serves as a tiebreaker if network communications
between cluster nodes fail
• If Quorum fails—cluster is shut down until it’s restored
18. Quorum Models
• Node and Disk Majority (Default)
• Node Majority
• No Majority (Quorum Disk Only)
• Node and File Share Majority (Good for Geo Clusters)
19. Quorum Failure Tolerance
Number of Nodes | 2 | 3 | 4 | 5 | 6 | 7
Node Majority | 0 | 1 | 1 | 2 | 2 | 3
Node and Disk/File Share Majority | 1 | 2 | 2 | 3 | 3 | 4
• Assuming Disk is Up, the calculation is: Cluster Up = RoundUp(Total # of Nodes / 2)
• Assuming Disk is Down, the calculation is: Cluster Up = RoundUp(Total # of Nodes / 2) - 1
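The two formulas on this slide can be checked against the table with a few lines of Python. This is only a sketch: the function names are made up for illustration, and pairing the "Disk is Up" formula with the Node and Disk/File Share Majority row (and the "Disk is Down" formula with the Node Majority row) is an assumption based on the numbers matching.

```python
def round_up_half(nodes: int) -> int:
    """RoundUp(nodes / 2) using integer arithmetic to avoid float rounding."""
    return (nodes + 1) // 2


def tolerance_disk_up(nodes: int) -> int:
    """Slide formula assuming the disk (or file share) is up."""
    return round_up_half(nodes)


def tolerance_disk_down(nodes: int) -> int:
    """Slide formula assuming the disk (or file share) is down."""
    return round_up_half(nodes) - 1


node_counts = [2, 3, 4, 5, 6, 7]
disk_down_row = [tolerance_disk_down(n) for n in node_counts]
disk_up_row = [tolerance_disk_up(n) for n in node_counts]

print("Nodes:    ", node_counts)
print("Disk down:", disk_down_row)
print("Disk up:  ", disk_up_row)
```

Both computed rows agree with the values shown in the table for 2 through 7 nodes.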
20. Why Do Clusters Failover?
• Initiated by failures
in hardware or
software
• Checked by
isAlive/LooksAlive
processes (in
2008R2 and below)
21. Flexible Failover—New for 2012
• Replaces looksAlive/isAlive functionality in SQL Clusters
(and is used for Availability Groups)
• Now runs sp_server_diagnostics
– Failover policy is governed by two settings
• HealthCheckTimeout (Default 60 sec/Minimum 15 sec)
• Failover Condition Level
22. Flexible Failover Policies for
Clusters
Level | Condition | Description
0 | No automatic failover or restart | Indicates that no failover or restart will be triggered automatically on any failure conditions.
1 | Failover or restart on server down | SQL Server service is down.
2 | Failover or restart on server unresponsive | SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).
3 (Default) | Failover or restart on critical server errors | System stored procedure sp_server_diagnostics returns ‘system error’. (Critical errors > 20)
4 | Failover or restart on moderate server errors | System stored procedure sp_server_diagnostics returns ‘resource error’. (Moderate errors > 17)
5 | Failover or restart on any qualified failure conditions | System stored procedure sp_server_diagnostics returns ‘query_processing error’. (Deadlock)
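One way to read the table is as a threshold: a condition triggers a failover or restart when the configured level is at or above the level where that condition first appears. The sketch below models that reading. The enum and function names are hypothetical, and the cumulative behavior (each level also covers the conditions of the levels below it) is background knowledge rather than something stated on the slide.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Hypothetical names; each value is the lowest level that reacts to it."""
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    SYSTEM_ERROR = 3
    RESOURCE_ERROR = 4
    QUERY_PROCESSING_ERROR = 5


DEFAULT_LEVEL = 3  # the table marks level 3 as the default


def triggers_failover(condition: Condition, level: int = DEFAULT_LEVEL) -> bool:
    """Return True if the configured level reacts to the given condition."""
    if not 0 <= level <= 5:
        raise ValueError("failover condition level must be between 0 and 5")
    # Level 0 never matches because every condition has a value of at least 1.
    return level >= condition


for cond in Condition:
    print(cond.name, triggers_failover(cond))
```

At the default level, only the first three conditions trigger a failover or restart in this model; resource and query processing errors do not.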
25. Geo-Distributed Clustering
• Requires SAN replication ($$$$)
• Two of everything
• Requires really fast network connection
• Requires some trickery at the network/DNS level for
connectivity
• Witness Disk (Quorum)
– Can be physical (SAN) disk, or cluster file share
26. Geo-distributed Failover Clustering
• Was available in SQL 2008, but easier to implement in
2012
• Won’t be used by most organizations due to cost and
complexity
27. Review—DR Options in SQL 2008
• Mirroring
– Allowed automatic failover, but only one target
– Mirror target is unreadable
• Log Shipping
– Allowed multiple targets, but failover a manual process, requiring a
connection string change
• Replication
29. AlwaysOn Requirements
• Windows Enterprise (Clustering is a requirement)
• SQL Server Enterprise Edition
• Windows Cluster
• No shared storage is required
• Quorum Disk (File Share if multi-site or local storage)
32. Flexible AG Failover
• Similar to how a failover clustered instance fails over
• Connects to instance every 30 seconds to perform health
check
• Also, similar quorum model to Windows Failover
Clustering
34. Allows for SAN-Less HA/DR
• This is not a huge thing for SQL Server in larger
organizations, but big win for medium sized businesses
• Allows much easier native SQL DR in Virtual
Environments
35. Considerations for Availability Groups
• All SQL servers (including the secondary in the
DR site) in the same Windows domain
• All the databases must be in FULL recovery
model
• The unit of failover (for local HA, as well as DR)
is at the AG level, i.e., group of databases – not
the instance
36. Failover Scenarios
Failover type | Asynchronous-commit mode | Synchronous-commit mode with manual-failover mode | Synchronous-commit mode with automatic-failover mode
Automatic failover | No | No | Yes
Manual failover | No | Yes | Yes
Forced failover | Yes | Yes | No
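For scripting or runbook checks, the table can be carried around as a small lookup. This is a sketch that copies the values exactly as listed on this slide; the dictionary keys and the function name are invented for illustration.

```python
from typing import List

# Support matrix copied from the slide's table (True = "Yes", False = "No").
FAILOVER_SUPPORT = {
    "asynchronous-commit": {
        "automatic": False, "manual": False, "forced": True,
    },
    "synchronous-commit, manual-failover": {
        "automatic": False, "manual": True, "forced": True,
    },
    "synchronous-commit, automatic-failover": {
        "automatic": True, "manual": True, "forced": False,
    },
}


def supported_failovers(mode: str) -> List[str]:
    """List the failover types the slide marks as available for a mode."""
    return [kind for kind, ok in FAILOVER_SUPPORT[mode].items() if ok]


for mode in FAILOVER_SUPPORT:
    print(mode, "->", supported_failovers(mode))
```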
37. Read Only Replicas
• Can have up to 4 (1 synch, 3 asynch)
• SQL Client 2012 will allow for this routing specifically
• Can take backups from read-only copies*
– Copy Only Backups (only full copy, does not affect primary log)
– Can back up the primary log from a replica
• Indexing must be same on replicas
• Bad queries can affect status of replica
38. Read-only vs Read Intent
• Read only replica databases are open to any client that
can connect to SQL Server
• Read Intent routing is used for the Application Intent
functionality in the SQL 2012 client
• Read intent routing automatically redirects connections made to
either the primary or the listener to a secondary replica
40. Client Connections in This Model
• Availability Group Listener
– Works just like a failover clustering instance (single
instance, single IP)
– Creates a VCO (AD Virtual Computer Object)—similar to a cluster
virtual object
• Read-only Connections
– Requires 2012 native ODBC client
41. Backups
• You can determine whether the current replica is the
preferred backup replica by calling the
sys.fn_hadr_backup_is_preferred_replica function
• This checks for replica status
• Allows for post-failover backup jobs to run unchanged in
the event of a failure
• Logic is:
If (top-priority replica is local) Run backup job
Else Exit with success
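The logic above can be sketched as a small wrapper that the same job runs on every replica. This is an illustration only: `is_preferred_replica` is a hypothetical callable standing in for a query against sys.fn_hadr_backup_is_preferred_replica, and `run_backup` stands in for whatever actually takes the backup. No database connection is made here.

```python
from typing import Callable


def run_backup_if_preferred(
    is_preferred_replica: Callable[[], bool],
    run_backup: Callable[[], None],
) -> str:
    """Run the backup only on the preferred replica; otherwise exit cleanly."""
    if is_preferred_replica():
        run_backup()
        return "backup ran"
    # Not the preferred replica: report success so the job is not flagged as failed.
    return "exit with success"


# Usage with stand-in callables instead of real database calls.
log = []
print(run_backup_if_preferred(lambda: True, lambda: log.append("backup")))
print(run_backup_if_preferred(lambda: False, lambda: log.append("backup")))
```

Because every replica runs the same wrapper, the job definition does not need to change after a failover.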
42. Client Connections
• Always specify MultiSubnetFailover=True in the listener
connection string
• From Books Online: “will significantly reduce failover time for single and multi-subnet AlwaysOn topologies.”
• Applies to SQL Server Failover Cluster Instances as well
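As a rough illustration, a helper like the one below could assemble a listener connection string with the multi-subnet keyword always set. It assumes SqlClient-style keywords (MultiSubnetFailover, ApplicationIntent); other drivers may spell these differently. The listener name, database name, and function name are hypothetical.

```python
def listener_connection_string(
    listener: str,
    database: str,
    read_only: bool = False,
    port: int = 1433,
) -> str:
    """Build a SqlClient-style connection string for an AG listener."""
    parts = [
        f"Server=tcp:{listener},{port}",
        f"Database={database}",
        "Integrated Security=SSPI",
        # Always set, per the recommendation on this slide.
        "MultiSubnetFailover=True",
    ]
    if read_only:
        # Declares read intent so the connection can be routed to a readable secondary.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


# Hypothetical listener and database names.
print(listener_connection_string("ag-listener", "SalesDb"))
print(listener_connection_string("ag-listener", "SalesDb", read_only=True))
```

Keeping the keyword inside the helper means individual applications cannot forget it.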
43. SQL Clusters and Always On
• SQL Failover Clusters can be members of an Availability
Group
• FCI can only be configured for manual failover
• Only one (the active) node can own the Always On
Replica
44. Differences—SQL FCI and Availability
Groups
Feature | Nodes within an FCI | Replicas within an availability group
Uses WSFC cluster | Yes | Yes
Protection level | Instance | Database
Storage type | Shared | Non-shared
Storage solutions | Direct attached, SAN, mount points, SMB | Depends on node type
Readable secondaries | No | Yes
Applicable failover policy settings | WSFC quorum; FCI-specific; Availability group settings | WSFC quorum; Availability group settings
Failed-over resources | Server, instance, and database | Database only
47. Summary
• Lots of Change in the HA/DR Space
• Licensing also changes—talk to your MS rep
• SQL Server Failover Clusters still a good HA option
• AlwaysOn Availability Groups add a lot more flexibility to
DR
ELS: Change order here to match previous slide better, and follow order of slides later on (I moved them): SQL Server HA and DR; What’s new in 2012 HA/DR; What’s involved in SQL Clustering; How clustering and Availability Groups work
ELS: I think I would put the last bullet about mirroring on the next slide. To me, nothing changes about licensing for mirroring (right?), and it’s still available in Standard and Enterprise, right? If so, then I would classify it as a functionality “change” rather than licensing.
Mirroring as a technology will be going away in a future version of SQL, so if you would like to have automatic DR, Standard edition will not be an option.
The reason why I have this in my HA/DR presentation is that Core will reduce the amount of patches that need to be applied to your servers. Without IE, and many other attack vectors, Microsoft expects the patches needed to be reduced by about 50%.
SQL Server clustering is the most obvious high availability solution that everyone knows about. However, mirroring between two SQL Servers (with a witness server) can also provide a level of both HA and DR. The other two options are a little bit more controversial and more complicated to set up. Both peer-to-peer replication and SQL Log Shipping can provide some measure of HA, but there are caveats to this, and some data loss is possible. This is a little outside of the scope of this preso, so if you would like to know more detail around these topics, I highly recommend Paul Randal’s white paper on SQL HA and DR options. I’ll provide a link at the end of this presentation.
ELS: This slide has High Availability spelled out, the next has HA. Make them consistent, either
DR Options: yes, backup and recovery is your first line of defense in the event of a disaster. You should have extensive monitoring and notification around your backup process, and take regular transaction log backups if you need point-in-time recovery.
Mirroring is probably the best high availability option. With a witness server (a server that sits in between the two mirrors) you get automatic failover if your primary instance goes down. Most applications that use Microsoft connections to your database can support mirroring. The only negative is that unless you have Enterprise Edition, you are limited to synchronous mirroring, which can have a performance impact on your primary. Enterprise Edition brings in asynchronous mirroring, which allows for greater flexibility and distance between sites with no performance impact.
Log shipping and Replication: both of these will require manual intervention in the event of a failure. However, they are very mature technologies and can work over great distances. This is not a DR scenario, but I have an application which replicates from the US to Switzerland over a nominal network connection, running on SQL 2000, and I haven’t had to touch it in two years. (Knocks on wood.)
Lastly, SAN replication: this is really cool technology, and can enable the concept of geo-distributed clusters (also covered in Paul’s white paper). This is pretty far out of scope for today’s presentation, but I’ll say this: while really cool, it’s really complex to set up, and really expensive. You need additional software from your SAN vendor, which is always pretty pricey, and the additional network bandwidth to transfer bits in real time over the network. When I was at Wyeth, we did this between Philadelphia and Pearl River, NY for the SAP system that ran the business. But the cost made it prohibitive to do much else. Also, when it goes wrong, it can be ugly.
ELS Maybe change “Traditional” to be 2008, and note that it’s still an option in 2012
ELS: Change title to be like next one (Clustering in 2008)
Insert picture here
Mention the DNS Time To Live (TTL) value for the cluster DNS name; this applies to both AGs and SQL FCIs.
The amount of time that the database will be unavailable during a failover depends on the type of failover and its cause. For more information, see Estimate the Interruption of Service During Failover of an Availability Group (SQL Server).
Important: To support client connections after failover, except for contained databases, logins and jobs defined on any of the former primary databases must be manually recreated on the new primary database. For more information, see Management of Logins and Jobs for the Databases of an Availability Group (SQL Server).
ELS: I moved this slide and the next one DOWN (moved Failover Modes and Failover Scenarios up)