Building a Stretched Cluster with Virtual SAN
Rawlinson Rivera, VMware, Inc
Duncan Epping, VMware, Inc
STO5333
#STO5333
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these
features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or
sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not
been determined.
Disclaimer
Agenda
1 Introduction
2 Requirements and Architectural details
3 Configuring and operating a VSAN Stretched Cluster
4 Failure Scenarios
5 Interoperability
VMware Virtual SAN 6.1
Introduction to Stretched Clustering
Typical Use Cases For Virtual SAN Stretched Clusters
Planned Maintenance
• Planned maintenance of one site without any service downtime
• Transparent to app owners and end users
• Avoid lengthy approval processes
• Ability to migrate applications back after maintenance is complete

Automated Recovery
• Automated initiation of VM restart or recovery
• Very low RTO for the majority of unplanned failures
• Allows users to focus on app health after recovery, not how to recover VMs

Disaster Avoidance
• Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)
• Avoid downtime, not recover from it
• Zero data loss possible if you have the time
Virtual SAN Stretched Cluster
• Increases Enterprise availability and data protection
• Based on an Active – Active architecture
• Supported on both Hybrid and All-Flash architectures
• Enables synchronous replication of data between sites
[Diagram: hosts in Site A and Site B forming a single vSphere + Virtual SAN Stretched Cluster]
Virtual SAN Stretched Cluster
• Site-level protection with zero data loss and near-instantaneous recovery
• Virtual SAN Stretched Cluster can be scaled up to 15 nodes per site
• Beneficial solution for disaster avoidance and planned maintenance
[Diagram: hosts in Site A and Site B forming a single vSphere + Virtual SAN Stretched Cluster]
Virtual SAN Stretched Cluster
Virtual SAN Clusters:
• Require a minimum of 3 fault domains to tolerate a single failure
• Virtual Machine objects remain accessible after a single fault domain failure
[Diagram: Virtual SAN Datastore spanning Fault Domain A, Fault Domain B, and Fault Domain C, holding two data components and one witness component]
Virtual SAN Stretched Cluster
Stretched Clusters:
• Provide similar availability with 2 active-active fault domains plus a witness-only fault domain
– A light-weight witness host is needed only for quorum
– Virtual SAN 6.1 allows a single witness host in a third fault domain
• Virtual Machine disk objects (VMDKs) remain accessible after one data fault domain fails
[Diagram: Virtual SAN Datastore spanning Fault Domain A, Fault Domain B, and Fault Domain C, holding two data components and one witness component]
• Virtual SAN's availability protection now extends to:
– Rack failures
– Network failures
– Hardware failures
– Site failures
Virtual SAN Stretched Cluster
• Virtual SAN cluster is formed across the 3 fault domains
• Witness fault domain is utilized for witness purposes ONLY, not running VMs!
• Availability Policy supported (FTT=1)
• Automated failover in the event of site failure
[Diagram: a vSphere + Virtual SAN Stretched Cluster spanning three sites — two active data fault domains and one witness fault domain]
Virtual SAN Stretched Cluster
Requirements and Architectural Details
Requirements
• Network
– Virtual SAN storage networking
– Virtual SAN witness networking
– vSphere and virtual machine networking
• Storage
– Virtual machine storage
– Witness appliance storage
• Compute
– Virtual SAN witness
– vSphere HA
– vSphere DRS
Virtual SAN Stretched Cluster Networking Requirements
[Diagram: data fault domains FD1 and FD2 connected at < 5 ms latency over 10/20/40 Gbps L2 with multicast; witness fault domain FD3 reached at up to 200 ms latency over 100 Mbps L3 without multicast]
• Network Requirements between data fault domains/sites
– 10 Gbps connectivity or greater
– < 5 millisecond latency RTT
– Layer 2 or Layer 3 network connectivity with multicast
• Network Requirements to witness fault domain
– 100 Mbps connectivity
– 200 milliseconds latency RTT
– Layer 3 network connectivity without multicast
• Network bandwidth requirements are calculated based on write operations between the data fault domains
– Kbps = nodes per site * 4K write IOPS per node * 125
– A 5+5+1 deployment with ~300 VMs works out to roughly 4 Gbps
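As a rough worked example of the bandwidth rule of thumb above, here is a minimal Python sketch. The 3,200 4K write IOPS per node is an assumed figure chosen only to reproduce the ~4 Gbps estimate for a 5+5+1 deployment; it is not a number from the deck.

```python
# Minimal sketch of the inter-site bandwidth rule of thumb from this slide:
#   Kbps = nodes per site * 4K write IOPS per node * 125
# The 3,200 write IOPS per node below is an assumed value, used only to
# reproduce the "5+5+1 with ~300 VMs is roughly 4 Gbps" figure.

def inter_site_bandwidth_gbps(nodes_per_site: int, write_iops_per_node: int) -> float:
    """One-way bandwidth between the two data fault domains, in Gbps."""
    kbps = nodes_per_site * write_iops_per_node * 125
    return kbps / 1_000_000  # Kbps -> Gbps

one_way = inter_site_bandwidth_gbps(nodes_per_site=5, write_iops_per_node=3200)
print(f"one-way: {one_way:.1f} Gbps, total: {2 * one_way:.1f} Gbps")  # ~2 Gbps each way, ~4 Gbps total
```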
Virtual SAN Witness Appliance Overview
and Storage Requirements
Witness overview and requirements
• Witness appliance:
– ONLY supported with Stretched Cluster *
– ONLY stores meta-data NOT customer data
– is not able to host any virtual machines
– can be re-created in event of failure
• Appliance requirements:
– At least three VMDKs
– Boot disk for ESXi requires 20 GB
– Capacity tier requires 16 MB per witness component
– Caching tier is 10% of the capacity tier
– Both tiers on the witness can be placed on magnetic disks (MDs)
• The amount of storage on the witness is related to number of
components on the witness
[Diagram: witness appliance running as a nested ESXi VM]
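The storage rules above can be turned into a minimal sizing sketch. It assumes the 16 MB per witness component and the 10% caching-tier rule from this slide, plus the 3-objects-per-VM minimum (namespace, VMDK, swap) mentioned in the speaker notes; the official sizing table on the next slide rounds these figures up.

```python
# Minimal witness storage sizing sketch, assuming:
#   - at most one witness component per object, ~3 objects per VM minimum
#   - ~16 MB of capacity tier per witness component
#   - caching tier of roughly 10% of the capacity tier
# Illustrative only; the official sizing guidance rounds up from these numbers.

def witness_storage_gb(vm_count: int, objects_per_vm: int = 3) -> tuple[float, float]:
    components = vm_count * objects_per_vm
    capacity_gb = components * 16 / 1024   # 16 MB per witness component
    cache_gb = capacity_gb * 0.10          # caching tier ~10% of capacity
    return capacity_gb, cache_gb

cap, cache = witness_storage_gb(vm_count=800)  # "medium" scenario from the next slide
print(f"capacity ~{cap:.0f} GB, cache ~{cache:.0f} GB")
```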
Virtual SAN Witness Appliance
Sizing Requirements
Resource Requirements
• Large scale (15+15+1) – Max 3000 VMs and 18,000 components on the witness
– Memory: 32 GB
– CPU: 2 vCPU
– Storage: 350 GB for capacity and 10 GB for caching tier
• Medium (4+4+1) – Max 800 VMs and ~5000 components on the witness
– Memory: 16 GB
– CPU: 2 vCPU
– Storage: 50 GB for capacity and 5 GB for caching tier
• Small (1+1+1) – Max 200 VMs and 1200 components on the witness
– Memory: 16 GB
– CPU: 2 vCPU
– Storage: 20 GB for capacity and 3 GB for caching tier
[Diagram: witness appliance (nested ESXi VM) hosted either in a data center or in vCloud Air]
Virtual SAN Witness Appliance
Network Requirements
Witness network requirements
• Network communication
– between the witness and the main sites is L3 (IP based), with no multicast requirement!
– The witness node is optimized to receive minimal metadata traffic
– Read and write operations do not require any communication to the witness
– Traffic is mostly limited to metadata updates
– Traffic must not be routed through the witness site
– The heartbeat between the witness and the other fault domains happens once a second
– After 5 consecutive missed heartbeats the communication is declared failed
[Diagram: FD1 and FD2 data sites (< 5 ms latency over 10/20/40 Gbps, L2 with multicast) with the witness appliance (nested ESXi) in FD3]
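The heartbeat behavior described above can be pictured with a tiny sketch: a heartbeat every second and a counter that declares the link failed after 5 consecutive misses. This is illustrative only and not how Virtual SAN is actually implemented internally.

```python
# Toy model of "heartbeat once per second, declare failure after 5
# consecutive misses". Illustrative only, not Virtual SAN code.

HEARTBEAT_INTERVAL_S = 1
MISS_THRESHOLD = 5

def link_failed(heartbeats_received: list[bool]) -> bool:
    """heartbeats_received[i] is True if the i-th one-second heartbeat arrived."""
    consecutive_misses = 0
    for ok in heartbeats_received:
        consecutive_misses = 0 if ok else consecutive_misses + 1
        if consecutive_misses >= MISS_THRESHOLD:
            return True   # ~5 seconds of silence -> communication declared failed
    return False

print(link_failed([True, False, False, False, False, False]))  # True
print(link_failed([True, False, False, True, False, False]))   # False
```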
Virtual SAN Stretched Clusters –
Supported Deployment Scenarios
[Diagram: the two supported deployment scenarios — "Traditional Layer 3 and Layer 2 Deployment" (stretched L2 with multicast between the data fault domains FD1 and FD2, Layer 3 to the witness in FD3) and "Complete Layer 3 Deployment" (Layer 3 networks between all sites; a 3rd-party solution to manage VM networks is required)]
Virtual SAN Stretched Cluster –
Supported Storage Policies
• Maximum supported “FailuresToTolerate” is 1 due to the support of only 3 fault domains
– “FailuresToTolerate=1” objects will be implicitly “forceProvisioned” when only two of the three sites are available
– Compliance will be fixed for such objects once the third site becomes available
[Diagram: Fault Domain A (active) and Fault Domain C (active) data sites plus Fault Domain B (witness) in a vSphere + Virtual SAN Stretched Cluster]
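As a hedged illustration of the policy described above, the availability rules for a stretched-cluster VM can be written down as a simple rule set. The capability names follow the commonly used VSAN SPBM capability IDs, but the exact names and namespace are assumptions and may differ by version.

```python
# Sketch of the only supported availability policy in a stretched cluster:
# FTT=1, with force provisioning letting objects be created while only two of
# the three sites are available. Capability names are assumptions, not an
# exact API contract.

stretched_cluster_policy = {
    "VSAN.hostFailuresToTolerate": 1,  # maximum (and only) supported FTT value
    "VSAN.forceProvisioning": True,    # applied implicitly when one site is down
}

for capability, value in stretched_cluster_policy.items():
    print(f"{capability} = {value}")
```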
Virtual SAN Stretched Clusters – Preferred, Non-preferred Sites
[Diagram: FD1 and FD2 data sites (< 5 ms latency, L2 with multicast) with the witness appliance in FD3; during a partition, the preferred site and the witness form the major partition]
• The preferred fault domain or site is one of the two active-active fault domains
• The preferred fault domain or site can be changed dynamically
• One of the active data sites is designated as the “preferred” fault domain
– Required to handle the “split-brain” scenario (link failure between the active sites)
– Determines which active site the witness joins
Stretched Clusters and Read Locality
Read Locality
• A VM will be running in (at most) one site
• FTT=1 implies that there are two copies of the data, one
in each site
• Reads will be served from the copy of the data that
resides on the same site as where the VM runs
• If the VM moves to the other site, then reads will be
served from the (consistent) copy of the data in the new
site
[Diagram: a read operation served from the data copy in the same site (fault domain) as the running VM]
Stretched Clusters and Writes
Writes
• There is no locality for writes, availability over
performance!
• Writes must be acknowledged from both sites before we
ACK to the application
• A typical write operation does not include any
communication to the witness
[Diagram: a write operation acknowledged by both data sites before being acknowledged to the application]
VMware Virtual SAN 6.1
Configuring and operating a Stretched Cluster
Configuring VMware Virtual SAN Stretched Cluster
• Simple configuration procedure
• The necessary L3 and L2-with-multicast network connectivity and configuration should be completed before setting up the stretched cluster
1. Configure Fault Domains
2. Select Witness Host
3. Create Disk Groups on Witness
4. Validate health of the stretched cluster configuration
DEMO
[Demo topology: active-active all-flash Dell PowerEdge FX2 sites in Austin, TX and Dallas, TX (5 ms latency over 10 Gbps, L2 with multicast, Dell S6000-ON ToR switches and FX2 FN410S IO modules) forming a vSphere + Virtual SAN Stretched Cluster, with the witness appliance (nested ESXi) in Plano, TX]
Configuring VMware Virtual SAN Stretched Cluster
• Health Check includes additional checks for stretched cluster:
– Witness host configuration
– Network configuration
– Host compatibility
– Fault domain configuration
To configure a stretched cluster, go to Cluster > Manage tab > Fault Domains and click the icon to start the wizard
What about vSphere HA & DRS?
[Diagram: FD1 and FD2 data sites with the witness appliance in FD3]
HA and DRS Behavior
• HA/DRS will not use the witness host as a target, since the witness is a standalone host in vCenter and will appear to be an “incompatible” target
• HA failover
• If one site partitions away or fails, all Virtual SAN
objects will become inaccessible in that partition
• HA will failover the VMs running in that site to the
other active data site
• DRS will treat it as a normal cluster, migrations can
happen across sites
vSphere HA Recommendations
• Make sure to set aside 50% of resources using
Admission Control!
– Admission control is not resource management
– Only guarantees power-on
• Enable “Isolation Response”
– “Power Off” recommended response
• Manually specify multiple isolation addresses
– One for each site using
“das.isolationaddressX”
– Disable the default gateway using
“das.useDefaultIsolationAddress=false”
• Make sure vSphere HA respects VM/Host affinity rules during failover!
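To make the isolation-address guidance above concrete, the sketch below lists the HA advanced options a stretched cluster would typically carry. The IP addresses are placeholders for a pingable address in each data site, and actually applying the options (Web Client, PowerCLI, or another tool) is left out.

```python
# Sketch of the vSphere HA advanced options recommended on this slide.
# The isolation addresses are placeholders; pick a reliably pingable
# address in each data site.

ha_advanced_options = {
    "das.isolationaddress0": "192.168.10.1",    # placeholder: address in site A
    "das.isolationaddress1": "192.168.20.1",    # placeholder: address in site B
    "das.useDefaultIsolationAddress": "false",  # do not use the default gateway
}

for key, value in ha_advanced_options.items():
    print(f"{key} = {value}")
```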
vSphere DRS Recommendations
• Enable DRS, you want your VMs happy
• Remember Read Locality? Set up VM/Host affinity rules
– DRS will only migrate VMs to hosts which belong to
the VM/Host group
– Avoid “must rules” as they can bite you
– Use “should rules”, HA can respect these as of
vSphere 6.0!
– HA is smart and will go for “availability” over “rule
compliance”
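The VM/Host affinity guidance above boils down to one host group and one VM group per site, tied together by a soft ("should") rule. The sketch below just captures that layout as data; group and rule names are placeholders, not API identifiers.

```python
# Sketch of the recommended "should" affinity layout: keep VMs on their home
# site for read locality, while HA can still restart them in the other site.
# Names are placeholders.

site_affinity_rules = [
    {"vm_group": "VMs-SiteA", "host_group": "Hosts-SiteA",
     "rule": "should run on hosts in group"},   # soft rule, not "must"
    {"vm_group": "VMs-SiteB", "host_group": "Hosts-SiteB",
     "rule": "should run on hosts in group"},
]

for r in site_affinity_rules:
    print(f'{r["vm_group"]}: {r["rule"]} {r["host_group"]}')
```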
Virtual SAN Stretched Clusters – Maintenance
[Diagram: FD1 and FD2 data sites with the witness appliance in FD3]
Stretched Cluster supported policies
• On the witness node, host/disk/disk group maintenance mode only allows the “NoAction” mode
• In the UI the witness host is a standalone host, and by default only “NoAction” is supported for standalone hosts, so there is no change in behavior
• Default mode for API has been modified to be
“NoAction”
• If disks on the witness node are decommissioned,
objects will lose compliance. CLOM crawler will fix
the compliance by rebuilding the witnesses
• For all other hosts in the cluster - “Enter maintenance
mode” is supported in all 3 modes
Virtual SAN 6.1
Stretched Cluster Failure Scenarios
Face Your Fears, Test Failover Scenarios!
• Data Site Partition
• Data Site Failure
• Witness Site Failure
• Witness network failure (1 site)
• Site Failure that hosts vCenter Server
• Host Isolation or Failure
[Diagram: FD1 and FD2 data sites with the witness appliance in FD3]
Failure Scenarios – Network Partition Between Data Sites
[Diagram: network partition between the two data sites; the witness sides with the preferred site and HA restarts the non-preferred site's VMs in the preferred site]
Failure Scenario A
• What if there is a network partition between the two active
data sites aka “split brain scenario”?
• Witness always forms a cluster with the “preferred”
site in such a case and that is the partition that will
make progress
• This means that VMs in the “non-preferred” site will lose access to storage
• If the HA network (most likely) is also isolated, then
VMs in the “non-preferred” site will be restarted in the
preferred site
• HA does not know what happened to the host in
the non-preferred site!
Failure Scenarios – Full Site Failure
[Diagram: full failure of one data site; vSphere HA restarts the impacted VMs in the surviving site]
What if one active site fails?
• Since both active sites will have a copy of the data, the
second site can transparently take over
• Preferred or non-preferred makes no difference here
• Impacted VMs by full site failure will be restarted by
vSphere HA
• Site failure is detected if it misses the heartbeat for 5
consecutive times
• The heartbeat is sent every second
• Customers can continue creating VMs, etc., but they will be out of compliance with FTT=1 (force provisioning is applied by default)
• What happens once the site comes back? The recovery is detected automatically and starts the re-sync of changed data. Once the re-sync is done, the customer should use DRS to redistribute the VMs.
• Ideally you want all the nodes to be back online at the
same time
Failure Scenarios – Witness Site Failure
[Diagram: witness site (FD3) failure; the two data sites keep quorum and VMs continue running]
Details
• Witness failure is detected when 5 consecutive heartbeats are missed
• The heartbeat is sent every second by both master and backup
• If the witness fails there is no disruption to IO traffic for VMs
• VMs continue running with no interruption, since the two main sites can form a quorum
• A completely new witness can be created and connected to the cluster
• What happens once the witness comes back? All metadata (for all objects in the cluster) is communicated to the witness and the cluster becomes healthy again
VMware Virtual SAN 6.0
Interoperability
Virtual SAN Stretched Cluster with vSphere Replication and SRM
• Live migrations and automated HA restarts between stretched cluster sites
• Replication between Virtual SAN datastores enables RPOs as low as 5 minutes
• The 5-minute RPO is exclusively available to Virtual SAN 6.x
• Lower RPOs are achievable due to Virtual SAN’s efficient vsanSparse snapshot mechanism
• SRM does not support standalone Virtual SAN with one vCenter Server
[Diagram: Site A and Site B in an active-active vSphere + Virtual SAN Stretched Cluster (< 5 ms latency over 10/20/40 Gbps, L2 with multicast, witness appliance) replicating with vSphere Replication and SRM to a separate vSphere + Virtual SAN DR site (Site X) at any distance with an RPO of 5 minutes or more; each side has its own vCenter and SRM]
DR orchestration for vCloud Air DR
Single-click recovery of on-premises applications in the cloud
Roadmap
Overview
• Multi-VM recovery plans to define
application/site recovery procedures
• Easy to use workflows for DR testing,
DR failover and failback
• Graceful migration workflows to ensure
no data loss before planned downtime
• Drastically reduce RTOs when
recovering multiple applications or
entire site workloads
[Diagram: on-premises vSphere + Virtual SAN stretched cluster protected to VCD – vCloud Air]
2-Node Remote Office Branch Office Solution
[Diagram: three 2-node ROBO sites (ROBO1, ROBO2, ROBO3), each running vSphere + Virtual SAN and each with its own witness appliance (nested ESXi VM), all centrally managed by one vCenter Server in a centralized data center]
Overview
• Extension of Virtual SAN Stretched Cluster solution
• Each node will be in its own Fault Domain (FD)
• One witness per Virtual SAN cluster
• 500ms Latency tolerated!
• Witness node is an ESXi appliance (VM)
• All sites managed centrally by one vCenter
• Patching and software upgrades performed
centrally through vCenter
• If there are N ROBOs then there will be
N witness VMs
THANK YOU
Building a Stretched Cluster with Virtual SAN
Rawlinson Rivera, VMware, Inc
Duncan Epping, VMware, Inc
STO5333
#STO5333
Editor's Notes
  2. Communication between the main sites and the witness is unicast (FD1 and FD2 share a single L2 domain; FD3 is only reachable via L3 from FD1 & FD2). The architecture is based on Fault Domains.
  3. Site-level protection with zero data loss and near-instantaneous recovery Production apps at both sites with seamless mobility across sites Zero downtime for planned events Typically limited to a Metro distance
  4. Site-level protection with zero data loss and near-instantaneous recovery Production apps at both sites with seamless mobility across sites Zero downtime for planned events Typically limited to a Metro distance
  5. Site-level protection with zero data loss and near-instantaneous recovery. Production apps at both sites with seamless mobility across sites. Zero downtime for planned events. Typically limited to a Metro distance (it could be another Data Center, vCloud Air, or Colo).
  6. Max network latency of 5 millisecond RTT and enough bandwidth for the workloads Support for sites up to 100km apart as long as network requirements are met <= 5ms latency over 10/20/40gbps to data fault domains (L2 with multicast) Network Bandwidth requirement for the write operations between the main two sites in Kbps= N (number of nodes on one site) * W (Amount of 4K IOPS per Node) * 125. Minimum of 1 Gbps each way For a 5+5+1 config on server medium and ~300 VM the network requirement is around total of 4 Gbps (2Gbps each way) Layer 2 network communication is required
  7. Max number of components per Witness capacity disk is ~21,000 Maximum number of components on witness is 45,000 All the witness node disks could be thin provisioned Number of components on the Witness reflect the number of objects on the VMs. There will be always at most one witness component per object Each VM requires one VMDK, one namespace and one swap file. This is a minimum of 3 objects per VM. Each snapshot adds one object per VMDK. Stripes do not add to the number of objects for the VM
  8. Q: When should I disable Read Locality? A: If latency is less than 1 ms and there is enough bandwidth between the sites. Please note that disabling read locality means that 50% of the reads will go to the second site, so that should be a consideration when sizing the network bandwidth. Please refer to the sizing of the network bandwidth between the two main sites for more details. We will not expose this to customers and it will be available mostly for specific use cases. Q: What should the bandwidth between the two main sites/FDs be when Read Locality is disabled? A: In this case 50% of the read operations will go to the second site all the time. The sizing should start from the baseline when RL (Read Locality) is enabled and then add the read workload to come up with the total bandwidth.
  9. At steady state there is barely any communication between the main sites and the witness. Read and write operations do not require any communication to the witness witness node was optimized to receive minimal metadata traffic compared to a regular Virtual SAN cluster Traffic is mostly limited to create, delete, reconfigure, change policy, failover, and failback  metadata There is a heartbeat between the witness and the main sites which typically happens once a second.
  10. All other Virtual SAN policy support will remain unchanged
  11. Preferred fault domain or site is used in the case of a network partition, so that the storage on that site would be active, while the storage on the non-preferred site would be down.
  12. It will take a bit of time for the read cache to be warmed, but such migrations in stretched clusters are not common. Use vSphere DRS VM/Host rules to prevent this during normal operations if the VM moves (either failover by HA, vMotion, or a power off-on cycle).
  13. FX2 configured via the FN410S (IO Aggregation module). Plug-and-play configuration, automatically set up for VLAN and multicast.
  16. Virtual SAN 6.0 is the second-generation of VMware’s hypervisor-converged storage for VMs. Virtual SAN 6.0 capabilities are centered on delivering high performance and increased scale without compromising simplicity and cost-effectiveness. Most notably, Virtual SAN 6.0 now allows to create an all-flash architecture in which the VSAN datastore is carved entirely out of flash devices – i.e. SSDs are used for both caching and data persistence In VSAN 6.0, performance has been improved with 2x more IOPS on the hybrid configuration and 4x more IOPS on All-Flash, making VSAN ideal for high performance applications which require consistent response times and low latencies Virtual SAN 6.0 also features twice the scale, with the ability to scale to 64 hosts per VSAN cluster In addition, VSAN 6.0 can scale to run up to 200 VMs per host representing a 50% increase over the previous version It also features a new file system called the VSAN file system which provides efficient snapshots and clones Rack-awareness provides the ability to tolerate rack failures for a VSAN cluster that spans multiple racks Support for hardware controller-based checksums helps detects data corruption issues while support for hardware-based encryption can encrypt ‘data-at-rest’ for workloads that require the added level of security
  17. vCloud Air Disaster Recovery introduced the ability for customers to protect and recover Virtual Machines running in their on-premise data center. This is based on vSphere Replication technology that was modified and enhanced for hybrid cloud usage. The next step in the cloud-based DR service is to offer a rich-set of automation capabilities that simplify the Disaster Recovery testing, failover and failback operations. In future, vCloud Air DR service should enable customers to create multi-VM recovery plans and classify their applications into priority tiers, specify interdependencies such that VMs are brought up in an orderly fashion as defined by application owners, and provide further extensibility of runbook procedures with custom scripts. Customers who embraced an on-premise DR strategy using VMware Site Recovery Manager are already benefiting from these capabilities. This roadmap item offers similar capabilities to vCloud Air DR customers.
  18. Assuming 10 VM in ROBO. Each VM has 5 object. Total number of components on the witness are: 50 objects. 50 * 4 MB is around 200 MB for storage. 2000 ROBO require 2000 * 8 GB of memory. Assuming nodes of 256MB this means: 2000*8/256 or around For very large ROBO deployments, a customer could deploy a VSAN cluster with 3-4 node in central DC to host all the witnesses