Building a Stretched Cluster with Virtual SAN
Rawlinson Rivera, VMware, Inc
Duncan Epping, VMware, Inc
STO5333
#STO5333
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these
features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or
sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not
been determined.
Disclaimer
Agenda
1 Introduction
2 Requirements and Architectural details
3 Configuring and operating a VSAN Stretched Cluster
4 Failure Scenarios
5 Interoperability
VMware Virtual SAN 6.1
Introduction to Stretched Clustering
Typical Use Cases For Virtual SAN Stretched Clusters
Planned Maintenance
• Planned maintenance of one site without any service downtime
• Transparent to app owners and end users
• Avoid lengthy approval processes
• Ability to migrate applications back after maintenance is complete

Automated Recovery
• Automated initiation of VM restart or recovery
• Very low RTO for the majority of unplanned failures
• Allows users to focus on app health after recovery, not how to recover VMs

Disaster Avoidance
• Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)
• Avoid downtime, not recover from it
• Zero data loss possible if you have the time
Virtual SAN Stretched Cluster
• Increases Enterprise availability and data protection
• Based on an Active – Active architecture
• Supported on both Hybrid and All-Flash architectures
• Enables synchronous replication of data between sites
[Diagram: hosts in Site A and Site B forming a single vSphere + Virtual SAN Stretched Cluster]
Virtual SAN Stretched Cluster
• Site-level protection with zero data loss and near-instantaneous recovery
• Virtual SAN Stretched Cluster can be scaled up to 15 nodes per site
• Beneficial solution for disaster avoidance and planned maintenance
[Diagram: hosts in Site A and Site B forming a single vSphere + Virtual SAN Stretched Cluster]
Virtual SAN Stretched Cluster
Virtual SAN Clusters:
• Require a minimum of 3 fault domains to tolerate a single failure
• Virtual Machine objects remain accessible after a single fault domain failure
[Diagram: Virtual SAN Datastore spanning Fault Domain A, Fault Domain B, and Fault Domain C, holding two data components and one witness component]
Virtual SAN Stretched Cluster
Stretched Clusters:
• Provide similar availability with 2 active-active fault domains plus a witness-only fault domain
– A light-weight witness host is needed only for quorum
– Virtual SAN 6.1 allows a single witness host in a third fault domain
• Virtual Machine disk objects (VMDKs) remain accessible after one data fault domain fails
[Diagram: Virtual SAN Datastore spanning Fault Domain A, Fault Domain B, and Fault Domain C, holding two data components and one witness component]
• Virtual SAN's availability protection now extends to:
– Rack failures
– Network failures
– Hardware failures
– Site failures
Virtual SAN Stretched Cluster
• Virtual SAN cluster is formed across the 3 fault domains
• Witness fault domain is utilized for witness purposes ONLY, not running VMs!
• Availability Policy supported (FTT=1)
• Automated failover in the event of site failure
[Diagram: a vSphere + Virtual SAN Stretched Cluster spanning three sites — two active data fault domains and one witness fault domain]
Virtual SAN Stretched Cluster
Requirements and Architectural Details
Requirements
• Network
– Virtual SAN storage networking
– Virtual SAN witness networking
– vSphere and virtual machine networking
• Storage
– Virtual machine storage
– Witness appliance storage
• Compute
– Virtual SAN witness
– vSphere HA
– vSphere DRS
Virtual SAN Stretched Cluster Networking Requirements
[Diagram: data fault domains FD1 and FD2 connected at < 5 ms latency over 10/20/40 Gbps L2 with multicast; witness fault domain FD3 reached at up to 200 ms latency over 100 Mbps L3 without multicast]
• Network Requirements between data fault domains/sites
– 10 Gbps connectivity or greater
– < 5 millisecond latency RTT
– Layer 2 or Layer 3 network connectivity with multicast
• Network Requirements to witness fault domain
– 100 Mbps connectivity
– 200 milliseconds latency RTT
– Layer 3 network connectivity without multicast
• Network bandwidth requirements are calculated based on write operations between the data fault domains
– Kbps = nodes per site * 4K write IOPS per node * 125
– A 5+5+1 deployment with ~300 VMs works out to roughly 4 Gbps
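As a rough worked example of the bandwidth rule of thumb above, here is a minimal Python sketch. The 3,200 4K write IOPS per node is an assumed figure chosen only to reproduce the ~4 Gbps estimate for a 5+5+1 deployment; it is not a number from the deck.

```python
# Minimal sketch of the inter-site bandwidth rule of thumb from this slide:
#   Kbps = nodes per site * 4K write IOPS per node * 125
# The 3,200 write IOPS per node below is an assumed value, used only to
# reproduce the "5+5+1 with ~300 VMs is roughly 4 Gbps" figure.

def inter_site_bandwidth_gbps(nodes_per_site: int, write_iops_per_node: int) -> float:
    """One-way bandwidth between the two data fault domains, in Gbps."""
    kbps = nodes_per_site * write_iops_per_node * 125
    return kbps / 1_000_000  # Kbps -> Gbps

one_way = inter_site_bandwidth_gbps(nodes_per_site=5, write_iops_per_node=3200)
print(f"one-way: {one_way:.1f} Gbps, total: {2 * one_way:.1f} Gbps")  # ~2 Gbps each way, ~4 Gbps total
```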
Virtual SAN Witness Appliance Overview
and Storage Requirements
Witness overview and requirements
• Witness appliance:
– ONLY supported with Stretched Cluster *
– ONLY stores meta-data NOT customer data
– is not able to host any virtual machines
– can be re-created in event of failure
• Appliance requirements:
– At least three VMDKs
– Boot disk for ESXi requires 20 GB
– Capacity tier requires 16 MB per witness component
– Caching tier is 10% of the capacity tier
– Both tiers on the witness can be placed on magnetic disks (MDs)
• The amount of storage on the witness is related to number of
components on the witness
[Diagram: witness appliance running as a nested ESXi VM]
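The storage rules above can be turned into a minimal sizing sketch. It assumes the 16 MB per witness component and the 10% caching-tier rule from this slide, plus the 3-objects-per-VM minimum (namespace, VMDK, swap) mentioned in the speaker notes; the official sizing table on the next slide rounds these figures up.

```python
# Minimal witness storage sizing sketch, assuming:
#   - at most one witness component per object, ~3 objects per VM minimum
#   - ~16 MB of capacity tier per witness component
#   - caching tier of roughly 10% of the capacity tier
# Illustrative only; the official sizing guidance rounds up from these numbers.

def witness_storage_gb(vm_count: int, objects_per_vm: int = 3) -> tuple[float, float]:
    components = vm_count * objects_per_vm
    capacity_gb = components * 16 / 1024   # 16 MB per witness component
    cache_gb = capacity_gb * 0.10          # caching tier ~10% of capacity
    return capacity_gb, cache_gb

cap, cache = witness_storage_gb(vm_count=800)  # "medium" scenario from the next slide
print(f"capacity ~{cap:.0f} GB, cache ~{cache:.0f} GB")
```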
Virtual SAN Witness Appliance
Sizing Requirements
Resource Requirements
• Large scale (15+15+1) – Max 3000 VMs and 18,000 components on the witness
– Memory: 32 GB
– CPU: 2 vCPU
– Storage: 350 GB for capacity and 10 GB for caching tier
• Medium (4+4+1) – Max 800 VMs and ~5000 components on the witness
– Memory: 16 GB
– CPU: 2 vCPU
– Storage: 50 GB for capacity and 5 GB for caching tier
• Small (1+1+1) – Max 200 VMs and 1200 components on the witness
– Memory: 16 GB
– CPU: 2 vCPU
– Storage: 20 GB for capacity and 3 GB for caching tier
[Diagram: witness appliance (nested ESXi VM) hosted either in a data center or in vCloud Air]
Virtual SAN Witness Appliance
Network Requirements
Witness network requirements
• Network communication
– between the witness and the main sites is L3 (IP based), with no multicast requirement!
– The witness node is optimized to receive minimal metadata traffic
– Read and write operations do not require any communication to the witness
– Traffic is mostly limited to metadata updates
– Traffic must not be routed through the witness site
– The heartbeat between the witness and the other fault domains happens once a second
– After 5 consecutive missed heartbeats the communication is declared failed
[Diagram: FD1 and FD2 data sites (< 5 ms latency over 10/20/40 Gbps, L2 with multicast) with the witness appliance (nested ESXi) in FD3]
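The heartbeat behavior described above can be pictured with a tiny sketch: a heartbeat every second and a counter that declares the link failed after 5 consecutive misses. This is illustrative only and not how Virtual SAN is actually implemented internally.

```python
# Toy model of "heartbeat once per second, declare failure after 5
# consecutive misses". Illustrative only, not Virtual SAN code.

HEARTBEAT_INTERVAL_S = 1
MISS_THRESHOLD = 5

def link_failed(heartbeats_received: list[bool]) -> bool:
    """heartbeats_received[i] is True if the i-th one-second heartbeat arrived."""
    consecutive_misses = 0
    for ok in heartbeats_received:
        consecutive_misses = 0 if ok else consecutive_misses + 1
        if consecutive_misses >= MISS_THRESHOLD:
            return True   # ~5 seconds of silence -> communication declared failed
    return False

print(link_failed([True, False, False, False, False, False]))  # True
print(link_failed([True, False, False, True, False, False]))   # False
```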
Virtual SAN Stretched Clusters –
Supported Deployment Scenarios
[Diagram: the two supported deployment scenarios — "Traditional Layer 3 and Layer 2 Deployment" (stretched L2 with multicast between the data fault domains FD1 and FD2, Layer 3 to the witness in FD3) and "Complete Layer 3 Deployment" (Layer 3 networks between all sites; a 3rd-party solution to manage VM networks is required)]
Virtual SAN Stretched Cluster –
Supported Storage Policies
• Maximum supported “FailuresToTolerate” is 1 due to the support of only 3 fault domains
– “FailuresToTolerate=1” objects will be implicitly “forceProvisioned” when only two of the three sites are available
– Compliance will be fixed for such objects once the third site becomes available
[Diagram: Fault Domain A (active) and Fault Domain C (active) data sites plus Fault Domain B (witness) in a vSphere + Virtual SAN Stretched Cluster]
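As a hedged illustration of the policy described above, the availability rules for a stretched-cluster VM can be written down as a simple rule set. The capability names follow the commonly used VSAN SPBM capability IDs, but the exact names and namespace are assumptions and may differ by version.

```python
# Sketch of the only supported availability policy in a stretched cluster:
# FTT=1, with force provisioning letting objects be created while only two of
# the three sites are available. Capability names are assumptions, not an
# exact API contract.

stretched_cluster_policy = {
    "VSAN.hostFailuresToTolerate": 1,  # maximum (and only) supported FTT value
    "VSAN.forceProvisioning": True,    # applied implicitly when one site is down
}

for capability, value in stretched_cluster_policy.items():
    print(f"{capability} = {value}")
```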
Virtual SAN Stretched Clusters – Preferred, Non-preferred Sites
[Diagram: FD1 and FD2 data sites (< 5 ms latency, L2 with multicast) with the witness appliance in FD3; during a partition, the preferred site and the witness form the major partition]
• The preferred fault domain or site is one of the two active-active fault domains
• The preferred fault domain or site can be changed dynamically
• One of the active data sites is designated as the “preferred” fault domain
– Required to handle the “split-brain” scenario (link failure between the active sites)
– Determines which active site the witness joins
Stretched Clusters and Read Locality
Read Locality
• A VM will be running in (at most) one site
• FTT=1 implies that there are two copies of the data, one
in each site
• Reads will be served from the copy of the data that
resides on the same site as where the VM runs
• If the VM moves to the other site, then reads will be
served from the (consistent) copy of the data in the new
site
[Diagram: a read operation served from the data copy in the same site (fault domain) as the running VM]
Stretched Clusters and Writes
Writes
• There is no locality for writes, availability over
performance!
• Writes must be acknowledged from both sites before we
ACK to the application
• A typical write operation does not include any
communication to the witness
[Diagram: a write operation acknowledged by both data sites before being acknowledged to the application]
VMware Virtual SAN 6.1
Configuring and operating a Stretched Cluster
Configuring VMware Virtual SAN Stretched Cluster
• Simple configuration procedure
• The necessary L3 and L2-with-multicast network connectivity and configuration should be completed before setting up the stretched cluster
1. Configure Fault Domains
2. Select Witness Host
3. Create Disk Groups on Witness
4. Validate health of the stretched cluster configuration
DEMO
[Demo topology: active-active all-flash Dell PowerEdge FX2 sites in Austin, TX and Dallas, TX (5 ms latency over 10 Gbps, L2 with multicast, Dell S6000-ON ToR switches and FX2 FN410S IO modules) forming a vSphere + Virtual SAN Stretched Cluster, with the witness appliance (nested ESXi) in Plano, TX]
Configuring VMware Virtual SAN Stretched Cluster
• Health Check includes additional checks for stretched cluster:
– Witness host configuration
– Network configuration
– Host compatibility
– Fault domain configuration
To configure a stretched cluster, go to Cluster > Manage tab > Fault Domains and click the icon to start the wizard
What about vSphere HA & DRS?
[Diagram: FD1 and FD2 data sites with the witness appliance in FD3]
HA and DRS Behavior
• HA/DRS will not use the witness host as a target, since the witness is a standalone host in vCenter and will appear to be an “incompatible” target
• HA failover
• If one site partitions away or fails, all Virtual SAN
objects will become inaccessible in that partition
• HA will failover the VMs running in that site to the
other active data site
• DRS will treat it as a normal cluster, migrations can
happen across sites
vSphere HA Recommendations
• Make sure to set aside 50% of resources using
Admission Control!
– Admission control is not resource management
– Only guarantees power-on
• Enable “Isolation Response”
– “Power Off” recommended response
• Manually specify multiple isolation addresses
– One for each site using
“das.isolationaddressX”
– Disable the default gateway using
“das.useDefaultIsolationAddress=false”
• Make sure vSphere HA respects VM/Host affinity rules during failover!
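To make the isolation-address guidance above concrete, the sketch below lists the HA advanced options a stretched cluster would typically carry. The IP addresses are placeholders for a pingable address in each data site, and actually applying the options (Web Client, PowerCLI, or another tool) is left out.

```python
# Sketch of the vSphere HA advanced options recommended on this slide.
# The isolation addresses are placeholders; pick a reliably pingable
# address in each data site.

ha_advanced_options = {
    "das.isolationaddress0": "192.168.10.1",    # placeholder: address in site A
    "das.isolationaddress1": "192.168.20.1",    # placeholder: address in site B
    "das.useDefaultIsolationAddress": "false",  # do not use the default gateway
}

for key, value in ha_advanced_options.items():
    print(f"{key} = {value}")
```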
vSphere DRS Recommendations
• Enable DRS, you want your VMs happy
• Remember Read Locality? Set up VM/Host affinity rules
– DRS will only migrate VMs to hosts which belong to
the VM/Host group
– Avoid “must rules” as they can bite you
– Use “should rules”, HA can respect these as of
vSphere 6.0!
– HA is smart and will go for “availability” over “rule
compliance”
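The VM/Host affinity guidance above boils down to one host group and one VM group per site, tied together by a soft ("should") rule. The sketch below just captures that layout as data; group and rule names are placeholders, not API identifiers.

```python
# Sketch of the recommended "should" affinity layout: keep VMs on their home
# site for read locality, while HA can still restart them in the other site.
# Names are placeholders.

site_affinity_rules = [
    {"vm_group": "VMs-SiteA", "host_group": "Hosts-SiteA",
     "rule": "should run on hosts in group"},   # soft rule, not "must"
    {"vm_group": "VMs-SiteB", "host_group": "Hosts-SiteB",
     "rule": "should run on hosts in group"},
]

for r in site_affinity_rules:
    print(f'{r["vm_group"]}: {r["rule"]} {r["host_group"]}')
```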
Virtual SAN Stretched Clusters – Maintenance
[Diagram: FD1 and FD2 data sites with the witness appliance in FD3]
Stretched Cluster supported policies
• On the witness node, host/disk/disk group maintenance mode only allows the “NoAction” mode
• In the UI the witness host is a standalone host, and by default only “NoAction” is supported for standalone hosts, so there is no change in behavior
• Default mode for API has been modified to be
“NoAction”
• If disks on the witness node are decommissioned,
objects will lose compliance. CLOM crawler will fix
the compliance by rebuilding the witnesses
• For all other hosts in the cluster - “Enter maintenance
mode” is supported in all 3 modes
Virtual SAN 6.1
Stretched Cluster Failure Scenarios
Face Your Fears, Test Failover Scenarios!
• Data Site Partition
• Data Site Failure
• Witness Site Failure
• Witness network failure (1 site)
• Site Failure that hosts vCenter Server
• Host Isolation or Failure
[Diagram: FD1 and FD2 data sites with the witness appliance in FD3]
Failure Scenarios – Network Partition Between Data Sites
[Diagram: network partition between the two data sites; the witness sides with the preferred site and HA restarts the non-preferred site's VMs in the preferred site]
Failure Scenario A
• What if there is a network partition between the two active
data sites aka “split brain scenario”?
• Witness always forms a cluster with the “preferred”
site in such a case and that is the partition that will
make progress
• This means that VMs in the “non-preferred” site will lose access to storage
• If the HA network (most likely) is also isolated, then
VMs in the “non-preferred” site will be restarted in the
preferred site
• HA does not know what happened to the host in
the non-preferred site!
Failure Scenarios – Full Site Failure
[Diagram: full failure of one data site; vSphere HA restarts the impacted VMs in the surviving site]
What if one active site fails?
• Since both active sites will have a copy of the data, the
second site can transparently take over
• Preferred or non-preferred makes no difference here
• Impacted VMs by full site failure will be restarted by
vSphere HA
• Site failure is detected if it misses the heartbeat for 5
consecutive times
• The heartbeat is sent every second
• Customers can continue creating VMs, etc., but they will be out of compliance with FTT=1 (force provisioning is applied by default)
• What happens once the site comes back? The recovery is detected automatically and starts the re-sync of changed data. Once the re-sync is done, the customer should use DRS to redistribute the VMs.
• Ideally you want all the nodes to be back online at the
same time
Failure Scenarios – Witness Site Failure
[Diagram: witness site (FD3) failure; the two data sites keep quorum and VMs continue running]
Details
• Witness failure is detected when 5 consecutive heartbeats are missed
• The heartbeat is sent every second by both master and backup
• If the witness fails there is no disruption to IO traffic for VMs
• VMs continue running with no interruption, since the two main sites can form a quorum
• A completely new witness can be created and connected to the cluster
• What happens once the witness comes back? All metadata (for all objects in the cluster) is communicated to the witness and the cluster becomes healthy again
VMware Virtual SAN 6.0
Interoperability
Virtual SAN Stretched Cluster with vSphere Replication and SRM
• Live migrations and automated HA restarts between stretched cluster sites
• Replication between Virtual SAN datastores enables RPOs as low as 5 minutes
• The 5-minute RPO is exclusively available to Virtual SAN 6.x
• Lower RPOs are achievable due to Virtual SAN’s efficient vsanSparse snapshot mechanism
• SRM does not support standalone Virtual SAN with one vCenter Server
[Diagram: Site A and Site B in an active-active vSphere + Virtual SAN Stretched Cluster (< 5 ms latency over 10/20/40 Gbps, L2 with multicast, witness appliance) replicating with vSphere Replication and SRM to a separate vSphere + Virtual SAN DR site (Site X) at any distance with an RPO of 5 minutes or more; each side has its own vCenter and SRM]
DR orchestration for vCloud Air DR
Single-click recovery of on-premises applications in the cloud
Roadmap
Overview
• Multi-VM recovery plans to define
application/site recovery procedures
• Easy to use workflows for DR testing,
DR failover and failback
• Graceful migration workflows to ensure
no data loss before planned downtime
• Drastically reduce RTOs when
recovering multiple applications or
entire site workloads
[Diagram: on-premises vSphere + Virtual SAN stretched cluster protected to VCD – vCloud Air]
2-Node Remote Office Branch Office Solution
[Diagram: three 2-node ROBO sites (ROBO1, ROBO2, ROBO3), each running vSphere + Virtual SAN and each with its own witness appliance (nested ESXi VM), all centrally managed by one vCenter Server in a centralized data center]
Overview
• Extension of Virtual SAN Stretched Cluster solution
• Each node will be in its own Fault Domain (FD)
• One witness per Virtual SAN cluster
• 500ms Latency tolerated!
• Witness node is an ESXi appliance (VM)
• All sites managed centrally by one vCenter
• Patching and software upgrades performed
centrally through vCenter
• If there are N ROBOs then there will be
N witness VMs
THANK YOU
Building a Stretched Cluster with Virtual SAN
Rawlinson Rivera, VMware, Inc
Duncan Epping, VMware, Inc
STO5333
#STO5333
Editor's Notes
  2. Communication between the main sites and the witness is unicast (FD1 and FD2 share a single L2 domain; FD3 is only reachable via L3 from FD1 & FD2). The architecture is based on Fault Domains.
  3. Site-level protection with zero data loss and near-instantaneous recovery Production apps at both sites with seamless mobility across sites Zero downtime for planned events Typically limited to a Metro distance
  4. Site-level protection with zero data loss and near-instantaneous recovery Production apps at both sites with seamless mobility across sites Zero downtime for planned events Typically limited to a Metro distance
  5. Site-level protection with zero data loss and near-instantaneous recovery. Production apps at both sites with seamless mobility across sites. Zero downtime for planned events. Typically limited to a Metro distance (it could be another Data Center, vCloud Air, or Colo).
  6. Max network latency of 5 millisecond RTT and enough bandwidth for the workloads Support for sites up to 100km apart as long as network requirements are met <= 5ms latency over 10/20/40gbps to data fault domains (L2 with multicast) Network Bandwidth requirement for the write operations between the main two sites in Kbps= N (number of nodes on one site) * W (Amount of 4K IOPS per Node) * 125. Minimum of 1 Gbps each way For a 5+5+1 config on server medium and ~300 VM the network requirement is around total of 4 Gbps (2Gbps each way) Layer 2 network communication is required
  7. Max number of components per Witness capacity disk is ~21,000 Maximum number of components on witness is 45,000 All the witness node disks could be thin provisioned Number of components on the Witness reflect the number of objects on the VMs. There will be always at most one witness component per object Each VM requires one VMDK, one namespace and one swap file. This is a minimum of 3 objects per VM. Each snapshot adds one object per VMDK. Stripes do not add to the number of objects for the VM
  8. Q: When should I disable Read Locality? A: If latency is less than 1 ms and there is enough bandwidth between the sites. Please note that disabling read locality means that 50% of the reads will go to the second site, so that should be a consideration when sizing the network bandwidth. Please refer to the sizing of the network bandwidth between the two main sites for more details. We will not expose this to customers and it will be available mostly for specific use cases. Q: What should the bandwidth between the two main sites/FDs be when Read Locality is disabled? A: In this case 50% of the read operations will go to the second site all the time. The sizing should start from the baseline when RL (Read Locality) is enabled and then add the read workload to come up with the total bandwidth.
  9. At steady state there is barely any communication between the main sites and the witness. Read and write operations do not require any communication to the witness witness node was optimized to receive minimal metadata traffic compared to a regular Virtual SAN cluster Traffic is mostly limited to create, delete, reconfigure, change policy, failover, and failback  metadata There is a heartbeat between the witness and the main sites which typically happens once a second.
  10. All other Virtual SAN policy support will remain unchanged
  11. Preferred fault domain or site is used in the case of a network partition, so that the storage on that site would be active, while the storage on the non-preferred site would be down.
  12. It will take a bit of time for the read cache to be warmed, but such migrations in stretched clusters are not common. Use vSphere DRS VM/Host rules to prevent this during normal operations if the VM moves (either failover by HA, vMotion, or a power off-on cycle).
  13. FX2 configured via the FN410S (IO Aggregation module). Plug-and-play configuration, automatically set up for VLAN and multicast.
  16. Virtual SAN 6.0 is the second-generation of VMware’s hypervisor-converged storage for VMs. Virtual SAN 6.0 capabilities are centered on delivering high performance and increased scale without compromising simplicity and cost-effectiveness. Most notably, Virtual SAN 6.0 now allows to create an all-flash architecture in which the VSAN datastore is carved entirely out of flash devices – i.e. SSDs are used for both caching and data persistence In VSAN 6.0, performance has been improved with 2x more IOPS on the hybrid configuration and 4x more IOPS on All-Flash, making VSAN ideal for high performance applications which require consistent response times and low latencies Virtual SAN 6.0 also features twice the scale, with the ability to scale to 64 hosts per VSAN cluster In addition, VSAN 6.0 can scale to run up to 200 VMs per host representing a 50% increase over the previous version It also features a new file system called the VSAN file system which provides efficient snapshots and clones Rack-awareness provides the ability to tolerate rack failures for a VSAN cluster that spans multiple racks Support for hardware controller-based checksums helps detects data corruption issues while support for hardware-based encryption can encrypt ‘data-at-rest’ for workloads that require the added level of security
  17. vCloud Air Disaster Recovery introduced the ability for customers to protect and recover Virtual Machines running in their on-premise data center. This is based on vSphere Replication technology that was modified and enhanced for hybrid cloud usage. The next step in the cloud-based DR service is to offer a rich-set of automation capabilities that simplify the Disaster Recovery testing, failover and failback operations. In future, vCloud Air DR service should enable customers to create multi-VM recovery plans and classify their applications into priority tiers, specify interdependencies such that VMs are brought up in an orderly fashion as defined by application owners, and provide further extensibility of runbook procedures with custom scripts. Customers who embraced an on-premise DR strategy using VMware Site Recovery Manager are already benefiting from these capabilities. This roadmap item offers similar capabilities to vCloud Air DR customers.
  18. Assuming 10 VM in ROBO. Each VM has 5 object. Total number of components on the witness are: 50 objects. 50 * 4 MB is around 200 MB for storage. 2000 ROBO require 2000 * 8 GB of memory. Assuming nodes of 256MB this means: 2000*8/256 or around For very large ROBO deployments, a customer could deploy a VSAN cluster with 3-4 node in central DC to host all the witnesses