STO7535 Virtual SAN Proof of Concept - VMworld 2016
1. Conducting a Successful Virtual SAN 6.2 Proof of Concept
Paudie ORiordan, VMware, Inc
Cormac Hogan, VMware, Inc
STO7535
#STO7535
2. • This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these
features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or
sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not
been determined.
Disclaimer
4. Agenda
1 Introduction to Session
2 Introduction to Virtual SAN
3 Tools to conduct a successful Virtual SAN proof of concept (POC)
4 POC validation scenarios
5 Data Services Considerations
6 Measuring Performance
5. This session…
• Virtual SAN has been available since March 2014, almost 2.5 years
• To date, we have almost 5,000 VSAN customers.
• VMware recognises that conducting a Virtual SAN proof of concept can be challenging
• Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting
Virtual SAN have become available
• In this session, we will discuss the tools available to vSphere and Virtual SAN administrators
and how they can help deliver a successful Virtual SAN proof of concept
6. Introduction to VMware Virtual SAN
• Storage scale out architecture built into
the hypervisor
• Aggregates locally attached storage from each
ESXi host in a cluster
• Dynamic capacity and performance scalability
• Flash optimized storage solution
– Fully integrated with vSphere and interoperable:
• vMotion, DRS, HA, VDP, VR …
• VM-centric data operations
• Many new data services
[Diagram: locally attached storage from each ESXi host is aggregated into a single Virtual SAN datastore]
7. What I Need to Be Successful
Tools to conduct a successful Virtual SAN POC
8. Before You Begin: Verify Your Components Against the HCL
• VMware Virtual SAN Hardware
• Server, Controller, SSD, Disk on HCL
• Controller Firmware, Driver
• Disk Firmware
• Enclosure Firmware
• SAS/SATA SSD Minimum Firmware is Critical
– Rule is minimum or higher
• NVMe Firmware
– HCL lists absolute version only
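As a rough illustration (these are generic ESXi checks and the driver module name is only an example, not tied to any particular controller), driver information can be cross-checked against the HCL directly on a host:

# List storage adapters and the driver module each one uses
esxcli storage core adapter list
# Show the loaded version of a given driver module (lsi_mr3 is only an example)
vmkload_mod -s lsi_mr3 | grep -i version

Controller, disk and enclosure firmware versions are typically confirmed with the vendor's own management utilities.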
9. Success Tool #1 : Health Plugin – Reactive Health Checks
• Introduced with Virtual SAN 6.0
• Incorporated into the vSphere Web Client
• The Virtual SAN Health Check tools include:
– General Health
– Proactive tests
– Virtual SAN HCL health
– Physical disk health
• Especially useful when injecting errors into the cluster and verifying that they have been remediated
10. Success Tool #1 : Health Plugin – Proactive Health Checks
• Proactive tools for running pre-production tests on the Virtual SAN cluster:
– VM Creation test
– Storage Performance
– Multicast performance test
11. Success Tool #2 : Capacity Views
• Dedupe and
Compression Savings
• Group by Object Type
– Filesystem overhead
– Dedupe overhead
– Checksum overhead
– Virtual disks
– Swap
– Home namespace
12. Success Tool #3 : Performance Service
• Enable it once
• Integrated with vSphere
• Simplified metrics
– Backend (VSAN)
– Frontend (VM)
• Distributed Architecture
– No SPOF
• Historical data
• Status monitored by
health checks
13. Success Tool #4 : HCIbench
• Hyperconverged Infrastructure benchmark
• Based on Vdbench
• Designed to work on distributed architectures
like Virtual SAN
• UI Driven
• Free
• Provides results in text format and in a format that can be viewed in VSAN Observer
• Now available from
https://labs.vmware.com/flings
14. Success Tool #5 : RVC/Virtual SAN Observer
• Native tools installed on Linux/Appliance and Windows versions of vCenter Server
• Used for Configuration and Status of the Virtual SAN Cluster
• For Performance and Activity monitoring on demand
– VM level
– Host level
– VMDK level
– HDD/SSD Level
• Any anomalies will show up with the metric in question shown in red
• Follow the I/O : VM -> VMDK -> Disk Group -> Disk -> Congestion
15. Success Tool #5 : RVC/Virtual SAN Observer (ctd.)
Virtual SAN Operation
– vsan.apply_license_to_cluster
– vsan.enable_vsan_on_cluster
– vsan.disable_vsan_on_cluster
– vsan.clear_disks_cache
– vsan.cluster_change_autoclaim
– vsan.cluster_set_default_policy
– vsan.enter_maintenance_mode
– vsan.fix_renamed_vms
– vsan.object_reconfigure
– vsan.host_wipe_vsan_disks
– vsan.recover_spbm
– vsan.reapply_vsan_vmknic_config
Virtual SAN Information
– Cluster: vsan.check_limits, vsan.check_state, vsan.cluster_info, vsan.cmmds_find, vsan.whatif_host_failures, vsan.resync_dashboard
– Disk: vsan.disk_object_info, vsan.disks_info, vsan.disks_stats
– Host: vsan.host_info, vsan.host_consume_disks
– Networking: vsan.lldpnetmap
– VM: vsan.vm_object_info, vsan.vm_perf_stats, vsan.vmdk_stats, vsan.obj_status_report, vsan.object_info
Virtual SAN Monitoring
– Troubleshooting: vsan.support_information, vsan.observer
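A minimal RVC session sketch (the cluster path is a placeholder for your own inventory) showing how a few of the commands above are typically invoked:

# Check overall cluster and object state
vsan.check_state /localhost/Datacenter/computers/VSAN-Cluster
# Summarise the cluster configuration
vsan.cluster_info /localhost/Datacenter/computers/VSAN-Cluster
# Launch VSAN Observer with its built-in web server for live performance graphs
vsan.observer /localhost/Datacenter/computers/VSAN-Cluster --run-webserver --force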
17. PoC Validation
• What are the most important validation tests?
1. Successful VSAN configuration
2. Successful VM deployments on VSAN datastore
3. VM Availability in the event of failures (host, storage device, network)
4. VSAN Serviceability (maintenance of hosts, disk groups, disks)
5. VM Performance meets expectations
6. VSAN Data Services (Dedupe, Compression, RAID-5/6, Checksum) working
as expected
18. Case #1 – Successful VSAN Deployment – Checklist
• Correct vSphere versions
• Appropriate licenses
– especially if PoC is expected to take a long time (> 60 days)
• Correctly Configured Network
– VSAN requires multicast, so prep the network team
• Minimum of three servers
– Or 2 servers plus a witness appliance if doing Remote Office/Branch Office (ROBO)
Remember, the VSAN Health Check will do most of this work for you.
19. Case #1 – Successful VSAN Deployment – Checklist (ctd.)
• Minimum of three servers contributing
storage:
• At least one storage controller – you’ve checked
the HCL, and drivers and firmware are valid, right?
• At least one flash device (SSD, PCIe) for cache –
check the HCL
• At least one magnetic disk (hybrid) or flash device
(all-flash) for capacity – check the HCL
• Or consider VSAN Ready Nodes as an option …
Remember, the VSAN Health Check will do most of this work for you.
20. Case #1 – Successful VSAN Deployment – Device Claiming
• Devices not visible
– Some RAID controllers won’t present individual disks without RAID configuration
– May need RAID-0 configuration set on storage devices via controller
• Devices not being claimed
– Some controllers allow devices to be shared; so devices get presented as “non-local”
– VSAN will only claim devices that are local
• SSD showing up as HDD
– Placing devices in RAID-0 will do this
• All-Flash using wrong devices for cache/capacity
– Set VSAN to “Manual mode” when setting up all-flash
– Gives control over which devices are used for cache and which devices are used for capacity (see the example below)
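A hedged sketch (device IDs are placeholders) of ESXCLI commands commonly used when checking and correcting device claiming; verify the exact options against your ESXi build:

# List the devices VSAN has claimed and their roles (cache vs. capacity)
esxcli vsan storage list
# Manually build a disk group: one cache SSD (-s) plus a capacity device (-d)
esxcli vsan storage add -s naa.5000000000000001 -d naa.5000000000000002
# On older builds, tag a device hidden behind a RAID-0 volume as SSD so it is detected correctly
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device=naa.5000000000000002 --option="enable_ssd"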
21. Case #1 – Successful VSAN Deployment – Overall health
Check the Virtual SAN Health Check regularly.
Run health checks after every test! Clear alarms!
Use it to verify that a problem introduced earlier has now been fixed.
22. Case #2 : Successful VM Deployment on VSAN
Use the Health Check – Proactive Tests to do an initial VM deployment check.
This is part of the Proactive Tests and will verify whether virtual machines can be created on the VSAN cluster.
23. Case #2 : Successful VM Deployment on VSAN
I created a new VM, but where and how is the VM stored?
The component host location view shows where each component has been placed (see the example below).
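To see exactly where the new VM lives, a quick RVC check along these lines can help (the VM path is a placeholder); for an FTT=1 VM you would expect a RAID_1 layout with two data components plus a witness, placed on three different hosts:

# Show the object/component layout and the host and disk each component lives on
vsan.vm_object_info /localhost/Datacenter/vms/my-test-vm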
24. Case #3 : VM Availability in the Event of Failures
• Various failures may be introduced as part of a typical POC
– Host failure
– Flash device / Magnetic Disk failure – Cache/Capacity device failures
– Network failure
• Objective: ensure that the VM continues to be available in the event of a failure. The VM may be
restarted on another node in the cluster.
• vSphere HA is fully integrated with Virtual SAN so that virtual machines on the failed host are
restarted on other hosts in the cluster
25. Case #3.1 : Host Failures
• How many hosts do I really need?
• A minimum of 3 hosts is needed to support VSAN.
• What about rebuilding after a failure or maintenance mode operations?
• If you want virtual machines to remain highly available on VSAN during these scenarios,
consider configuring additional capacity, i.e. a minimum of 4 nodes (see the example below).
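One quick sanity check before injecting a host failure is the RVC command below (the cluster path is a placeholder), which estimates whether the remaining hosts have enough capacity to rebuild:

# Simulate the capacity impact of a host failure before actually causing one
vsan.whatif_host_failures /localhost/Datacenter/computers/VSAN-Cluster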
26. Case #3.2 : Storage Failures
• The Virtual SAN 6.0 Proof Of Concept Guide has details on how to inject temporary disk errors
for the purpose of testing.
– A real disk failure results in immediate rebuild activity initiated by VSAN
– Eject/Offline/Unplug -> Absent: wait 60 minutes before remediation
– Failure -> Degraded: immediate remediation
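The 60-minute wait for ABSENT components is governed by the ClomRepairDelay advanced setting on each host. A hedged sketch of checking (and, purely for PoC purposes, temporarily lowering) it is shown below; verify the option path on your build and revert it afterwards:

# Show the current repair delay (default is 60 minutes)
esxcli system settings advanced list -o /VSAN/ClomRepairDelay
# Temporarily lower it to 15 minutes for testing (apply on every host, revert after the PoC)
esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 15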
27. Case #3.2 : Storage Failures (ctd.)
• Additional considerations when dedupe/compression are enabled on VSAN
– Deduplication and compression hash tables/metadata are spread across all disks in a disk group
– A single device failure in the disk group will render the whole of the disk group unavailable
– All data in disk group will be rebuilt elsewhere in the cluster (if resources allow)
28. Case #3.3 : Network Failure
Multicast configuration is the most common issue (see the verification example below).
Part of the Proactive Tests: this verifies whether multicast performance is acceptable for the VSAN cluster.
Start simple: if you want features like LACP, don't implement them initially. Turn off QoS/Flow Control, then build it back up afterwards.
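A hedged sketch of verifying the VSAN network configuration from an ESXi host (the vmkernel interface name is a placeholder); the multicast group addresses reported should match what the network team has allowed:

# Show the VMkernel interface and multicast group addresses VSAN is using
esxcli vsan network list
# Optionally capture a few packets on the VSAN vmkernel interface to confirm multicast is flowing
tcpdump-uw -i vmk1 -c 20 multicast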
29. Case #3.4 : Validating Rebuild Activity After Failure
• Virtual SAN might need to move data around in the background: change policy, host failure,
long term/permanent component loss, user triggered reconfig, maintenance mode, etc.
• UI Resync Dashboard shows the VMs that are resyncing and remaining bytes to sync
Remember! Test one thing at a time!
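The same resync information is also available from RVC, which is handy for watching rebuild progress from the command line (the cluster path is a placeholder):

# Show the VMs currently resyncing and the bytes left to sync
vsan.resync_dashboard /localhost/Datacenter/computers/VSAN-Cluster
# Summarise overall object health while the resync completes
vsan.obj_status_report /localhost/Datacenter/computers/VSAN-Cluster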
30. Case #4 : VSAN Serviceability – Maintenance Mode
I want to update one of my ESXi hosts in a VSAN cluster, what do I do?
VSAN provides multiple options for maintenance mode.
31. Case #4 : VSAN Serviceability – Maintenance Mode
• Ensure Accessibility: loss of VM compliance; suited to short maintenance; short storage preparation; limited free storage space required
• Full Data Migration: full VM data compliance; suited to maintenance lasting more than one hour; long storage preparation; free storage space required on the other nodes
• No Data Migration: no VM availability ensured; suited to short maintenance; no storage preparation or free space impact
Full data migration is unavailable in 3-node clusters!
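Maintenance mode can also be entered from the command line. A hedged ESXCLI sketch is below; the mode values ensureObjectAccessibility, evacuateAllData and noAction map to the three options above (verify the flags against your ESXi version):

# Enter maintenance mode, keeping objects accessible without a full evacuation
esxcli system maintenanceMode set -e true -m ensureObjectAccessibility
# Exit maintenance mode when the host work is complete
esxcli system maintenanceMode set -e false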
32. Case #5 : Management – Disks Serviceability
The disk serviceability feature helps identify the magnetic disks and flash devices that need to be replaced.
33. Case #5 : Management – Disk/Disk Group Evacuation
• Allows you to evacuate data from disk groups and individual disks before removing
a disk/disk group from a Virtual SAN host
• Allows Virtual SAN to ensure all workloads stay fully compliant with their policy!
– Supported in the UI, ESXCLI and RVC.
– Check box in the “Remove disk/disk group” UI screen.
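A hedged ESXCLI sketch of removing a capacity device or a whole disk group (device IDs are placeholders; on 6.x builds an evacuation mode can usually be chosen as well, so double-check the available options on your version):

# Remove a single capacity device from its disk group
esxcli vsan storage remove -d naa.5000000000000002
# Remove an entire disk group by specifying its cache SSD
esxcli vsan storage remove -s naa.5000000000000001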
35. New Data Services in VSAN 6.2
• Erasure Coding – RAID-5/RAID-6 Support
• Deduplication / Compression
• Checksum
• IOPS limits / QoS
There are performance considerations associated with all of the above.
There are also some issues to be aware of!
36. Capacity Overhead of the New Data Services
• Overheads are all calculated in advance
– Deduplication/Compression maintain hash tables
• Approx. 5% overhead
– Checksum Metadata is stored separately from data
• Approx. 1.2 % overhead
Many customers are surprised by the amount of overhead when data services
are first enabled
37. Data Services File System Overheads – Don’t Panic
• Deduplication and Compression File System Overhead is 5% (approx.) of Total Virtual
SAN Capacity
• Checksum Overhead is approx. 1.2% of capacity
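As a hypothetical worked example: on a 40 TB raw Virtual SAN datastore, deduplication/compression metadata would consume roughly 0.05 × 40 TB = 2 TB, and checksum metadata roughly 0.012 × 40 TB ≈ 0.5 TB, before any VM data is written.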
39. How to Test Performance…
• Distributed architecture => best performance when the pooled compute and storage resources
in the cluster are well utilized.
• This usually means a number of VMs each running the specified workload should be distributed
in the cluster and run in a consistent manner to deliver aggregated performance.
• This part of an evaluation can be complex and time-consuming
• Real application workloads are best, but …
– synthetic workloads (IOmeter) might be easier to set up
– simplistic workloads don’t really reflect what Virtual SAN can do
• Worth a read: Pro Tips For Storage Performance Testing
– http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
40. Performance Testing Considerations (Primarily for Hybrid)
Is the test utilising the distributed storage resources of Virtual SAN?
• Multiple VMs across multiple hosts delivers better performance than one VM on one host.
Is the working set fully in cache, utilising flash performance?
• Read-cache misses will incur latency.
Is the workload cache friendly?
• Sustained sequential write workloads fill cache, which must then be destaged. Mixed
R/W workloads with repeat patterns are best.
Is the cache warmed if using VSAN hybrid?
• Initial results from starts of tests will not be reflective of overall performance.
Warning: make sure the dedupe scrubber is disabled; it causes performance issues on hybrid *
* KB 2146267
41. Performance Test with HCIbench/vdbench
• VMs will be distributed equally across all hosts
• Select I/O size
• Select R/W ratio
• Select random/sequential
• Select duration of test
• Disks can be zeroed with “dd”*
• VMs will be removed (optionally) when test
completes
• Produces results per VM
– IOPS, Latency, Throughput, etc
• Produces results consumable by VSAN
Observer
* Avoid zeroing disks if deduplication enabled – will create hot-spot
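A hedged sketch of zeroing a test VM's data disk from inside the guest before a run (the device name is a placeholder; remember the warning above and skip this step entirely when deduplication is enabled):

# Overwrite the whole test disk with zeros so first-write allocation does not skew the results
dd if=/dev/zero of=/dev/sdb bs=1M oflag=direct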
Introduced in early 2014.
Scale out architecture, starting with small, 3 node cluster and add nodes (ESXi hosts) as needed.
Uses local storage from each host – can have compute nodes but you need at least 3 nodes contributing storage
Full interop with vSphere features.
VM-centric data operations: mirroring, striping, cache reservation, pre-allocate disk space … all done on a per VM basis.
Works on most vSphere hardware.
Two VSAN types: all-flash and hybrid.
We will also have a number of stretched cluster checks introduced for VSAN 6.1 (shipping with vSphere 6.0U1)
You may also group by data type, so you can see how much space is consumed by primary data, VSAN overhead and temp overhead (rebuild activity for example)
Talk quickly about the example of a RAID-1/FTT=1 VMDK. Front-End will show VM IOPS, and back-end will show 2 x IOPS going to both replicas.
VMware has a VSAN Troubleshooting Guide which can be used for guidance on using VSAN Observer.
VMware has an RVC Reference Guide for VSAN that goes over many of these commands and their usage.
Serviceability – replacing drives, maintenance mode, rolling upgrades
We're assuming now that you have checked the hardware and you're ready to do your POC.
Some controllers allow disks to be shared between 2 hosts. If we detect this functionality, the devices are marked as non-local.
Putting individual devices in a RAID-0 volume masks a number of features, such as drive type.
What does it check?
That there isn’t some underlying hardware issue preventing a VM from being deployed with a default policy
That you don’t have some silly default policy that cannot be met by the config
That ATS (Atomic Test & Set) locking is functioning – this might have been disabled globally if there is other storage presented to the ESXi hosts (fully supported btw)
This is a VM with FTT=1. VMs with higher spec policies will have more components.
Ask if audience understand that a VM on VSAN is now a set of objects, not files.
Objects in turn are made up of components, which can be many depending on stripe width (RAID-0) and failures to tolerate (RAID-1).
Speaker notes:
Although we require a minimum of 3 nodes to a VSAN Cluster, a better approach might be to build 4 node clusters.
This way when there is a failure or more importantly a maintenance task which takes one node out of the cluster, you have the possibility of keeping your fault tolerance setting in place during this period, provided there is enough capacity left in the cluster.
Of course, rebuild activity will only occur when there are available resources.
In order to inject errors, the health check includes a feature to do this. It may need 3rd party tools installed.
Devices that are removed are considered “Absent”.
A timeout value defined by ClomdRepairDelay needs to expire before VSAN takes remedial action.
By default, this is 60 minutes.
This means that there is no rebuild activity until this timer expires.
Many ways to simulate a VSAN network failure otherwise:
Pull a cable
Remove uplinks from VSS or DVS
Remove VSAN VMkernel adapter
Known issue documented in VSAN 6.1 release notes http://pubs.vmware.com/Release_Notes/en/vsan/61/vmware-virtual-san-61-release-notes.html
Multicast performance test of Virtual SAN health check does not run on Virtual SAN network
This is due to us using iperf, and iperf always running on vmk0. Workaround to bind multicast address to VSAN network.
This was only visible in RVC in 5.5 – vsan.resync_dashboard. It is now in the UI.
Maintenance Mode places components on the host in an ABSENT state. Don’t do any further testing if a host is in maintenance mode if ensure accessibility or no data migration options chosen. You will basically introduce a double failure, and by default FTT=1.
We’re just going to talk about behaviour with mmode and ensure access (VMs still available, no data migration, loss of one replica) and full data migration. When to use the different methods is part of the day #2 operations talk, not the POC talk.
Keep in mind the requirement to have additional resources. Full Data migration won’t be possible with a 3 node cluster.
Risk in doing maintenance mode with 3 nodes only
Light LED on failures
When a disk hits a permanent error, it can be challenging to find where that disk sits in the chassis to find and replace it.
When SSD or MD encounters a permanent error, VSAN automatically turns the disk LED on.
Turn disk LED on/off
User might need to locate a disk so VSAN supports manually turning a SSD or MD LED on/off.
Marking a disk as SSD
Some SSDs might not be recognized as SSDs by ESX.
Disks can be tagged/untagged as SSDs
Marking a disk as local
Some SSDs/MDs might not be recognized by ESX as local disks.
Disks can be tagged/untagged as local disks.
Warning: When dedupe/compression is enabled on the whole disk group, you cannot remove individual disks.
QoS = Quality of Service
The IO size for IOPS Limit is normalized to 32KB. This means that if you set the IOPS Limit to 10,000 and the typical I/O size from the VM was 64KB, then you could do only 5,000 IOPS. If your block size is 4KB/8KB/16KB or 32KB, you would be able to achieve the 10,000 IOPS limit.
No way to change this normalized I/O size.
Note that this is a hard limit on the number of IOPS so even if there are plenty of resources available on the system to do more, this will prevent the VM/VMDK from doing so.
Deduplication and Compression Overhead approx. 5%
Checksum overhead is anywhere between 1.22% and 1.25%.
Virtual SAN space reporting feature overview ( https://kb.vmware.com/kb/2144399 )
File system overhead is Virsto
Overhead is calculated in advance, not on-the-fly as the file system is used.
VSAN stats are not on vCenter – you need RVC or vROps tools to get that information.
A good overview of how to do valid storage performance testing - http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
A single VM will only consume resources on one host. Deploy multiple VMs.
We aim for a 90% read cache hit rate. Of course on All-Flash VSAN this isn't an issue, since read cache misses are serviced from flash too.
VSAN is a caching system. The idea is to keep the working set of your application/guest in cache. When considering hybrid storage configurations (e.g. mixed flash and disk), the most important factor is to estimate the size of your "working set", i.e. the proportion of your entire data set that will be actively accessed. Most observed working sets are less than 5% of the total dataset size, but there are exceptions. If your tests size the working set too large, you'll get a less-than-ideal picture of hybrid performance that won't correspond with reality.
Allow your benchmark to run for some time before starting to gather metrics.
If deduplication is enabled, then zero is deduped to one block. All subsequent read tests will hit this single block/disk in the disk group.