STO7535 Virtual SAN Proof of Concept - VMworld 2016
1. Conducting a Successful Virtual SAN 6.2 Proof of Concept
Paudie ORiordan, VMware, Inc
Cormac Hogan, VMware, Inc
STO7535
#STO7535
2. • This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these
features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or
sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not
been determined.
Disclaimer
4. Agenda
1 Introduction to Session
2 Introduction to Virtual SAN
3 Tools to conduct a successful Virtual SAN proof of concept (POC)
4 POC validation scenarios
5 Data Services Considerations
6 Measuring Performance
5. This session…
• Virtual SAN has been available since March 2014, almost 2.5 years
• To date, we have almost 5,000 VSAN customers.
• VMware recognises that conducting a Virtual SAN proof of concept can be challenging
• Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting
Virtual SAN have become available
• In this session, we will discuss the tools available to vSphere and Virtual SAN administrators
and how they can help deliver a successful Virtual SAN proof of concept
6. Introduction to VMware Virtual SAN
• Storage scale out architecture built into
the hypervisor
• Aggregates locally attached storage from each
ESXi host in a cluster
• Dynamic capacity and performance scalability
• Flash optimized storage solution
– Fully integrated with vSphere and interoperable:
• vMotion, DRS, HA, VDP, VR …
• VM-centric data operations
• Many new data services
[Diagram: locally attached storage from each ESXi host is aggregated into a single Virtual SAN datastore]
7. What I Need to Be Successful
Tools to conduct a successful Virtual SAN POC
8. Before You Begin: Verify Your Components Against the HCL
• VMware Virtual SAN Hardware
• Server, Controller, SSD, Disk on HCL
• Controller Firmware, Driver
• Disk Firmware
• Enclosure Firmware
• SAS/SATA SSD Minimum Firmware is Critical
– Rule is minimum or higher
• NVMe Firmware
– HCL lists absolute version only
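As a rough illustration (these are generic ESXi checks and the driver module name is only an example, not tied to any particular controller), driver information can be cross-checked against the HCL directly on a host:

# List storage adapters and the driver module each one uses
esxcli storage core adapter list
# Show the loaded version of a given driver module (lsi_mr3 is only an example)
vmkload_mod -s lsi_mr3 | grep -i version

Controller, disk and enclosure firmware versions are typically confirmed with the vendor's own management utilities.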
9. Success Tool #1 : Health Plugin – Reactive Health Checks
• Introduced with Virtual SAN 6.0
• Incorporated into the vSphere Web Client
• The Virtual SAN Health Check tools include:
– General Health
– Proactive tests
– Virtual SAN HCL health
– Physical disk health
• Especially useful when injecting errors into the cluster and verifying that they have been remediated
10. Success Tool #1 : Health Plugin – Proactive Health Checks
• Proactive tools for running pre-production tests on the Virtual SAN cluster:
– VM Creation test
– Storage Performance
– Multicast performance test
11. Success Tool #2 : Capacity Views
• Dedupe and
Compression Savings
• Group by Object Type
– Filesystem overhead
– Dedupe overhead
– Checksum overhead
– Virtual disks
– Swap
– Home namespace
12. Success Tool #3 : Performance Service
• Enable it once
• Integrated with vSphere
• Simplified metrics
– Backend (VSAN)
– Frontend (VM)
• Distributed Architecture
– No SPOF
• Historical data
• Status monitored by
health checks
13. Success Tool #4 : HCIbench
• Hyperconverged Infrastructure benchmark
• Based on Vdbench
• Designed to work on distributed architectures
like Virtual SAN
• UI Driven
• Free
• Provides results in text format and in a format that can be viewed in VSAN Observer
• Now available from
https://labs.vmware.com/flings
14. Success Tool #5 : RVC/Virtual SAN Observer
• Native tools installed on Linux/Appliance and Windows versions of vCenter Server
• Used for Configuration and Status of the Virtual SAN Cluster
• For Performance and Activity monitoring on demand
– VM level
– Host level
– VMDK level
– HDD/SSD Level
• Any anomalies will show up with the metric in question shown in red
• Follow the I/O : VM -> VMDK -> Disk Group -> Disk -> Congestion
15. Success Tool #5 : RVC/Virtual SAN Observer (ctd.)
Virtual SAN Operation
– vsan.apply_license_to_cluster
– vsan.enable_vsan_on_cluster
– vsan.disable_vsan_on_cluster
– vsan.clear_disks_cache
– vsan.cluster_change_autoclaim
– vsan.cluster_set_default_policy
– vsan.enter_maintenance_mode
– vsan.fix_renamed_vms
– vsan.object_reconfigure
– vsan.host_wipe_vsan_disks
– vsan.recover_spbm
– vsan.reapply_vsan_vmknic_config
Virtual SAN Information
– Cluster: vsan.check_limits, vsan.check_state, vsan.cluster_info, vsan.cmmds_find, vsan.whatif_host_failures, vsan.resync_dashboard
– Disk: vsan.disk_object_info, vsan.disks_info, vsan.disks_stats
– Host: vsan.host_info, vsan.host_consume_disks
– Networking: vsan.lldpnetmap
– VM: vsan.vm_object_info, vsan.vm_perf_stats, vsan.vmdk_stats, vsan.obj_status_report, vsan.object_info
Virtual SAN Monitoring
– Troubleshooting: vsan.support_information, vsan.observer
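A minimal RVC session sketch (the cluster path is a placeholder for your own inventory) showing how a few of the commands above are typically invoked:

# Check overall cluster and object state
vsan.check_state /localhost/Datacenter/computers/VSAN-Cluster
# Summarise the cluster configuration
vsan.cluster_info /localhost/Datacenter/computers/VSAN-Cluster
# Launch VSAN Observer with its built-in web server for live performance graphs
vsan.observer /localhost/Datacenter/computers/VSAN-Cluster --run-webserver --force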
17. PoC Validation
• What are the most important validation tests?
1. Successful VSAN configuration
2. Successful VM deployments on VSAN datastore
3. VM Availability in the event of failures (host, storage device, network)
4. VSAN Serviceability (maintenance of hosts, disk groups, disks)
5. VM Performance meets expectations
6. VSAN Data Services (Dedupe, Compression, RAID-5/6, Checksum) working
as expected
18. Case #1 – Successful VSAN Deployment – Checklist
• Correct vSphere versions
• Appropriate licenses
– especially if PoC is expected to take a long time (> 60 days)
• Correctly Configured Network
– VSAN requires multicast, so prep the network team
• Minimum of three servers
– Or 2 servers plus a witness appliance if doing Remote Office/Branch Office (ROBO)
Remember, the VSAN Health Check will do most of this work for you.
19. Case #1 – Successful VSAN Deployment – Checklist (ctd.)
• Minimum of three servers contributing
storage:
• At least one storage controller – you’ve checked
the HCL, and drivers and firmware are valid, right?
• At least one flash device (SSD, PCIe) for cache –
check the HCL
• At least one magnetic disk (hybrid) or flash device
(all-flash) for capacity – check the HCL
• Or consider VSAN Ready Nodes as an option …
Remember, the VSAN Health Check will do most of this work for you.
20. Case #1 – Successful VSAN Deployment – Device Claiming
• Devices not visible
– Some RAID controllers won’t present individual disks without RAID configuration
– May need RAID-0 configuration set on storage devices via controller
• Devices not being claimed
– Some controllers allow devices to be shared; so devices get presented as “non-local”
– VSAN will only claim devices that are local
• SSD showing up as HDD
– Placing devices in RAID-0 will do this
• All-Flash using wrong devices for cache/capacity
– Set VSAN to “Manual mode” when setting up all-flash
– Gives control over which devices are used for cache and which devices are used for capacity (see the example below)
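A hedged sketch (device IDs are placeholders) of ESXCLI commands commonly used when checking and correcting device claiming; verify the exact options against your ESXi build:

# List the devices VSAN has claimed and their roles (cache vs. capacity)
esxcli vsan storage list
# Manually build a disk group: one cache SSD (-s) plus a capacity device (-d)
esxcli vsan storage add -s naa.5000000000000001 -d naa.5000000000000002
# On older builds, tag a device hidden behind a RAID-0 volume as SSD so it is detected correctly
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device=naa.5000000000000002 --option="enable_ssd"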
21. Case #1 – Successful VSAN Deployment – Overall health
Check the Virtual SAN Health Check regularly.
Run health checks after every test! Clear alarms!
Use it to verify that a problem introduced earlier has now been fixed.
22. Case #2 : Successful VM Deployment on VSAN
Use the Health Check – Proactive Tests to do an initial VM deployment check.
This is part of the Proactive Tests and will verify whether virtual machines can be created on the VSAN cluster.
23. Case #2 : Successful VM Deployment on VSAN
I created a new VM, but where and how is the VM stored?
The component host location view shows where each component has been placed (see the example below).
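To see exactly where the new VM lives, a quick RVC check along these lines can help (the VM path is a placeholder); for an FTT=1 VM you would expect a RAID_1 layout with two data components plus a witness, placed on three different hosts:

# Show the object/component layout and the host and disk each component lives on
vsan.vm_object_info /localhost/Datacenter/vms/my-test-vm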
24. Case #3 : VM Availability in the Event of Failures
• Various failures may be introduced as part of a typical POC
– Host failure
– Flash device / Magnetic Disk failure – Cache/Capacity device failures
– Network failure
• Objective: ensure that the VM continues to be available in the event of a failure. The VM may be
restarted on another node in the cluster.
• vSphere HA is fully integrated with Virtual SAN so that virtual machines on the failed host are
restarted on other hosts in the cluster
25. Case #3.1 : Host Failures
• How many hosts do I really need?
• A minimum of 3 hosts is needed to support VSAN.
• What about rebuilding after a failure or maintenance mode operations?
• If you want virtual machines to remain highly available on VSAN during these scenarios,
consider configuring additional capacity, i.e. a minimum of 4 nodes (see the example below).
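One quick sanity check before injecting a host failure is the RVC command below (the cluster path is a placeholder), which estimates whether the remaining hosts have enough capacity to rebuild:

# Simulate the capacity impact of a host failure before actually causing one
vsan.whatif_host_failures /localhost/Datacenter/computers/VSAN-Cluster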
26. Case #3.2 : Storage Failures
• The Virtual SAN 6.0 Proof Of Concept Guide has details on how to inject temporary disk errors
for the purpose of testing.
– A real disk failure results in immediate rebuild activity initiated by VSAN
– Eject/Offline/Unplug -> Absent: wait 60 minutes before remediation
– Failure -> Degraded: immediate remediation
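The 60-minute wait for ABSENT components is governed by the ClomRepairDelay advanced setting on each host. A hedged sketch of checking (and, purely for PoC purposes, temporarily lowering) it is shown below; verify the option path on your build and revert it afterwards:

# Show the current repair delay (default is 60 minutes)
esxcli system settings advanced list -o /VSAN/ClomRepairDelay
# Temporarily lower it to 15 minutes for testing (apply on every host, revert after the PoC)
esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 15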
27. Case #3.2 : Storage Failures (ctd.)
• Additional considerations when dedupe/compression are enabled on VSAN
– Deduplication and compression hash tables/metadata are spread across all disks in a disk group
– A single device failure in the disk group will render the whole of the disk group unavailable
– All data in disk group will be rebuilt elsewhere in the cluster (if resources allow)
28. Case #3.3 : Network Failure
Multicast configuration is the most common issue (see the verification example below).
Part of the Proactive Tests: this verifies whether multicast performance is acceptable for the VSAN cluster.
Start simple: if you want features like LACP, don't implement them initially. Turn off QoS/Flow Control, then build it back up afterwards.
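A hedged sketch of verifying the VSAN network configuration from an ESXi host (the vmkernel interface name is a placeholder); the multicast group addresses reported should match what the network team has allowed:

# Show the VMkernel interface and multicast group addresses VSAN is using
esxcli vsan network list
# Optionally capture a few packets on the VSAN vmkernel interface to confirm multicast is flowing
tcpdump-uw -i vmk1 -c 20 multicast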
29. Case #3.4 : Validating Rebuild Activity After Failure
• Virtual SAN might need to move data around in the background: change policy, host failure,
long term/permanent component loss, user triggered reconfig, maintenance mode, etc.
• UI Resync Dashboard shows the VMs that are resyncing and remaining bytes to sync
Remember! Test one thing at a time!
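The same resync information is also available from RVC, which is handy for watching rebuild progress from the command line (the cluster path is a placeholder):

# Show the VMs currently resyncing and the bytes left to sync
vsan.resync_dashboard /localhost/Datacenter/computers/VSAN-Cluster
# Summarise overall object health while the resync completes
vsan.obj_status_report /localhost/Datacenter/computers/VSAN-Cluster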
30. Case #4 : VSAN Serviceability – Maintenance Mode
I want to update one of my ESXi hosts in a VSAN cluster, what do I do?
VSAN provides multiple options for maintenance mode.
31. Case #4 : VSAN Serviceability – Maintenance Mode
• Ensure Accessibility: loss of VM compliance; suited to short maintenance; short storage preparation; limited free storage space required
• Full Data Migration: full VM data compliance; suited to maintenance lasting more than one hour; long storage preparation; free storage space required on the other nodes
• No Data Migration: no VM availability ensured; suited to short maintenance; no storage preparation or free space impact
Full data migration is unavailable in 3-node clusters!
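Maintenance mode can also be entered from the command line. A hedged ESXCLI sketch is below; the mode values ensureObjectAccessibility, evacuateAllData and noAction map to the three options above (verify the flags against your ESXi version):

# Enter maintenance mode, keeping objects accessible without a full evacuation
esxcli system maintenanceMode set -e true -m ensureObjectAccessibility
# Exit maintenance mode when the host work is complete
esxcli system maintenanceMode set -e false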
32. Case #5 : Management – Disks Serviceability
The disk serviceability feature helps identify the magnetic disks and flash devices that need to be replaced.
33. Case #5 : Management – Disk/Disk Group Evacuation
• Allows you to evacuate data from disk groups and individual disks before removing
a disk/disk group from a Virtual SAN host
• Allows Virtual SAN to ensure all workloads stay fully compliant with their policy!
– Supported in the UI, ESXCLI and RVC.
– Check box in the “Remove disk/disk group” UI screen.
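A hedged ESXCLI sketch of removing a capacity device or a whole disk group (device IDs are placeholders; on 6.x builds an evacuation mode can usually be chosen as well, so double-check the available options on your version):

# Remove a single capacity device from its disk group
esxcli vsan storage remove -d naa.5000000000000002
# Remove an entire disk group by specifying its cache SSD
esxcli vsan storage remove -s naa.5000000000000001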
35. New Data Services in VSAN 6.2
• Erasure Coding – RAID-5/RAID-6 Support
• Deduplication / Compression
• Checksum
• IOPS limits / QoS
There are performance considerations associated with all of the above.
There are also some issues to be aware of!
36. Capacity Overhead of the New Data Services
• Overheads are all calculated in advance
– Deduplication/Compression maintain hash tables
• Approx. 5% overhead
– Checksum Metadata is stored separately from data
• Approx. 1.2 % overhead
Many customers are surprised by the amount of overhead when data services
are first enabled
37. Data Services File System Overheads – Don’t Panic
• Deduplication and Compression File System Overhead is 5% (approx.) of Total Virtual
SAN Capacity
• Checksum Overhead is approx. 1.2% of capacity
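As a hypothetical worked example: on a 40 TB raw Virtual SAN datastore, deduplication/compression metadata would consume roughly 0.05 × 40 TB = 2 TB, and checksum metadata roughly 0.012 × 40 TB ≈ 0.5 TB, before any VM data is written.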
39. How to Test Performance…
• Distributed architecture => best performance when the pooled compute and storage resources
in the cluster are well utilized.
• This usually means a number of VMs each running the specified workload should be distributed
in the cluster and run in a consistent manner to deliver aggregated performance.
• This part of an evaluation can be complex and time-consuming
• Real application workloads are best, but …
– synthetic workloads (IOmeter) might be easier to set up
– simplistic workloads don’t really reflect what Virtual SAN can do
• Worth a read: Pro Tips For Storage Performance Testing
– http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
40. Performance Testing Considerations (Primarily for Hybrid)
Is the test utilising the distributed storage resources of Virtual SAN?
• Multiple VMs across multiple hosts delivers better performance than one VM on one host.
Is the working set fully in cache, utilising flash performance?
• Read-cache misses will incur latency.
Is the workload cache friendly?
• Sustained sequential write workloads fill cache, which must then be destaged. Mixed
R/W workloads with repeat patterns are best.
Is the cache warmed if using VSAN hybrid?
• Initial results from starts of tests will not be reflective of overall performance.
Warning: make sure the dedupe scrubber is disabled; it causes performance issues on hybrid *
* KB 2146267
41. Performance Test with HCIbench/vdbench
• VMs will be distributed equally across all hosts
• Select I/O size
• Select R/W ratio
• Select random/sequential
• Select duration of test
• Disks can be zeroed with “dd”*
• VMs will be removed (optionally) when test
completes
• Produces results per VM
– IOPS, Latency, Throughput, etc
• Produces results consumable by VSAN
Observer
* Avoid zeroing disks if deduplication enabled – will create hot-spot
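A hedged sketch of zeroing a test VM's data disk from inside the guest before a run (the device name is a placeholder; remember the warning above and skip this step entirely when deduplication is enabled):

# Overwrite the whole test disk with zeros so first-write allocation does not skew the results
dd if=/dev/zero of=/dev/sdb bs=1M oflag=direct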
Introduced in early 2014.
Scale out architecture, starting with small, 3 node cluster and add nodes (ESXi hosts) as needed.
Uses local storage from each host – can have compute nodes but you need at least 3 nodes contributing storage
Full interop with vSphere features.
VM-centric data operations: mirroring, striping, cache reservation, pre-allocate disk space … all done on a per VM basis.
Works on most vSphere hardware.
Two VSAN types: all-flash and hybrid.
We will also have a number of stretched cluster checks introduced for VSAN 6.1 (shipping with vSphere 6.0U1)
You may also group by data type, so you can see how much space is consumed by primary data, VSAN overhead and temp overhead (rebuild activity for example)
Talk quickly about the example of a RAID-1/FTT=1 VMDK. Front-End will show VM IOPS, and back-end will show 2 x IOPS going to both replicas.
VMware has a VSAN Troubleshooting Guide which can be used for guidance on using VSAN Observer.
VMware has an RVC Reference Guide for VSAN that goes over many of these commands and their usage.
Serviceability – replacing drives, maintenance mode, rolling upgrades
We're assuming now that you have checked the hardware and you're ready to do your POC.
Some controllers allow disks to be shared between 2 hosts. If we detect this functionality, the devices are marked as non-local.
Putting individual devices in a RAID-0 volume masks a number of features, such as drive type.
What does it check?
That there isn’t some underlying hardware issue preventing a VM from being deployed with a default policy
That you don’t have some silly default policy that cannot be met by the config
That ATS (Atomic Test & Set) locking is functioning – this might have been disabled globally if there is other storage presented to the ESXi hosts (fully supported btw)
This is a VM with FTT=1. VMs with higher spec policies will have more components.
Ask if audience understand that a VM on VSAN is now a set of objects, not files.
Objects in turn are made up of components, which can be many depending on stripe width (RAID-0) and failures to tolerate (RAID-1).
Speaker notes:
Although we require a minimum of 3 nodes to a VSAN Cluster, a better approach might be to build 4 node clusters.
This way when there is a failure or more importantly a maintenance task which takes one node out of the cluster, you have the possibility of keeping your fault tolerance setting in place during this period, provided there is enough capacity left in the cluster.
Of course, rebuild activity will only occur when there are available resources.
In order to inject errors, the health check includes a feature to do this. It may need 3rd party tools installed.
Devices that are removed are considered “Absent”.
A timeout value defined by ClomdRepairDelay needs to expire before VSAN takes remedial action.
By default, this is 60 minutes.
This means that there is no rebuild activity until this timer expires.
Many ways to simulate a VSAN network failure otherwise:
Pull a cable
Remove uplinks from VSS or DVS
Remove VSAN VMkernel adapter
Known issue documented in VSAN 6.1 release notes http://pubs.vmware.com/Release_Notes/en/vsan/61/vmware-virtual-san-61-release-notes.html
Multicast performance test of Virtual SAN health check does not run on Virtual SAN network
This is due to us using iperf, and iperf always running on vmk0. Workaround to bind multicast address to VSAN network.
This was only visible in RVC in 5.5 – vsan.resync_dashboard. It is now in the UI.
Maintenance Mode places components on the host in an ABSENT state. Don’t do any further testing if a host is in maintenance mode if ensure accessibility or no data migration options chosen. You will basically introduce a double failure, and by default FTT=1.
We’re just going to talk about behaviour with mmode and ensure access (VMs still available, no data migration, loss of one replica) and full data migration. When to use the different methods is part of the day #2 operations talk, not the POC talk.
Keep in mind the requirement to have additional resources. Full Data migration won’t be possible with a 3 node cluster.
Risk in doing maintenance mode with 3 nodes only
Light LED on failures
When a disk hits a permanent error, it can be challenging to find where that disk sits in the chassis to find and replace it.
When SSD or MD encounters a permanent error, VSAN automatically turns the disk LED on.
Turn disk LED on/off
User might need to locate a disk so VSAN supports manually turning a SSD or MD LED on/off.
Marking a disk as SSD
Some SSDs might not be recognized as SSDs by ESX.
Disks can be tagged/untagged as SSDs
Marking a disk as local
Some SSDs/MDs might not be recognized by ESX as local disks.
Disks can be tagged/untagged as local disks.
Warning: When dedupe/compression is enabled on the whole disk group, you cannot remove individual disks.
QoS = Quality of Service
The IO size for IOPS Limit is normalized to 32KB. This means that if you set the IOPS Limit to 10,000 and the typical I/O size from the VM was 64KB, then you could do only 5,000 IOPS. If your block size is 4KB/8KB/16KB or 32KB, you would be able to achieve the 10,000 IOPS limit.
No way to change this normalized I/O size.
Note that this is a hard limit on the number of IOPS so even if there are plenty of resources available on the system to do more, this will prevent the VM/VMDK from doing so.
Deduplication and Compression Overhead approx. 5%
Checksum overhead is anywhere between 1.22% and 1.25%.
Virtual SAN space reporting feature overview ( https://kb.vmware.com/kb/2144399 )
File system overhead is Virsto
Overhead is calculated in advance, not on-the-fly as the file system is used.
VSAN stats are not on vCenter – you need RVC or vROps tools to get that information.
A good overview of how to do valid storage performance testing - http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
A single VM will only consume resources on one host. Deploy multiple VMs.
We aim for a 90% read cache hit rate. Of course on All-Flash VSAN this isn't an issue, since read cache misses are serviced from flash too.
VSAN is a caching system. The idea is to keep the working set of your application/guest in cache. When considering hybrid storage configurations (e.g. mixed flash and disk), the most important factor is to estimate the size of your "working set", i.e. the proportion of your entire data set that will be actively accessed. Most observed working sets are less than 5% of the total dataset size, but there are exceptions. If your tests size the working set too large, you'll get a less-than-ideal picture of hybrid performance that won't correspond with reality.
Allow your benchmark to run for some time before starting to gather metrics.
If deduplication is enabled, then zero is deduped to one block. All subsequent read tests will hit this single block/disk in the disk group.