Slide presentation from Webinar on February 17, 2016.
People in analytical roles are demanding more and more compute and storage to get their jobs done. Instead of building out infrastructure for a few employees or a department, systems engineers and IT managers can find value in creating a compute stack in the cloud to meet the fluctuating demand of their clients.
In this 45-minute webinar, you’ll learn:
- How to identify the right analytical workloads
- How to create a scalable compute environment using the cloud for analysts in under 10 minutes
- How to best manage costs associated with the cloud compute stack
- How to create dedicated client stacks with their own scratch space as well as general access to reference data
Health systems, research and development departments, and business analyst groups all contend with siloed, compute-intensive workloads like these. By learning how to quickly build a flexible workflow that can be scaled up, down, or off on demand, you can support business objectives while managing costs efficiently.
4. Agenda
• Highlight challenges faced by today’s IT organizations, especially with analytics teams, when dealing with public clouds
• Focus for today largely on compute and data
• Discuss how to meet these challenges
• How to create a scalable compute environment in under 10 minutes
• How to leverage data both in and outside the public cloud
6. Clouds Can Be Easy to Use
AWS EC2 Compute
Google GCP Compute
Microsoft Azure Compute
7. Each Cloud Offers
Clouds Can Do Many Things
• Virtual Machines/Compute
• Containers
• Storage
• Databases (various)
• Networking
• Tiered Applications
• Big Data Processing
• And more…too many to mention in a slide
8. Overall Benefits of Cloud, Tools and Integrators
• Cloud platform reduces fulfillment time for new resources
• Cloud platform removes permanence from resource allocation
• Cloud platform removes up-front capital cost (CAPEX) from resource allocation
• Cloud platform increases capacity and flexibility
• Cloud services and tools decrease complexity and cost of ownership
9. In Fact It’s So Easy ….
…End users can set things up themselves.
10. Common End User Comments
• “I can’t wait for IT to give me resources.”
• “I don’t have to wait for IT to give me resources.”
• “There are too many requirements to use IT resources…I’ll just go to (enter public cloud name here).”
11. Liberating, but Still Liable
• Corporate or Institutional data
• Spending on behalf of corporation or institution equates to direct liability
• Security concerns remain, even if the environment is self-contained
• Costs can spiral out of control; budgets may not account for these spending events
12. Cloud - Extension of IT Resources
• Budget chargeback
• Networking(!)
• Security (of users, of data)
• Resource fulfillment
• Capacity planning (for budgets)
13. With the Right Tools, IT Can Make Cloud Magic
• On-demand services with automated chargeback
• Extension of existing automation capabilities
• Rapid allocation of new compute without CAPEX costs
• Significantly reduced fulfillment time
– From “order, ship, unbox, rack & stack” to “run automation”
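Automated chargeback can start as a simple roll-up of usage by cost center. The following is a minimal sketch, assuming each usage record carries a cost-center tag, instance-hours, and an hourly rate. The record layout, cost-center names, rates, and budgets are illustrative assumptions, not taken from any provider's billing export.

```python
from collections import defaultdict

# Hypothetical usage records. A real deployment would read these from the
# cloud provider's billing export; the field names and rates are made up.
USAGE = [
    {"cost_center": "genomics", "instance_hours": 120.0, "hourly_rate": 0.50},
    {"cost_center": "risk", "instance_hours": 40.0, "hourly_rate": 1.25},
    {"cost_center": "genomics", "instance_hours": 10.0, "hourly_rate": 2.00},
]


def chargeback(records):
    """Sum cost (hours x rate) per cost center."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["cost_center"]] += rec["instance_hours"] * rec["hourly_rate"]
    return dict(totals)


def over_budget(totals, budgets):
    """Return the cost centers whose spend exceeds their budget.

    A cost center with no budget entry is treated as having a budget of
    zero, so unplanned spending is always flagged.
    """
    return sorted(cc for cc, spent in totals.items() if spent > budgets.get(cc, 0.0))


totals = chargeback(USAGE)
print(totals)
print(over_budget(totals, {"genomics": 75.0, "risk": 100.0}))
```

Flagging unbudgeted cost centers by default is a design choice aimed at the "spending events" budgets may not account for.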
16. Cloud Compute Use Case Examples
• Analytical processing (either single or multi machine use cases)
– Life Sciences Analytics / Quality Check (QC) / SNP analysis applications
– Financial Risk Modeling
– Rendering and Transcoding activities
• Build/Test environments
• Big Data applications such as Hadoop
• Application servers/services
• Or simply workstations on demand for temporary use
– Example: Amazon WorkSpaces
17. Cloud Compute Usage Examples
[Diagram: four cloud compute usage patterns]
• 100% cloud compute with local/SSD storage
• 100% cloud compute with local/SSD storage and cloud storage
• 100% cloud compute with local/SSD storage and on-premises data (NAS) accessed over the WAN
• Extended compute (burst) from on-premises compute into the cloud, with local/SSD storage and on-premises data (NAS) accessed over the WAN
18. Data Considerations
• Considerations:
– Is there a lot of data?
– Are there multiple nodes acting on the data?
– Is there to be a lot of writing (versus reading) of data?
– Is the data sensitive?
– Is there a scratch space requirement?
– Will the data need to persist in the cloud?
19. Choices for Your Data
• Copy to local SSD or Persistent SSD/EBS on each node
• Locate / migrate data to object store bucket in cloud provider
• Run a file system in the compute environment and serve data as a NAS
• Use a caching layer in the compute environment and serve only requested data, leaving the data wherever it originated
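One way to connect the considerations to these choices is a small rule-of-thumb helper. The sketch below is illustrative only: the rules are assumptions made for this example, not guidance from the presentation, and real decisions depend on data volumes, latency, and cost.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the data-consideration questions for one workload."""
    large_dataset: bool
    multi_node: bool
    write_heavy: bool
    sensitive: bool
    needs_scratch: bool
    must_persist_in_cloud: bool


def suggest_data_options(w):
    """Return candidate data placements using assumed rules of thumb."""
    options = []
    # Sensitive data, or large data that need not live in the cloud, is a
    # candidate for caching only what clients request.
    if w.sensitive or (w.large_dataset and not w.must_persist_in_cloud):
        options.append("caching layer in front of the original data source")
    if w.must_persist_in_cloud:
        options.append("object store bucket in the cloud provider")
    # Shared writes or shared scratch across nodes suggest a shared file system.
    if w.multi_node and (w.write_heavy or w.needs_scratch):
        options.append("file system in the compute environment served as a NAS")
    # Nothing above applied: small, non-sensitive, single-node style workload.
    if not options:
        options.append("copy to local or persistent SSD on each node")
    return options


small_job = Workload(False, False, False, False, False, False)
burst_job = Workload(True, True, False, True, False, False)
print(suggest_data_options(small_job))
print(suggest_data_options(burst_job))
```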
20. Avere vFXT – Caching File System in the Cloud
• Avere vFXT:
– Highest performance
– Scale-out NAS
– Ideal for high core-count applications and large numbers of servers
– Global namespace: one mount for various sources, including cloud and on-premises data
– Scale up and down as demand requires
– Only obtains data that has been requested by clients
– Ideal for cloud bursting on-premises data to cloud compute
– Scales to tens of thousands of cores
21. Avere CloudFusion: NAS-in-the-Cloud
• Avere CloudFusion
– Single-node, low cost caching NAS
– Uses low-cost Amazon S3 object storage as its backing store
• Stores large volumes of data
– Presents NFS or SMB
– Supports multiple clients
• For example, use it as your Amazon WorkSpaces storage
– Use as scratch space
• Simple to configure
22. Advantages of a caching layer in compute
• No persistent data in compute = lower cost
• Achieve high performance at low latencies
• Maintain data security by leaving it on-premises
• Abstract data sources between on-premises and cloud for a single file system experience
• Reduce complexity of compute environment by avoiding re-write of any applications
24. Deployment of Application Stack
• Of the many ways to deploy a stack, we’ll start with those provided by the cloud providers themselves
• For compute, choose:
– A pre-configured image (AMI, VM) with all necessary software
– Multiple pre-configured images with all necessary software
– Pre-configured images using Puppet or other CM tool for updates
– A container, or a set of containers in a cluster
• For networking, choose:
– A configured VPN (for internet-based connectivity)
– Cloud Provider peering connections
– Direct connectivity through companies like Equinix
– Security Group / Firewall / route configurations
25. Deployment of Application Stack (continued)
• For security, choose:
– IAM in the public cloud
– Service accounts / roles to restrict what the compute nodes can access
• For data, choose:
– A caching / file system application
– A program to copy / move data to the local nodes, triggered as part of the stack creation
26. The 10-Minute Stack
• AWS: CloudFormation Template (JSON / REST)
• Google Launcher / Deployment Manager Templates (YAML, Python)
• Microsoft Azure Resource Manager (JSON / REST)
Each offers extensive examples on its own site.
For AWS, tools such as Terraform and Troposphere reduce the complexity.
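As an illustration of the template approach on AWS, the sketch below generates a small CloudFormation template in JSON using only the Python standard library. The node count, instance type, resource names, and tag values are assumptions chosen for the example; the image and subnet are left as template parameters to be supplied at launch.

```python
import json


def build_template(node_count, instance_type="c4.large"):
    """Return a CloudFormation template (as a dict) with node_count EC2 instances."""
    resources = {}
    for i in range(node_count):
        # Logical IDs must be alphanumeric, hence ComputeNode0, ComputeNode1, ...
        resources[f"ComputeNode{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},
                "InstanceType": {"Ref": "InstanceType"},
                "SubnetId": {"Ref": "SubnetId"},
                "Tags": [{"Key": "Name", "Value": f"analytics-node-{i}"}],
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative analytics compute stack",
        "Parameters": {
            # Supplied at launch: a pre-configured image and an existing subnet.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
            "InstanceType": {"Type": "String", "Default": instance_type},
        },
        "Resources": resources,
    }


template = build_template(node_count=3)
template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file, a template like this can be launched with `aws cloudformation create-stack`, passing the image and subnet IDs as parameters.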
27. What You’ll Need
• Command-line tools (AWS CLI, gcloud, PowerShell)
• Text editor / code editor
• A Project / VPC / Network in the respective cloud
– Assume that you will create multiple stacks, but within an existing infrastructure framework
– Use the command-line tools and Python (or similar) to validate the network and security environments
• Image (AMI/Virtual Machine) or configuration management (e.g., Chef) for application image creation
• File system capability…we’ll use Avere
– You’ll need Python coding for this piece
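Part of validating the network can be done offline before any stack is created. The sketch below checks that planned subnets fall inside the existing network range and do not overlap one another, using Python's standard `ipaddress` module. The CIDR values are made-up examples; in practice they would come from the provider's command-line tools.

```python
import ipaddress


def validate_subnets(network_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the plan is consistent."""
    network = ipaddress.ip_network(network_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    # Every planned subnet must sit inside the existing network range.
    for net in subnets:
        if not net.subnet_of(network):
            problems.append(f"{net} is outside {network}")
    # No two planned subnets may share addresses.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


good_plan = validate_subnets("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"])
bad_plan = validate_subnets(
    "10.0.0.0/16", ["10.0.1.0/24", "10.0.1.128/25", "192.168.0.0/24"]
)
print(good_plan)
print(bad_plan)
```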
30. What Will You Create with the Templates?
• All of the necessary security configuration (if it does not already exist)
– For example, if your instances must access object storage, permission must be granted to the instance either directly (in Google’s case) or via an IAM role (for AWS)
• Disks (volumes) for the machines (if using persistent)
• Network routes for new addresses or network/subnets
• Compute instances
• UserData can then be included in the templates to call extra configuration on the instances
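For the AWS case, the security and UserData pieces might look like the following sketch: an IAM role limited to reading one S3 bucket, an instance profile that wraps the role, and an instance that uses both and runs a first-boot script. The bucket name, resource names, policy name, and script path are hypothetical placeholders.

```python
import json

# Hypothetical first-boot script; the path below is a placeholder.
USER_DATA = """#!/bin/bash
/opt/bootstrap/configure-node.sh
"""


def secured_stack(bucket_name):
    """Return a template whose instance can only read the named S3 bucket."""
    read_bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
        }],
    }
    # Trust policy: lets EC2 instances assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": ["ec2.amazonaws.com"]},
            "Action": ["sts:AssumeRole"],
        }],
    }
    resources = {
        "NodeRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": trust_policy,
                "Policies": [{
                    "PolicyName": "read-reference-data",
                    "PolicyDocument": read_bucket_policy,
                }],
            },
        },
        "NodeProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {"Roles": [{"Ref": "NodeRole"}]},
        },
        "ComputeNode": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},
                "InstanceType": "c4.large",
                "IamInstanceProfile": {"Ref": "NodeProfile"},
                # CloudFormation base64-encodes the script at deploy time.
                "UserData": {"Fn::Base64": USER_DATA},
            },
        },
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {"ImageId": {"Type": "AWS::EC2::Image::Id"}},
        "Resources": resources,
    }


stack = secured_stack("example-reference-data")
print(json.dumps(stack, indent=2))
```

Because this template creates IAM resources, launching it requires acknowledging the `CAPABILITY_IAM` capability.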
31. Deploying Avere with the Stack
• Leverage CloudFormation / Deployment Manager / Resource Manager to set up the initial nodes
• Add checks to ensure networking is configured properly
– Cloud provider endpoint access is critical
• GCS/S3 API endpoint for storage, EC2 or GCE endpoint for controlling IP address failover for vFXT
• Call the XML-RPC library to complete configuration of
– “Core filer” mappings
– Client IP address configuration
– Integration with AD or NIS
– Connection to the on-premises NFS server
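The checks and configuration calls above could be scripted along the following lines, using Python's standard `socket` and `xmlrpc.client` modules. This is a sketch under stated assumptions: the XML-RPC method names and arguments are hypothetical placeholders, not the actual Avere API, and the management URL is deployment-specific.

```python
import socket
import xmlrpc.client

# Provider endpoints the cluster must be able to reach (AWS examples).
REQUIRED_ENDPOINTS = [
    ("s3.amazonaws.com", 443),
    ("ec2.us-east-1.amazonaws.com", 443),
]

# HYPOTHETICAL method names and arguments, for illustration only.
# Consult the vendor's XML-RPC documentation for the real calls.
CONFIG_STEPS = [
    ("example.add_core_filer", ("onprem-nas", "nas.example.internal")),
    ("example.set_client_addresses", ("10.0.1.10", "10.0.1.13")),
]


def endpoint_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def unreachable_endpoints(endpoints, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(h, p) for h, p in endpoints if not endpoint_reachable(h, p, timeout)]


def connect(management_url):
    """Build an XML-RPC proxy; no connection is opened until a call is made."""
    return xmlrpc.client.ServerProxy(management_url)


def run_steps(proxy, steps):
    """Invoke each (method_name, args) pair on the proxy, in order."""
    results = []
    for method_name, args in steps:
        # ServerProxy resolves dotted names such as "example.add_core_filer".
        results.append(getattr(proxy, method_name)(*args))
    return results
```

A deployment script would typically stop if `unreachable_endpoints(REQUIRED_ENDPOINTS)` returns anything, and only then call `run_steps(connect(url), CONFIG_STEPS)`.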
32. End State
[Diagram: application nodes in the cloud mount the Avere vFXT running in cloud compute, which reaches the on-premises NAS over the WAN]
• Validated network
– AWS: VPC
– GCP: Project Network
– Azure: Virtual Network
• vFXT configured with
– IP addresses
– DNS, NTP
– Mapping to on-premises NAS
– Export for global namespace
• NAT / Proxy / VPN / Router
• Application nodes have a mount point configured based on the Avere vFXT export addresses
• IAM roles applied
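The mount-point step on the application nodes can be generated rather than hand-edited. The sketch below spreads nodes across the cluster's client-facing addresses round-robin and builds one `/etc/fstab` entry per node. The node names, addresses, export path, mount point, and NFS options are assumptions for illustration.

```python
from itertools import cycle


def mount_plan(nodes, vfxt_addresses, export="/", mount_point="/mnt/data"):
    """Assign each node a vFXT address round-robin and build its fstab entry."""
    if not vfxt_addresses:
        raise ValueError("at least one vFXT address is required")
    plan = {}
    for node, address in zip(nodes, cycle(vfxt_addresses)):
        # fstab fields: device, mount point, type, options, dump, pass.
        plan[node] = f"{address}:{export} {mount_point} nfs hard,proto=tcp,vers=3 0 0"
    return plan


plan = mount_plan(
    ["app1", "app2", "app3"],
    ["10.0.1.10", "10.0.1.11"],
    export="/global",
)
for node, entry in plan.items():
    print(node, "->", entry)
```

Distributing nodes across addresses is a design choice intended to spread client load; a DNS round-robin name in front of the addresses would be an alternative.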
33. Summary
• Cloud tools abound for creating on-demand application stacks in your favorite cloud
• IT organizations can leverage these clouds and tools to maximize their customers’ capabilities and thus their satisfaction
• Leverage caching file systems running in the cloud to provide performance-based access to only relevant data, limiting the need to move large amounts of data into the cloud temporarily