Elastic container platforms (such as Kubernetes, Docker Swarm, and Apache Mesos) fit very well with existing cloud-native application architecture approaches. It is therefore surprising that these existing, open-source elastic platforms are not considered more consistently for multi-cloud approaches. Elastic container platforms provide inherent multi-cloud support that can be easily accessed. We propose a control process that is able to scale (and, as a side effect, migrate) elastic container platforms across different public and private cloud service providers. This control loop can be used in the execution phase of self-adaptive auto-scaling MAPE loops (monitoring, analysis, planning, execution). Additionally, we present several lessons learned from our prototype implementation that might be of general interest to researchers and practitioners. For instance, describing only the intended state of an elastic platform and letting a single control process take care of reaching this intended state is far less complex than defining many specific multi-cloud-aware workflows to deploy, migrate, terminate, scale up, and scale down elastic platforms or applications.
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Container Platforms
1. Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Container Platforms
Prof. Dr. rer. nat. Nane Kratzke
Computer Science and Business Information Systems
Nane Kratzke
2. The next 30 minutes are about ...
• What are Cloud-native Applications?
• Elastic Container Platforms and why they should be considered for multi-cloud research.
• A control loop to scale Elastic Container Platforms across Cloud Service Providers
• Some data of our evaluation
• 7 Lessons Learned and Conclusion
3. Maturity Criteria
Cloud Application Maturity Model (CAMM)
Level 3: Cloud Native
• Application can dynamically migrate across infrastructure providers without interruption of service.
• Application can elastically scale out/in appropriately based on stimuli.
Level 2: Cloud Resilient
• Services are stateless.
• Application is unaware and unaffected by failure of dependent services.
• Application is infrastructure agnostic and can run anywhere.
Level 1: Cloud Friendly
• Application is composed of loosely coupled services.
• Application services are discoverable by name.
• Application deployment units are designed according to cloud patterns (e.g. 12-factor app principles).
• Application compute and storage are separated.
• Application consumes one or more cloud services: compute, storage, network.
Level 0: Cloud Ready
• Application runs on virtualized infrastructure.
• Application can be instantiated from an image or script.
According to Open Data Center Alliance Best Practices (Architecting Cloud-Aware Applications), 2014, with add-ons by practitioner Mario-Leander Reimer (QAWare).
Slide annotations: "Covered by a lot of SOA and cloud deployment approaches." and "This contribution's focus ..."
4. Research Surveillance of Practitioners
Docker Swarm
Swarm Mode (since Docker 1.12) "copies" the idea of Kubernetes-like control processes but integrates them into just one component. Secure by default (control and data plane). Hides operational complexity.
Google
Control processes that continuously drive the current state of container-based applications towards an intended state. Makes Google's experience of running large-scale production workloads available as open source (especially from the Google-internal Borg system).
Mesosphere
Apache Mesos based datacenter operating system for fine-grained resource allocation. Frameworks to operate containers and data services. Datacenter focused. Mesos has been operating large-scale datacenters successfully for years (Twitter, Netflix, ...).
Practitioners ask for simple solutions (elastic platforms) ...
5. The very basic idea ...
Operate application on current provider.
Scale cluster into prospective provider.
Shut down nodes on current provider.
Cluster reschedules lost containers.
Migration finished.
Avoiding Vendor Lock-In:
• Make use of elastic container platforms to operate elastic services that are deployable to any IaaS cloud infrastructure.
• Transfer of these services from one private or public cloud infrastructure to another would be possible at runtime.
Quint, P.-C., & Kratzke, N. (2016). Overcome Vendor Lock-In by Integrating Already Available Container Technologies - Towards Transferability in Cloud Computing for SMEs. In Proceedings of CLOUD COMPUTING 2016 (7th International Conference on Cloud Computing, GRIDS and Virtualization).
6. But the idea provides more options ...
Simply stop "a transfer" somewhere in between and you get ...
7. One Control Loop for All
Operate application on current provider.
Scale cluster into prospective provider.
Shut down nodes on current provider.
Cluster reschedules lost containers.
Migration finished.
8. Control Loop
Example to deploy a cluster
Definition of an intended state.
{
  "type": "cluster",
  "platform": "Swarm",
  "deployments": [
    {
      "district": "gce-europe",
      "flavor": "small",
      "role": "master",
      "quantity": 1
    },
    {
      "district": "gce-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 9
    },
    {
      "district": "aws-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 0
    }
  ]
}
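The intended state above is plain JSON, so it can be loaded and sanity-checked with a few lines of code. The following Python sketch is only an illustration: the `Deployment` class and the functions `parse_intended_state` and `nodes_per_district` are hypothetical names, not part of the prototype described in the talk.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    district: str   # provider/region label, e.g. "gce-europe"
    flavor: str     # abstract machine size
    role: str       # "master" or "worker"
    quantity: int   # intended number of nodes


def parse_intended_state(text: str) -> list[Deployment]:
    """Parse the JSON document and reject obviously invalid entries."""
    doc = json.loads(text)
    if doc.get("type") != "cluster":
        raise ValueError("only documents of type 'cluster' are handled here")
    deployments = [Deployment(**d) for d in doc["deployments"]]
    for d in deployments:
        if d.quantity < 0:
            raise ValueError(f"negative quantity in {d}")
    return deployments


def nodes_per_district(deployments: list[Deployment]) -> Counter:
    """Sum the intended node count for each district."""
    totals = Counter()
    for d in deployments:
        totals[d.district] += d.quantity
    return totals


INTENDED = """
{ "type": "cluster", "platform": "Swarm", "deployments": [
  {"district": "gce-europe", "flavor": "small", "role": "master", "quantity": 1},
  {"district": "gce-europe", "flavor": "small", "role": "worker", "quantity": 9},
  {"district": "aws-europe", "flavor": "small", "role": "worker", "quantity": 0}
]}
"""

state = parse_intended_state(INTENDED)
totals = nodes_per_district(state)
print(dict(totals))
```

A district with quantity 0 stays in the description on purpose: it marks a provider the cluster may be scaled into later without changing the structure of the document.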
9. Control Loop
Example to deploy a cluster
Derive a prioritized action list.
|| Create secgroup for gce-europe
-- Create master in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| Create worker in gce-europe
|| executed in parallel
-- executed sequentially
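How such a list could be derived from the difference between current and intended state is sketched below in Python. This is an assumption-laden illustration, not the prototype's planner: the function `derive_actions` is hypothetical, and the ordering rule (security groups first, masters sequentially, workers in parallel, deletions last and sequentially) is read off the slides rather than taken from the implementation.

```python
from collections import Counter


def derive_actions(current: Counter, intended: Counter) -> list[tuple[str, str]]:
    """Return (mode, description) pairs; mode is '||' (parallel) or '--' (sequential).

    Both states map (district, role) to a node count.
    """
    actions = []
    current_districts = {d for (d, _), n in current.items() if n > 0}
    intended_districts = {d for (d, _), n in intended.items() if n > 0}

    # 1. Security groups for districts that receive their first node.
    for district in sorted(intended_districts - current_districts):
        actions.append(("||", f"Create secgroup for {district}"))

    # 2. Node creation: masters one after another, workers in parallel.
    for role, mode in (("master", "--"), ("worker", "||")):
        for (district, r), wanted in sorted(intended.items()):
            missing = wanted - current[(district, r)]
            if r == role and missing > 0:
                actions += [(mode, f"Create {role} in {district}")] * missing

    # 3. Node deletion comes last and runs sequentially.
    for (district, role), have in sorted(current.items()):
        surplus = have - intended[(district, role)]
        if surplus > 0:
            actions += [("--", f"Delete {role} in {district}")] * surplus
    return actions


empty = Counter()
nine_workers = Counter({("gce-europe", "master"): 1, ("gce-europe", "worker"): 9})
deploy_actions = derive_actions(empty, nine_workers)

split = Counter({("gce-europe", "master"): 1,
                 ("gce-europe", "worker"): 4,
                 ("aws-europe", "worker"): 5})
transfer_actions = derive_actions(nine_workers, split)

for mode, text in deploy_actions:
    print(mode, text)
```

The same function covers deployment, scaling and transfer, because every case is just a different pair of current and intended state. Cleaning up security groups of emptied districts is left out of this sketch.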
10. Control Loop
Example to deploy a cluster
Updated resources.
- Secgroup for gce-europe
- Master node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
All detail data like IP addresses, identifiers, etc. omitted for better readability.
11. Control Loop
Example: Transfer of five worker nodes
Current resources.
- Secgroup for gce-europe
- Master node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
New intended state (worker quantity changed from 9 to 4 in gce-europe and from 0 to 5 in aws-europe).
{
  "type": "cluster",
  "platform": "Swarm",
  "deployments": [
    {
      "district": "gce-europe",
      "flavor": "small",
      "role": "master",
      "quantity": 1
    },
    {
      "district": "gce-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 4
    },
    {
      "district": "aws-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 5
    }
  ]
}
|| Create secgroup for aws-europe
|| Create worker in aws-europe
|| Create worker in aws-europe
|| Create worker in aws-europe
|| Create worker in aws-europe
|| Create worker in aws-europe
-- Delete worker in gce-europe
-- Delete worker in gce-europe
-- Delete worker in gce-europe
-- Delete worker in gce-europe
-- Delete worker in gce-europe
|| executed in parallel
-- executed sequentially
- Secgroup for gce-europe
- Secgroup for aws-europe
- Master node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in gce-europe
- Worker node in aws-europe
- Worker node in aws-europe
- Worker node in aws-europe
- Worker node in aws-europe
- Worker node in aws-europe
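The effect of the action list on the resource list can be traced with a small simulation. The Python sketch below is illustrative only: `apply_action` is a hypothetical helper that parses the action strings shown on the slide and updates a counter of resources. No cloud API is called.

```python
from collections import Counter


def apply_action(resources: Counter, action: str) -> None:
    """Update the resource counts in place for one 'Create ...' or 'Delete ...' action."""
    verb, rest = action.split(" ", 1)
    # "worker in gce-europe" -> ("worker", "in", "gce-europe")
    kind, _, district = rest.split(" ", 2)
    if kind == "secgroup":
        name = f"Secgroup for {district}"
    else:
        name = f"{kind.capitalize()} node in {district}"

    if verb == "Create":
        resources[name] += 1
    elif verb == "Delete":
        if resources[name] == 0:
            raise ValueError(f"nothing to delete: {name}")
        resources[name] -= 1
    else:
        raise ValueError(f"unknown action: {action}")


# State before the transfer: one master and nine workers in gce-europe.
resources = Counter({
    "Secgroup for gce-europe": 1,
    "Master node in gce-europe": 1,
    "Worker node in gce-europe": 9,
})

actions = (
    ["Create secgroup for aws-europe"]
    + ["Create worker in aws-europe"] * 5
    + ["Delete worker in gce-europe"] * 5
)

for action in actions:
    apply_action(resources, action)

for name, count in sorted(resources.items()):
    print(f"{count} x {name}")
```

Because the creations run before the deletions, the cluster temporarily holds 14 workers. Once all actions have completed, four workers remain in gce-europe and five run in aws-europe, which matches the new intended state.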
12. Resulting Architecture (Domain Model)
Extension point for elastic platforms. Currently supported: Kubernetes, Swarm
Extension point for IaaS infrastructures. Currently supported: AWS, GCE, Azure, OpenStack
13. Evaluation:
5 Experiments (with a 1 Master and 9 Worker Cluster)
Providers and machine types:
• Google Compute Engine (GCE, n1-standard-2)
• Elastic Compute Cloud (EC2, m3.large)
• OpenStack (the same experiments have been done with OpenStack as well)
Experiments:
E1: Launch a 10 node cluster.
E2: Terminate a 10 node cluster.
E3: Transfer one node of the cluster.
E4: Transfer 5 nodes of the cluster.
E5: Transfer all nodes of the cluster.
The cluster was Docker Swarm (operating a Sock Shop reference application and a Redis-based guestbook).
Different elastic container platforms (Kubernetes, Docker Swarm) had no significant impact on the runtimes. Therefore data is only presented for Docker Swarm.
14. Evaluation (Single Cloud)
Deploying and terminating clusters
[Figure: runtimes of Experiment E1 and Experiment E2. Annotation: "10 times longer ???"]
15. Evaluation (Multi-Cloud)
Transfer GCE ⇠⇢ AWS
[Figure: transfer runtimes of Experiments E3, E4 and E5.]
Comparable with a shutdown.
Node termination times seem to dominate the transfer times massively.
16. Why these (dramatic) differences?
Analysis revealed:
1. The GCE API works synchronously (a node termination call blocks until termination is completed).
2. The AWS API works asynchronously (a node termination call does not block until termination is completed: fire and forget).
3. GCE SDN related processing times take far longer than AWS SDN related processing times.
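Why a blocking termination call matters so much becomes clear once one recalls that the delete actions in the action lists are executed sequentially. The Python sketch below is a toy calculation under stated assumptions: the function `sequential_delete_time` is hypothetical, and the durations (30 seconds teardown, 1 second call overhead) are made-up illustration values, not measurements from the evaluation.

```python
def sequential_delete_time(nodes: int, teardown_s: float,
                           call_overhead_s: float, blocking: bool) -> float:
    """Time the control loop spends issuing `nodes` delete calls one after another.

    A blocking API makes each call wait for the full teardown; a
    fire-and-forget API returns after the call overhead only.
    """
    per_call = call_overhead_s + (teardown_s if blocking else 0.0)
    return nodes * per_call


# Hypothetical values, chosen only to show the shape of the effect.
NODES = 5
TEARDOWN_S = 30.0
CALL_OVERHEAD_S = 1.0

blocking_total = sequential_delete_time(NODES, TEARDOWN_S, CALL_OVERHEAD_S, blocking=True)
async_total = sequential_delete_time(NODES, TEARDOWN_S, CALL_OVERHEAD_S, blocking=False)

print(f"blocking API:        {blocking_total:.0f} s")
print(f"fire-and-forget API: {async_total:.0f} s")
```

With a blocking API the teardown time is paid once per node, so the total grows linearly with the number of nodes to delete. With a fire-and-forget API the teardowns overlap on the provider side and the control loop only pays the call overhead.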
18. Conclusion
• Elastic container platforms provide often overlooked multi-cloud opportunities.
• We successfully demonstrated multi-cloud transfers between AWS, GCE, Azure and OpenStack using a simple control loop (scaling Kubernetes and Docker Swarm Mode).
• The control loop is designed to be integrated into a MAPE loop as its execution phase.
• A cybernetic understanding (intended state vs. current state) makes a lot of multi-cloud workflows easier.
• On the downside: the solution is limited to container-based applications (CNMM Level 3) and services (but that seems to be becoming a dominant architectural style).
• New research opportunities and future research directions:
  • Making the solution available as open source
  • P2P-based elastic platforms would make deployments even easier (no worker/master roles)
  • There is room for improvement (e.g. resource-efficient action planning)
19. Acknowledgement
This research is funded by the German Federal Ministry of Education and Research (03FH021PX4). I would like to thank Peter Quint, Christian Stüben, and Arne Salveter for their hard work and their contributions to the Project Cloud TRANSIT.
Picture Reference
• Elastic Straps: Pixabay (CC0 Public Domain, PublicDomainPictures)
• Definition: Pixabay (CC0 Public Domain, PDPics)
• Class room: Pixabay (CC0 Public Domain, Unsplash)
• Railway: Pixabay (CC0 Public Domain, Fotoworkshop4You)
• Air Transport: Pixabay (CC0 Public Domain, WikiImages)
20. About
Nane Kratzke
CoSA: http://cosa.fh-luebeck.de/en/contact/people/n-kratzke
Blog: http://www.nkode.io
Twitter: @NaneKratzke
GooglePlus: +NaneKratzke
LinkedIn: https://de.linkedin.com/in/nanekratzke
GitHub: https://github.com/nkratzke
ResearchGate: https://www.researchgate.net/profile/Nane_Kratzke
SlideShare: http://de.slideshare.net/i21aneka
21. Backup Slides
22. Elastic Platforms and Multi-cloud requirements
Multi-Cloud Requirements and Contributing Platform Concepts
Transferability:
• Integration of nodes into one logical cluster
• Designed for failure
• Cross-provider deployable
Data location awareness:
• Pod concept (Kubernetes)
• Volume orchestrator (Flocker for Docker)
Geolocation awareness, pricing awareness, legislation/policy awareness, local resources awareness:
• Tagging of nodes with geolocation, pricing, policy or on-premise information
• Platform schedulers have selectors (Swarm) / affinities (Kubernetes) / constraints (Mesos/Marathon) to evaluate these taggings
Security requirements:
• Encrypted data / control plane (Swarm)
• Encrypted overlay networks (e.g. Weave for Kubernetes)
Several transferability, awareness and security requirements come along with multi-cloud approaches. Already existing elastic container platforms contribute to fulfilling these requirements.
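The tagging idea behind the awareness requirements can be shown with a platform-neutral Python sketch. It deliberately does not use the selector, affinity or constraint syntax of Swarm, Kubernetes or Marathon: the `Node` class, the `select_nodes` function and the label keys are hypothetical and only show equality matching on node tags.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def select_nodes(nodes: list[Node], constraints: dict[str, str]) -> list[Node]:
    """Return the nodes whose labels satisfy every key/value constraint."""
    return [
        node for node in nodes
        if all(node.labels.get(key) == value for key, value in constraints.items())
    ]


# Hypothetical cluster spanning three providers.
nodes = [
    Node("worker-1", {"provider": "gce", "region": "europe", "on_premise": "false"}),
    Node("worker-2", {"provider": "aws", "region": "europe", "on_premise": "false"}),
    Node("worker-3", {"provider": "openstack", "region": "europe", "on_premise": "true"}),
]

# A policy such as "this container must stay on premise in Europe".
eligible = select_nodes(nodes, {"region": "europe", "on_premise": "true"})
print([node.name for node in eligible])
```

The scheduler needs no knowledge of providers as such. Geolocation, pricing or policy information only has to be attached to nodes as tags when they join the cluster.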
23. Cloud-native Application
What? Be IDEAL
• Isolated State
• Distributed
• Elastic
• Automated management
• Loosely coupled
Why? There is a need for ...
• Speed (delivery)
• Safety (fault tolerance, design for failure)
• Scalability
• Client diversity
How? Integrate ...
• (Micro)service oriented architectures (M)SOA
• Use API-based collaboration
• Consider cloud-focused pattern catalogues
• Use self-service agile platforms
C. Fehling, F. Leymann, R. Retter, W. Schupeck, and P. Arbitter, Cloud Computing Patterns: Fundamentals to Design, Build, and Manage Cloud Applications. Springer, 2014.
M. Stine, Migrating to Cloud-Native Application Architectures. O'Reilly, 2015.
A. Balalaie, A. Heydarnoori, and P. Jamshidi, "Migrating to Cloud-Native Architectures Using Microservices", CloudWay 2015, Taormina, Italy.
S. Newman, Building Microservices. O'Reilly, 2015.
Often heard by practitioners: "A cloud-native application is an application intentionally designed for the cloud." True, but helpful?
24. Cloud-native Application Definition
[KQ2017a] Kratzke, N., & Quint, P.-C. (2017). Understanding Cloud-native Applications after 10 Years of
Cloud Computing - A Systematic Mapping Study. Journal of Systems and Software, 126 (April).
25. We need some guidance ...
ClouNS – Cloud-native Application Reference Model
[KP2016] Kratzke, N., & Peinl, R. (2016). ClouNS - a Cloud-Native Application Reference Model for Enterprise Architects. In 2016
IEEE 20th International Enterprise Distributed Object Computing Workshop (EDOCW) (pp. 1–10).
26. Did you know?
[Figure: Relation of considered services per year, 2006 to 2016, on a 0% to 100% axis. Series: "considered by CIMI, OCCI, CDMI, OVF, OCI, TOSCA" and "not considered". Data labels as extracted, one per year: 2, 2, 2, 4, 6, 7, 7, 7, 7, 11, 11 and 1, 1, 2, 4, 7, 10, 14, 21, 26, 42, 44.]
Cloud standards improved over the last 10 years. However, cloud standardization coverage decreased (in relation to all available services).
Analyzed using over 2300 official release notes of Amazon Web Services (AWS). Data for other providers like Google, Azure, Rackspace, etc. is not presented. The basic conclusions for these providers are the same.
[KQP+2016] Kratzke, N., Quint, P.-C., Palme, D., & Reimers, D. (2016). Project Cloud TRANSIT - Or to Simplify Cloud-native Application Provisioning for SMEs by Integrating Already Available Container Technologies. In V. Kantere & B. Koch (Eds.), European Project Space on Smart Systems, Big Data, Future Internet - Towards Serving the Grand Societal Challenges.
27. Research Methodology
Main focus of this contribution
CNA == Cloud-native Application
28. Evaluation: Virtual Machine Type Selection
[KQ2015] Kratzke, N., & Quint, P.-C. (2015). About Automatic Benchmarking of IaaS Cloud Service
Providers for a World of Container Clusters. Journal of Cloud Computing Research, 1(1), 16–34.
We searched for the most similar machine types of different public cloud service
providers. The similarity indicator maps processing, memory, network, and disk I/O
performance to just one similarity value (1 means identical, 0 means no similarity at all).
29. This reference model guides our research
• Developing a description language for cloud-native applications.
• Developing a standardized way of deploying a clustered container runtime environment for cloud-native applications (deployment and operation conforming to CNMM Level 3).
• Making use of commodity services of public cloud service providers only (IaaS).
30. Research Surveillance of Practitioners
Practitioners often prefer layer-based reference models ...
Jason Lavigne, "Don't let aPaaS you by - What is aPaaS and why Microsoft is excited about it", see https://atjasonunderscorelavigne.wordpress.com/2014/01/27/dont-let-apaas-you-by/ (accessed 4th August 2016)
Johann den Haan, "Categorizing and Comparing the Cloud Landscape", see http://www.theenterprisearchitect.eu/blog/categorize-compare-cloud-vendors/ (accessed 4th August 2016)
Josef Adersberger, Andreas Zitzelsberger, Mario-Leander Reimer, "Der Cloud-Native-Stack: Mesos, Kubernetes und Spring Cloud", see http://www.qaware.de/fileadmin/user_upload/QAware-Cloud-Native-Artikelserie-Java_Magazin-1.pdf (accessed 4th August 2016)
MEKUNS
Cloud Landscape Model