Catch Me If You Can - Cloud Foundry Summit Europe 2016
Outrunning Environmental Adversity with Intelligence at all layers of the OSI Model. Keith Strini, Merlin Glynn, Sean Keery
1. Catch Me If You Can
Outrunning Environmental Adversity with Intelligence
at all layers of the OSI Model
2. Introductions
• Merlin Glynn mglynn@pivotal.io
• Sean Keery skeery@pivotal.io
• Keith Strini kstrini@pivotal.io
• Special Shout out to Raymond Lee
(BDS Team)
3. What if we could improve performance
& respond to environmental adversity?
APT (advanced persistent threat) - a set of stealthy and continuous
computer hacking processes, often orchestrated by human(s) targeting
a specific entity.
DDOS (distributed denial of service) - an attempt to make a machine or
network resource unavailable to its intended users, such as to
temporarily or indefinitely interrupt or suspend services of a host
connected to the Internet.
Spectrum of cyber vulnerability from DDOS to APT
Quality of service
The overall performance of a computer network, particularly
the performance seen by the users of the network. To
quantitatively measure quality of service, several related
aspects of the network service are often considered, such as
error rates, bit rate, throughput, transmission delay,
availability, jitter, etc.
4. What this Continuous Improvement over Environmental Adversity looks like
[Diagram. Platform layers: Bosh, CF, SDN. Labels: Agents (Actual State); Strategy (Desired State); Dynamic Analysis; Environment Models; Goals (SLA); Predictions (Metrics); DSL Library; Learning; responses; Realize. Annotations: <<no-outage>>, <<predictive>>, <<reliable>>.]
6. Use Cases for Demo
• DDOS -> Recognize foreign IP / Add ACL via NSX REST API
• QoS -> Detect network throughput deficiency / Add 1..N routes
• APT -> Recognize foreign IP + load / Alert forensics team
  Spin up new CF foundation / subnet / data subnet access
  Add new route
  Remove forensic route
  Shut down data subnet access from forensic foundation
  Goal: uninterrupted production traffic / UX
• DDIL -> Detect network throughput / Identify cell with best network throughput
  Move highest priority workloads to that cell
  Add additional service chaining to the edge, in accordance with compliance outlines
  Add 1..N routes
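The DDOS use case above (recognize a foreign IP, then add an ACL) could be sketched roughly as follows. This is a minimal illustration, not the demo's implementation: `TRUSTED_NETWORKS`, `is_foreign`, `build_deny_rule` and the rule dictionary layout are all hypothetical, and the dictionary is not the NSX REST API schema. Only Python's standard `ipaddress` module is used.

```python
import ipaddress

# Hypothetical trusted ranges; a real foundation would load these from its own config.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_foreign(ip: str) -> bool:
    """Return True when the address falls outside every trusted network."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in TRUSTED_NETWORKS)


def build_deny_rule(ip: str) -> dict:
    """Build an illustrative deny rule (assumed layout, NOT the NSX schema)."""
    return {"action": "deny", "source": ip, "direction": "in"}


def respond_to_sources(source_ips):
    """Produce one deny rule per distinct foreign source address."""
    return [build_deny_rule(ip) for ip in sorted(set(source_ips)) if is_foreign(ip)]


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation range, used here as the "foreign" source.
    for rule in respond_to_sources(["10.1.2.3", "203.0.113.7", "203.0.113.7"]):
        print(rule)
```

In a real deployment the resulting rules would be translated into whatever request the SDN controller expects; that translation is deliberately left out here.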
7. Where do we go from here ….
• Ways to evaluate each
deployment
• Utilize the inherent abilities of
the distributed architecture
• Machine learning where each
distributed component
maintains state, manages itself
8. In Conclusion…Why Now?
• The cyber vulnerability problem is
imminent
• The operations, networking and
development teams are finally
becoming cohesive units
• The capacity to process, interpret
and act upon petascale data on any
IaaS
• All of this can already be built into
the very core of the foundation now
– (Diego abstractions, SDN API, Predictive
and ML, Streams, Bosh-Enaml).
Bullet -> Reactive to Proactive
Detecting and analyzing the running behavior
Predicting the effect different strategic actions would have on the distributed system when real problems are detected.
Bullet -> Intermittency is too fast
How often data should be observed.
Which data is critical enough to be sent through intermittent connections.
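One way to picture the criticality question is a priority queue that fills a limited communication window with the most critical data first. The sketch below is an assumption for illustration only: the message tuples, the numeric criticality scale and the byte budget are hypothetical and do not come from the talk.

```python
import heapq


def select_for_window(messages, budget_bytes):
    """Pick messages for one communication window, most critical first.

    messages: list of (criticality, size_bytes, name); higher criticality wins.
    budget_bytes: hypothetical capacity of the intermittent link for this window.
    """
    # heapq is a min-heap, so negate criticality; idx keeps ties in arrival order.
    heap = [(-crit, idx, size, name) for idx, (crit, size, name) in enumerate(messages)]
    heapq.heapify(heap)

    sent, remaining = [], budget_bytes
    while heap:
        _, _, size, name = heapq.heappop(heap)
        if size <= remaining:  # skip anything that no longer fits
            sent.append(name)
            remaining -= size
    return sent


if __name__ == "__main__":
    queue = [
        (1, 400, "debug_logs"),
        (9, 300, "security_alert"),
        (5, 500, "throughput_metrics"),
    ]
    print(select_for_window(queue, budget_bytes=800))
```

Messages that do not fit are simply skipped in this sketch; a fuller version would carry them over to the next window.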
Bullet -> Difficult to exploit opportunities
Manual Bosh management of virtualized resources in a server cluster across any IaaS.
Degraded communications in turn degrade performance in managing the system.
The more intermittent the communication, the greater the effect on Bosh’s management performance
Bullet -> Co-deploying
Co-deploying analytics and the analytics platform within the foundation.
The Firehose streams system metrics out of the foundation.
Interpret those metrics, then select strategies, defined in foundation-relevant DSLs, that provide courses of action (COAs) for how the network the foundation rides on should adapt to changes.
Bullet -> Bosh Adds SDN Components
Allow Bosh to add new SDN components dynamically through a discovery process that Bosh would monitor continuously.
New components would need to provide metadata about themselves so that Bosh can reconfigure dynamically.
When degraded performance is detected and the ability to interact with the degrading foundation falls below desired thresholds, Bosh would execute strategies to heal, adapt, optimize, and defend the system against similar future degradations.
Bullet -> By liberating the control plane from the data plane, SDN enables the foundation to truly adapt (at almost all layers of the OSI model) to changing environmental and threat circumstances.
“Continuous Advantage” makes it difficult to cause substantial damage without launching a full assault against the infrastructure.
To exploit this inherent resiliency, we must take the dynamic possibilities of SDN into the next phase, coupling them with predictive analytics and ML to fully optimize and self-protect the enterprises we are in charge of running.
Bullet -> Evaluate the deployment
Determine violations of constraints that were defined for the specific foundation.
If anomalies are detected or SLAs violated, programmatically adapt the architecture.
DSL-based strategies are matched and evaluated to determine the best approach to resolving the SLA violation or mitigating the anomaly.
Choose the strategy to execute; executing it effects changes to the foundation.
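The evaluation steps in these notes (detect constraint violations, match strategies, choose one to execute) might look roughly like the sketch below. The talk does not show its DSL, so `Constraint`, `Strategy`, the `cost` ranking and the metric names are all hypothetical stand-ins, with plain Python objects in place of DSL-defined strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    metric: str
    minimum: float  # hypothetical SLA floor for this metric


@dataclass(frozen=True)
class Strategy:
    name: str
    addresses: str  # metric this strategy is meant to improve
    cost: int       # hypothetical ranking value; lower is preferred


def violations(metrics, constraints):
    """Return constraints whose metric is below its floor (missing counts as violated)."""
    return [c for c in constraints if metrics.get(c.metric, 0.0) < c.minimum]


def choose_strategy(violated, strategies):
    """Pick the cheapest strategy that addresses the violated metric, or None."""
    candidates = [s for s in strategies if s.addresses == violated.metric]
    return min(candidates, key=lambda s: s.cost, default=None)


def evaluate(metrics, constraints, strategies):
    """Map each violated metric to the strategy selected for it."""
    return {c.metric: choose_strategy(c, strategies) for c in violations(metrics, constraints)}


if __name__ == "__main__":
    sla = [Constraint("throughput_mbps", 100.0), Constraint("availability", 0.99)]
    library = [
        Strategy("move_workload", "throughput_mbps", cost=3),
        Strategy("add_route", "throughput_mbps", cost=1),
    ]
    observed = {"throughput_mbps": 40.0, "availability": 0.999}
    print(evaluate(observed, sla, library))
```

Ranking by a single cost value is only one possible selection rule; the notes leave open how strategies are actually scored.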
Bullet -> Utilize the distributed architecture
Each distributed component would maintain a state in which it could manage itself.
Once connectivity is restored, each component would report total system health back to Bosh.
No matter how disconnected the systems become, the foundation would still be able to function.
The challenge is the limitation on globally optimizing the foundation to holistically address performance degradation.
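A component that keeps working while disconnected and reports back once connectivity returns could be sketched as below. This is an assumption-laden illustration: `ComponentAgent`, its buffer size and the `send` callback are hypothetical, and replaying buffered reports is just one reading of "report total system health"; no Bosh API is used.

```python
from collections import deque


class ComponentAgent:
    """Hypothetical self-managing component that buffers health while offline."""

    def __init__(self, name, max_buffered=100):
        self.name = name
        self.connected = True
        # Bounded buffer: the oldest reports are dropped if the outage is long.
        self.pending = deque(maxlen=max_buffered)

    def record_health(self, status, send):
        """Send the report now if connected, otherwise keep it locally."""
        report = {"component": self.name, "status": status}
        if self.connected:
            send(report)
        else:
            self.pending.append(report)

    def reconnect(self, send):
        """Mark the link as restored and replay buffered reports in order."""
        self.connected = True
        while self.pending:
            send(self.pending.popleft())


if __name__ == "__main__":
    received = []  # stands in for the management plane
    agent = ComponentAgent("cell-1")
    agent.record_health("ok", received.append)
    agent.connected = False  # simulate loss of connectivity
    agent.record_health("degraded", received.append)
    agent.record_health("healed", received.append)
    agent.reconnect(received.append)
    print([r["status"] for r in received])
```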
Bullet -> Machine Learning
The size and complexity of Cloud Foundry enterprises are beginning to outstrip the ability of humans to understand and control their maintenance.
The speed required for effective network optimization is ever increasing. In particular, degradation of performance could be countered by predicting the future communication states of the system.
To address the effects of degraded communications, metrics could be queried against real-time analytics to predict the future state of the system.
Constraints could then be evaluated against the predicted future state values. In this way, Bosh can anticipate events that would require adaptation and issue commands before they are needed.
This pre-emptive strategy would allow the foundation to operate in degraded environments by issuing adaptive strategies in communication windows before they are needed.
Such learning approaches can also be applied to make decisions based upon currently monitored states and can be used to detect anomalous operation such as hidden APTs, among other capabilities.
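As a rough illustration of the pre-emptive idea in these notes, the sketch below fits a straight line to recent metric samples and checks whether the predicted value will fall below a threshold within a given horizon. A least-squares line is a deliberately simple stand-in for whatever predictive model would really be used; the function names, sample values and threshold are hypothetical.

```python
def fit_line(samples):
    """Least-squares line through (time, value) samples; needs 2+ distinct times."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    return slope, mean_v - slope * mean_t


def predict(samples, at_time):
    """Extrapolate the fitted line to a future time."""
    slope, intercept = fit_line(samples)
    return slope * at_time + intercept


def should_adapt_early(samples, horizon, threshold):
    """True when the metric is predicted to drop below threshold within the horizon."""
    last_t = samples[-1][0]
    return predict(samples, last_t + horizon) < threshold


if __name__ == "__main__":
    # Hypothetical throughput readings (time step, Mbps) trending downward.
    throughput = [(0, 100.0), (1, 90.0), (2, 80.0), (3, 70.0)]
    if should_adapt_early(throughput, horizon=2, threshold=60.0):
        print("issue adaptive strategy in the current communication window")
```

The point of the sketch is the ordering: the check runs while the link is still usable, so a command can be issued before the threshold is actually crossed.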