SlideShare a Scribd company logo
1 of 83
Download to read offline
Microservices: What’s Missing?
Adrian Cockcroft @adrianco
Technology Fellow - Battery Ventures
April 2016
What does @adrianco do?
@adrianco
Technology Due
Diligence on Deals
Presentations at
Conferences
Presentations at
Companies
Technical
Advice for Portfolio
Companies
Program
Committee for
Conferences
Networking with
Interesting PeopleTinkering with
Technologies
Maintain
Relationship with
Cloud Vendors
Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics
http://www.infoq.com/presentations/Twitter-Timeline-Scalability
http://www.infoq.com/presentations/twitter-soa
http://www.infoq.com/presentations/Zipkin
http://www.infoq.com/presentations/scale-gilt
Go-Kit https://www.youtube.com/watch?v=aL6sd4d4hxk
http://www.infoq.com/presentations/circuit-breaking-distributed-systems
https://speakerdeck.com/mattheath/scaling-micro-services-in-go-highload-plus-plus-2014
State of the Art in Web Scale
Microservice Architectures
AWS Re:Invent : Asgard to Zuul https://www.youtube.com/watch?v=p7ysHhs5hl0
Resiliency at Massive Scale https://www.youtube.com/watch?v=ZfYJHtVL1_w
Microservice Architecture https://www.youtube.com/watch?v=CriDUYtfrjs
New projects for 2015 and Docker Packaging https://www.youtube.com/watch?v=hi7BDAtjfKY
Spinnaker deployment pipeline https://www.youtube.com/watch?v=dwdVwE52KkU
http://www.infoq.com/presentations/spring-cloud-2015
Microservice Architectures
ConfigurationTooling Discovery Routing Observability
Development: Languages and Container
Operational: Orchestration and Deployment Infrastructure
Datastores
Policy: Architectural and Security Compliance
Microservices
Edda
Archaius
Configuration
Spinnaker
SpringCloud
Tooling
Eureka
Prana
Discovery
Denominator
Zuul
Ribbon
Routing
Hystrix
Pytheus
Atlas
Observability
Development using Java, Groovy, Scala, Clojure, Python with AMI and Docker Containers
Orchestration with Autoscalers on AWS, Titus exploring Mesos & ECS for Docker
Ephemeral datastores using Dynomite, Memcached, Astyanax, Staash, Priam, Cassandra
Policy via the Simian Army - Chaos Monkey, Chaos Gorilla, Conformity Monkey, Security Monkey
@adrianco
Discussion Points
Failure injection testing
Versioning, routing
Protocols and interfaces
Timeouts and retries
Denormalized data models
Monitoring, tracing
Simplicity through symmetry
@adrianco
Failure Injection Testing
Netflix Chaos Monkey and Simian Army
http://techblog.netflix.com/2011/07/netflix-simian-army.html
http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html
http://techblog.netflix.com/2016/01/automated-failure-testing.html
! Chaos Monkey - enforcing stateless business logic
! Chaos Gorilla - enforcing zone isolation/replication
! Chaos Kong - enforcing region isolation/replication
! Security Monkey - watching for insecure configuration settings
! Latency Monkey & FIT - inject errors to enforce robust dependencies
! See over 100 NetflixOSS projects at netflix.github.com
! Get “Technical Indigestion” reading techblog.netflix.com
Trust with Verification
@adrianco
Benefits of version aware routing
Immediately and safely introduce a new version
Canary test in production
Use feature flags n
Route clients to a version so they can’t get disrupted
Change client or dependencies but not both at once
Eventually remove old versions
Incremental or infrequent “break the build” garbage collection
@adrianco
Versioning, Routing
Version numbering: Interface.Feature.Bugfix
V1.2.3 to V1.2.4 - Canary test then remove old version
V1.2.x to V1.3.x - Canary test then remove or keep both
Route V1.3.x clients to new version to get new feature
Remove V1.2.x only after V1.3.x is found to work for V1.2.x clients
V1.x.x to V2.x.x - Route clients to specific versions
Remove old server version when all old clients are gone
@adrianco
Protocols
Measure serialization, transmission, deserialization costs
“Sending a megabyte of XML between microservices will
make you sad…”
Use Thrift, Protobuf/gRPC, Avro, SBE internally
Use JSON for external/public interfaces
https://github.com/real-logic/simple-binary-encoding
@adrianco
Interfaces
When you build a service, build a “driver” client for it
Reference implementation error handling and serialization
Release automation stress test using client
Validate that service interface is usable!
Minimize additional dependencies
Swagger - OpenAPI Specification
Datawire Quark adds behaviors to API spec
@adrianco
Interfaces
@adrianco
Interfaces
Client
Code
Object
Model
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Object
Model
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Object
Model
Cache
Code
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Service
Driver
Service
Handler
Object
Model
Cache
Code
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Service
Handler
Object
Model
Cache
Code
Cache
Handler
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Versioned
dependency
interfacesDecoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Versioned
dependency
interfaces
Versioned
platform
interface
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Versioned
dependency
interfaces
Versioned
platform
interface
Decoupled
object
models
Versioned routing
@adrianco
Interface Version Pinning
Change one thing at a time!
Pin the version of everything else
Incremental build/test/deploy pipeline
Deploy existing app code with new platform
Deploy existing app code with new dependencies
Deploy new app code with pinned platform/dependencies
@adrianco
Timeouts and Retries
Connection timeout vs. request timeout confusion
Usually setup incorrectly, global defaults
Systems collapse with “retry storms”
Timeouts too long, too many retries
Services doing work that can never be used
@adrianco
Connections and Requests
TCP makes a connection, HTTP makes a request
HTTP hopefully reuses connections for several requests
Both have different timeout and retry needs!
TCP timeout is purely a property of one network latency hop
HTTP timeout depends on the service and its dependencies
connection path
request path
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Good
Service
Bad config: Every service defaults to 2 second timeout, two retries
Edge
Service not
responding
Overloaded
service not
responding
Failed
Service
If anything breaks, everything upstream stops responding
Retries add unproductive work
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Good
Service
Bad config: Every service defaults to 2 second timeout, two retries
Edge
Service not
responding
Overloaded
service not
responding
Failed
Service
If anything breaks, everything upstream stops responding
Retries add unproductive work
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Good
Service
Bad config: Every service defaults to 2 second timeout, two retries
Edge
Service not
responding
Overloaded
service not
responding
Failed
Service
If anything breaks, everything upstream stops responding
Retries add unproductive work
@adrianco
Timeouts and Retries
Bad config: Every service defaults to 2 second timeout, two retries
Edge
service
responds
slowly
Overloaded
service
Partially
failed
service
@adrianco
Timeouts and Retries
Bad config: Every service defaults to 2 second timeout, two retries
Edge
service
responds
slowly
Overloaded
service
Partially
failed
service
First request from Edge timed out so it ignores the successful
response and keeps retrying. Middle service load increases as
it’s doing work that isn’t being consumed
@adrianco
Timeouts and Retries
Bad config: Every service defaults to 2 second timeout, two retries
Edge
service
responds
slowly
Overloaded
service
Partially
failed
service
First request from Edge timed out so it ignores the successful
response and keeps retrying. Middle service load increases as
it’s doing work that isn’t being consumed
@adrianco
Timeout and Retry Fixes
Cascading timeout budget
Static settings that decrease from the edge
or dynamic budget passed with request
How often do retries actually succeed?
Don’t ask the same instance the same thing
Only retry on a different connection
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, one retry
Failed
Service
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, one retry
Failed
Service
3s
1s
1s
Fast fail
response
after 2s
Upstream timeout must always be longer than
total downstream timeout * retries delay
No unproductive work while fast failing
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, failover retry
Failed
Service
For replicated services with multiple instances
never retry against a failed instance
No extra retries or unproductive work
Good
Service
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, failover retry
Failed
Service
3s 1s
For replicated services with multiple instances
never retry against a failed instance
No extra retries or unproductive work
Good
Service
Successful
response
delayed 1s
@adrianco
Manage Inconsistency
ACM Paper: "The Network is Reliable"
Distributed systems are inconsistent by nature
Clients are inconsistent with servers
Most caches are inconsistent
Versions are inconsistent
Get over it and
Deal with it
@adrianco
Denormalized Data Models
Any non-trivial organization has many databases
Cross references exist, inconsistencies exist
Microservices work best with individual simple stores
Scale, operate, mutate, fail them independently
NoSQL allows flexible schema/object versions
@adrianco
Denormalized Data Models
Build custom cross-datasource check/repair processes
Ensure all cross references are up to date
Immutability Changes Everything
http://highscalability.com/blog/2015/1/26/paper-immutability-changes-everything-by-pat-helland.html
Memories, Guesses and Apologies
https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/
Cloud Native
Monitoring and
Microservices
Cloud Native Microservices
! High rate of change
Code pushes can cause floods of new instances and metrics
Short baseline for alert threshold analysis – everything looks unusual
! Ephemeral Configurations
Short lifetimes make it hard to aggregate historical views
Hand tweaked monitoring tools take too much work to keep running
! Microservices with complex calling patterns
End-to-end request flow measurements are very important
Request flow visualizations get overwhelmed
Low Latency SaaS Based Monitors
https://www.datadoghq.com/ http://www.instana.com/ www.bigpanda.io www.vividcortex.com signalfx.com wavefront.com sysdig.com
See www.battery.com for a list of portfolio investments
Managing Scale
A Possible Hierarchy
Continents
Regions
Zones
Services
Versions
Containers
Instances
How Many?
3 to 5
2-4 per Continent
1-5 per Region
100’s per Zone
Many per Service
1000’s per Version
10,000’s
It’s much more challenging
than just a large number of
machines
Flow
Some tools can show
the request flow
across a few services
Interesting
architectures have a
lot of microservices!
Flow visualization is
a big challenge.
See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
Simulated Microservices
Model and visualize microservices
Simulate interesting architectures
Generate large scale configurations
Eventually stress test real tools
Code: github.com/adrianco/spigo
Simulate Protocol Interactions in Go
Visualize with D3
See for yourself: http://simianviz.surge.sh
Follow @simianviz for updates
ELB Load Balancer
Zuul
API Proxy
Karyon
Business Logic
Staash
Data Access Layer
Priam
Cassandra Datastore
Three
Availability
Zones
Denominator
DNS Endpoint
Spigo Nanoservice Structure
func Start(listener chan gotocol.Message) {
...
for {
select {
case msg := <-listener:
flow.Instrument(msg, name, hist)
switch msg.Imposition {
case gotocol.Hello: // get named by parent
...
case gotocol.NameDrop: // someone new to talk to
...
case gotocol.Put: // upstream request handler
...
outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(),
msg.Ctx.NewParent(), msg.Intention}
flow.AnnotateSend(outmsg, name)
outmsg.GoSend(replicas)
}
case <-eurekaTicker.C: // poll the service registry
...
}
}
}
Skeleton code for replicating a Put message
Instrument incoming requests
Instrument outgoing requests
update trace context
Flow Trace Records
riak2
us-east-1
zoneC
riak9
us-west-2
zoneA
Put s896
Replicate
riak3
us-east-1
zoneA
riak8
us-west-2
zoneC
riak4
us-east-1
zoneB
riak10
us-west-2
zoneB
us-east-1.zoneC.riak2 t98p895s896 Put
us-east-1.zoneA.riak3 t98p896s908 Replicate
us-east-1.zoneB.riak4 t98p896s909 Replicate
us-west-2.zoneA.riak9 t98p896s910 Replicate
us-west-2.zoneB.riak10 t98p910s912 Replicate
us-west-2.zoneC.riak8 t98p910s913 Replicate
staash
us-east-1
zoneC
s910 s908s913
s909s912
Replicate Put
Open Zipkin
A common format for trace annotations
A Java tool for visualizing traces
Standardization effort to fold in other formats
Driven by Adrian Cole (currently at Pivotal)
Extended to load Spigo generated trace files
Trace for one Spigo Flow
Definition of an architecture
{
"arch": "lamp",
"description":"Simple LAMP stack",
"version": "arch-0.0",
"victim": "webserver",
"services": [
{ "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] },
{ "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] },
{ "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] },
{ "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] },
{ "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] }
]
}
Header includes
chaos monkey victim
New tier
name
Tier
package
0 = non
Regional
Node
count
List of tier
dependencies
See for yourself: http://simianviz.surge.sh/lamp
Running Spigo
$ ./spigo -a lamp -j -d 2
2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json
2016/01/26 23:04:05 lamp.edda: starting
2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack
2016/01/26 23:04:05 architecture: scaling to 100%
2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting
2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting
2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting
2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []}
2016/01/26 23:04:05 Starting: {memcache store 1 1 []}
2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]}
2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]}
2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]}
2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms
2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith
2016/01/26 23:04:07 asgard: Shutdown
2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing
2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing
2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing
2016/01/26 23:04:07 spigo: complete
2016/01/26 23:04:07 lamp.edda: closing
-a architecture lamp
-j graph json/lamp.json
-d run for 2 seconds
Riak IoT Architecture
{
"arch": "riak",
"description":"Riak IoT ingestion example for the RICON 2015 presentation",
"version": "arch-0.0",
"victim": "",
"services": [
{ "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]},
{ "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]},
{ "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]},
{ "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]},
{ "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]},
{ "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]},
{ "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]},
{ "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]},
{ "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]},
{ "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]},
{ "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]},
{ "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]},
{ "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]},
{ "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]},
{ "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]}
]
}
New tier
name
Tier
package
Node
count
List of tier
dependencies
0 = non
Regional
Single Region Riak IoT
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Load Balancer
Load Balancer
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Load Balancer
Load Balancer
Stream Service
Analytics Service
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Load Balancer
Load Balancer
Stream Service
Analytics Service
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Ingest Message Queue
Load Balancer
Load Balancer
Stream Service
Analytics Service
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Ingest Message Queue
Load Balancer
Load Balancer
Stream Service Riak TS
Analytics Service
Ingester Service
See for yourself: http://simianviz.surge.sh/riak
Two Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics
See for yourself: http://simianviz.surge.sh/riak
Two Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics
What’s the response
time of the stream
endpoint?
See for yourself: http://simianviz.surge.sh/riak
Response Times
What’s the response time distribution of a very
simple storage backed web service?
memcached
mysql
disk volume
web
service
load
generator
memcached
See http://www.getguesstimate.com/models/1307
memcached hit %
memcached response mysql response
service cpu time
memcached hit mode
mysql cache hit mode
mysql disk access mode
Hit rates: memcached 40% mysql 70%
Hit rates: memcached 60% mysql 70%
Hit rates: memcached 20% mysql 90%
Measuring
Response Time With
Histograms
Changes made to codahale/hdrhistogram
Changes made to go-kit/kit/metrics
Implementation in adrianco/spigo/collect
What to measure?
Client Server
GetRequest
GetResponse
Client
Time
Client Send CS
Server Receive SR
Server Send SS
Client Receive CR
Server
Time
What to measure?
Client Server
GetRequest
GetResponse
Client
Time
Client Send CS
Server Receive SR
Server Send SS
Client Receive CR
Response
CR-CS
Service
SS-SR
Network
SR-CS
Network
CR-SS
Net Round Trip
(SR-CS) + (CR-SS)
(CR-CS) - (SS-SR)
Server
Time
Spigo Histogram Results
Collected with: % spigo -d 60 -j -a storage -c
name: storage.*.*..load00...load.denominator_serv
quantiles: [{50 47103} {99 139263}]
From To Count Prob Bar
20480 21503 2 0.0007 :
21504 22527 2 0.0007 |
23552 24575 1 0.0003 :
24576 25599 5 0.0017 |
25600 26623 5 0.0017 |
26624 27647 1 0.0003 |
27648 28671 3 0.0010 |
28672 29695 5 0.0017 |
29696 30719 127 0.0421 |####
30720 31743 126 0.0418 |####
31744 32767 74 0.0246 |##
32768 34815 281 0.0932 |#########
34816 36863 201 0.0667 |######
36864 38911 156 0.0518 |#####
38912 40959 185 0.0614 |######
40960 43007 147 0.0488 |####
43008 45055 161 0.0534 |#####
45056 47103 125 0.0415 |####
47104 49151 135 0.0448 |####
49152 51199 99 0.0328 |###
51200 53247 82 0.0272 |##
53248 55295 77 0.0255 |##
55296 57343 66 0.0219 |##
57344 59391 54 0.0179 |#
59392 61439 37 0.0123 |#
61440 63487 45 0.0149 |#
63488 65535 33 0.0109 |#
65536 69631 63 0.0209 |##
69632 73727 98 0.0325 |###
73728 77823 92 0.0305 |###
77824 81919 112 0.0372 |###
81920 86015 88 0.0292 |##
86016 90111 55 0.0182 |#
90112 94207 38 0.0126 |#
94208 98303 51 0.0169 |#
98304 102399 32 0.0106 |#
102400 106495 35 0.0116 |#
106496 110591 17 0.0056 |
110592 114687 19 0.0063 |
114688 118783 18 0.0060 |
118784 122879 6 0.0020 |
122880 126975 8 0.0027 |
Normalized probability
Response time distribution
measured in nanoseconds
using High Dynamic
Range Histogram
:# Zero counts skipped
|# Contiguous buckets
Median and 99th
percentile values
service time for
load generator
Cache hit Cache miss
Go-Kit Histogram Example
const (
maxHistObservable = 1000000
sampleCount = 500
)
func NewHist(name string) metrics.Histogram {
var h metrics.Histogram
if name != "" && archaius.Conf.Collect {
h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...)
if sampleMap == nil {
sampleMap = make(map[metrics.Histogram][]int64)
}
sampleMap[h] = make([]int64, 0, sampleCount)
return h
}
return nil
}
func Measure(h metrics.Histogram, d time.Duration) {
if h != nil && archaius.Conf.Collect {
if d > maxHistObservable {
h.Observe(int64(maxHistObservable))
} else {
h.Observe(int64(d))
}
s := sampleMap[h]
if s != nil && len(s) < sampleCount {
sampleMap[h] = append(s, int64(d))
}
}
}
Nanoseconds!
Median and 99%ile
Slice for first 500
values as samples for
export to Guesstimate
Golang Guesstimate Interface
https://github.com/adrianco/goguesstimate
{
"space": {
"name": "gotest",
"description": "Testing",
"is_private": "true",
"graph": {
"metrics": [
{"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}},
{"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column":
3}},
{"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}},
{"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}}
],
"guesstimates": [
{"metric": "AB", "input": null, "guesstimateType": "DATA", "data":
[119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706,
5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9
991]},
{"metric": "AC", "input": "40", "guesstimateType": "POINT"},
{"metric": "AD", "input": "[1000,4000]", "guesstimateType": "LOGNORMAL"},
{"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"}
]
}
}
}
See http://www.getguesstimate.com
% cd json_metrics; sh guesstimate.sh storage
@adrianco
Simplicity through symmetry
Symmetry
Invariants
Stable assertions
No special cases
@adrianco
“We see the world as increasingly more complex and chaotic
because we use inadequate concepts to explain it. When we
understand something, we no longer see it as chaotic or complex.”
Jamshid Gharajedaghi - 2011
Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
Learn More…
Q&A
Adrian Cockcroft @adrianco
http://slideshare.com/adriancockcroft
Technology Fellow - Battery Ventures
See www.battery.com for a list of portfolio investments
Security
Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested.
Palo Alto Networks
Enterprise IT
Operations &
Management
Big DataCompute
Networking
Storage

More Related Content

What's hot

Delivering with Microservices - How to Iterate Towards Sophistication
Delivering with Microservices - How to Iterate Towards SophisticationDelivering with Microservices - How to Iterate Towards Sophistication
Delivering with Microservices - How to Iterate Towards SophisticationThoughtworks
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Openstack Silicon Valley - Vendor Lock In
Openstack Silicon Valley - Vendor Lock InOpenstack Silicon Valley - Vendor Lock In
Openstack Silicon Valley - Vendor Lock InAdrian Cockcroft
 
Microservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyMicroservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyAdrian Cockcroft
 
GameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos EngineeringGameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos EngineeringDiUS
 
From Monolith to Microservices – and Beyond!
From Monolith to Microservices – and Beyond!From Monolith to Microservices – and Beyond!
From Monolith to Microservices – and Beyond!Jules Pierre-Louis
 
When Developers Operate and Operators Develop
When Developers Operate and Operators DevelopWhen Developers Operate and Operators Develop
When Developers Operate and Operators DevelopAdrian Cockcroft
 
It’s Not Just Request/Response: Understanding Event-driven Microservices
It’s Not Just Request/Response: Understanding Event-driven MicroservicesIt’s Not Just Request/Response: Understanding Event-driven Microservices
It’s Not Just Request/Response: Understanding Event-driven Microservicescornelia davis
 
PagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
PagerDuty + Rundeck = Shorter Incidents, Fewer EscalationsPagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
PagerDuty + Rundeck = Shorter Incidents, Fewer EscalationsRundeck
 
Cloud Native Architectures for Devops
Cloud Native Architectures for DevopsCloud Native Architectures for Devops
Cloud Native Architectures for Devopscornelia davis
 
Goto Berlin - Migrating to Microservices (Fast Delivery)
Goto Berlin - Migrating to Microservices (Fast Delivery)Goto Berlin - Migrating to Microservices (Fast Delivery)
Goto Berlin - Migrating to Microservices (Fast Delivery)Adrian Cockcroft
 
The Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitThe Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitRandy Bias
 
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...Amazon Web Services
 
Micro Service Architecture
Micro Service ArchitectureMicro Service Architecture
Micro Service ArchitectureEduards Sizovs
 
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...Amazon Web Services
 
Cloud Native Cost Optimization UCC
Cloud Native Cost Optimization UCCCloud Native Cost Optimization UCC
Cloud Native Cost Optimization UCCAdrian Cockcroft
 

What's hot (20)

Epidemic Failures
Epidemic FailuresEpidemic Failures
Epidemic Failures
 
Delivering with Microservices - How to Iterate Towards Sophistication
Delivering with Microservices - How to Iterate Towards SophisticationDelivering with Microservices - How to Iterate Towards Sophistication
Delivering with Microservices - How to Iterate Towards Sophistication
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Openstack Silicon Valley - Vendor Lock In
Openstack Silicon Valley - Vendor Lock InOpenstack Silicon Valley - Vendor Lock In
Openstack Silicon Valley - Vendor Lock In
 
Microservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyMicroservices the Good Bad and the Ugly
Microservices the Good Bad and the Ugly
 
GameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos EngineeringGameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos Engineering
 
From Monolith to Microservices – and Beyond!
From Monolith to Microservices – and Beyond!From Monolith to Microservices – and Beyond!
From Monolith to Microservices – and Beyond!
 
Microxchg Microservices
Microxchg MicroservicesMicroxchg Microservices
Microxchg Microservices
 
When Developers Operate and Operators Develop
When Developers Operate and Operators DevelopWhen Developers Operate and Operators Develop
When Developers Operate and Operators Develop
 
It’s Not Just Request/Response: Understanding Event-driven Microservices
It’s Not Just Request/Response: Understanding Event-driven MicroservicesIt’s Not Just Request/Response: Understanding Event-driven Microservices
It’s Not Just Request/Response: Understanding Event-driven Microservices
 
PagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
PagerDuty + Rundeck = Shorter Incidents, Fewer EscalationsPagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
PagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Cloud Native Architectures for Devops
Cloud Native Architectures for DevopsCloud Native Architectures for Devops
Cloud Native Architectures for Devops
 
Cloud-native Data
Cloud-native DataCloud-native Data
Cloud-native Data
 
Goto Berlin - Migrating to Microservices (Fast Delivery)
Goto Berlin - Migrating to Microservices (Fast Delivery)Goto Berlin - Migrating to Microservices (Fast Delivery)
Goto Berlin - Migrating to Microservices (Fast Delivery)
 
The Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitThe Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud Summit
 
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
 
Micro Service Architecture
Micro Service ArchitectureMicro Service Architecture
Micro Service Architecture
 
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
 
Cloud Native Cost Optimization UCC
Cloud Native Cost Optimization UCCCloud Native Cost Optimization UCC
Cloud Native Cost Optimization UCC
 

Viewers also liked

Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONAdrian Cockcroft
 
DevOps Days Boston 2017: Developer first workflows for Kubernetes
DevOps Days Boston 2017: Developer first workflows for KubernetesDevOps Days Boston 2017: Developer first workflows for Kubernetes
DevOps Days Boston 2017: Developer first workflows for KubernetesAmbassador Labs
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Animesh Singh
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...Ambassador Labs
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Adrian Cockcroft
 
Principles of microservices velocity
Principles of microservices   velocityPrinciples of microservices   velocity
Principles of microservices velocitySam Newman
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017NVIDIA
 

Viewers also liked (7)

Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
 
DevOps Days Boston 2017: Developer first workflows for Kubernetes
DevOps Days Boston 2017: Developer first workflows for KubernetesDevOps Days Boston 2017: Developer first workflows for Kubernetes
DevOps Days Boston 2017: Developer first workflows for Kubernetes
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016
 
Principles of microservices velocity
Principles of microservices   velocityPrinciples of microservices   velocity
Principles of microservices velocity
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017
 

Similar to Microservices: What's Missing - O'Reilly Software Architecture New York

What's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoWhat's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoAdrian Cockcroft
 
Architecting Microservices in .Net
Architecting Microservices in .NetArchitecting Microservices in .Net
Architecting Microservices in .NetRichard Banks
 
Microservices: State of the Union
Microservices: State of the UnionMicroservices: State of the Union
Microservices: State of the UnionC4Media
 
Metrics driven dev ops 2017
Metrics driven dev ops 2017Metrics driven dev ops 2017
Metrics driven dev ops 2017Jerry Tan
 
Top Performance Problems in Distributed Architectures
Top Performance Problems in Distributed ArchitecturesTop Performance Problems in Distributed Architectures
Top Performance Problems in Distributed ArchitecturesAndreas Grabner
 
Supercharging Optimizely Performance by Moving Decisions to the Edge
Supercharging Optimizely Performance by Moving Decisions to the EdgeSupercharging Optimizely Performance by Moving Decisions to the Edge
Supercharging Optimizely Performance by Moving Decisions to the EdgeOptimizely
 
Managing microservices with Istio Service Mesh
Managing microservices with Istio Service MeshManaging microservices with Istio Service Mesh
Managing microservices with Istio Service MeshRafik HARABI
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationAndrew Wesbecher
 
Service Virtualization - Next Gen Testing Conference Singapore 2013
Service Virtualization - Next Gen Testing Conference Singapore 2013Service Virtualization - Next Gen Testing Conference Singapore 2013
Service Virtualization - Next Gen Testing Conference Singapore 2013Min Fang
 
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...Deborah Schalm
 
"Shift-Left." Performance And Architecture Validation with Continuous Integra...
"Shift-Left." Performance And Architecture Validation with Continuous Integra..."Shift-Left." Performance And Architecture Validation with Continuous Integra...
"Shift-Left." Performance And Architecture Validation with Continuous Integra...DevOps.com
 
Apply best parts of microservices to serverless
Apply best parts of microservices to serverlessApply best parts of microservices to serverless
Apply best parts of microservices to serverlessYan Cui
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks
 
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...Andreas Grabner
 
DevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback LoopsDevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback LoopsAndreas Grabner
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Mike Villiger
 
Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017Idit Levine
 
Reactive Application Using METEOR
Reactive Application Using METEORReactive Application Using METEOR
Reactive Application Using METEORNodeXperts
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 

Similar to Microservices: What's Missing - O'Reilly Software Architecture New York (20)

What's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoWhat's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at Cisco
 
Architecting Microservices in .Net
Architecting Microservices in .NetArchitecting Microservices in .Net
Architecting Microservices in .Net
 
Microservices: State of the Union
Microservices: State of the UnionMicroservices: State of the Union
Microservices: State of the Union
 
Metrics driven dev ops 2017
Metrics driven dev ops 2017Metrics driven dev ops 2017
Metrics driven dev ops 2017
 
Top Performance Problems in Distributed Architectures
Top Performance Problems in Distributed ArchitecturesTop Performance Problems in Distributed Architectures
Top Performance Problems in Distributed Architectures
 
Supercharging Optimizely Performance by Moving Decisions to the Edge
Supercharging Optimizely Performance by Moving Decisions to the EdgeSupercharging Optimizely Performance by Moving Decisions to the Edge
Supercharging Optimizely Performance by Moving Decisions to the Edge
 
Managing microservices with Istio Service Mesh
Managing microservices with Istio Service MeshManaging microservices with Istio Service Mesh
Managing microservices with Istio Service Mesh
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
Service Virtualization - Next Gen Testing Conference Singapore 2013
Service Virtualization - Next Gen Testing Conference Singapore 2013Service Virtualization - Next Gen Testing Conference Singapore 2013
Service Virtualization - Next Gen Testing Conference Singapore 2013
 
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
 
"Shift-Left." Performance And Architecture Validation with Continuous Integra...
"Shift-Left." Performance And Architecture Validation with Continuous Integra..."Shift-Left." Performance And Architecture Validation with Continuous Integra...
"Shift-Left." Performance And Architecture Validation with Continuous Integra...
 
Microservices with Spring
Microservices with SpringMicroservices with Spring
Microservices with Spring
 
Apply best parts of microservices to serverless
Apply best parts of microservices to serverlessApply best parts of microservices to serverless
Apply best parts of microservices to serverless
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
 
DevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback LoopsDevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback Loops
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
 
Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017
 
Reactive Application Using METEOR
Reactive Application Using METEORReactive Application Using METEOR
Reactive Application Using METEOR
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 

More from Adrian Cockcroft

Gophercon 2016 Communicating Sequential Goroutines
Gophercon 2016 Communicating Sequential GoroutinesGophercon 2016 Communicating Sequential Goroutines
Gophercon 2016 Communicating Sequential GoroutinesAdrian Cockcroft
 
Microservices Workshop - Craft Conference
Microservices Workshop - Craft ConferenceMicroservices Workshop - Craft Conference
Microservices Workshop - Craft ConferenceAdrian Cockcroft
 
Microxchg Analyzing Response Time Distributions for Microservices
Microxchg Analyzing Response Time Distributions for MicroservicesMicroxchg Analyzing Response Time Distributions for Microservices
Microxchg Analyzing Response Time Distributions for MicroservicesAdrian Cockcroft
 
Innovation and Architecture
Innovation and ArchitectureInnovation and Architecture
Innovation and ArchitectureAdrian Cockcroft
 
Cloud Trends Nov2015 Structure
Cloud Trends Nov2015 StructureCloud Trends Nov2015 Structure
Cloud Trends Nov2015 StructureAdrian Cockcroft
 
Dockercon 2015 - Faster Cheaper Safer
Dockercon 2015 - Faster Cheaper SaferDockercon 2015 - Faster Cheaper Safer
Dockercon 2015 - Faster Cheaper SaferAdrian Cockcroft
 
Gluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeGluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeAdrian Cockcroft
 
Dockercon State of the Art in Microservices
Dockercon State of the Art in MicroservicesDockercon State of the Art in Microservices
Dockercon State of the Art in MicroservicesAdrian Cockcroft
 
Cloud Native Cost Optimization
Cloud Native Cost OptimizationCloud Native Cost Optimization
Cloud Native Cost OptimizationAdrian Cockcroft
 
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Adrian Cockcroft
 
Disrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
Disrupting the Storage Industry talk at SNIA Data Storage Innovation ConferenceDisrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
Disrupting the Storage Industry talk at SNIA Data Storage Innovation ConferenceAdrian Cockcroft
 
Hack Kid Con - Learn to be a Data Scientist for $1
Hack Kid Con - Learn to be a Data Scientist for $1Hack Kid Con - Learn to be a Data Scientist for $1
Hack Kid Con - Learn to be a Data Scientist for $1Adrian Cockcroft
 

More from Adrian Cockcroft (13)

Gophercon 2016 Communicating Sequential Goroutines
Gophercon 2016 Communicating Sequential GoroutinesGophercon 2016 Communicating Sequential Goroutines
Gophercon 2016 Communicating Sequential Goroutines
 
Microservices Workshop - Craft Conference
Microservices Workshop - Craft ConferenceMicroservices Workshop - Craft Conference
Microservices Workshop - Craft Conference
 
In Search of Segmentation
In Search of SegmentationIn Search of Segmentation
In Search of Segmentation
 
Microxchg Analyzing Response Time Distributions for Microservices
Microxchg Analyzing Response Time Distributions for MicroservicesMicroxchg Analyzing Response Time Distributions for Microservices
Microxchg Analyzing Response Time Distributions for Microservices
 
Innovation and Architecture
Innovation and ArchitectureInnovation and Architecture
Innovation and Architecture
 
Cloud Trends Nov2015 Structure
Cloud Trends Nov2015 StructureCloud Trends Nov2015 Structure
Cloud Trends Nov2015 Structure
 
Dockercon 2015 - Faster Cheaper Safer
Dockercon 2015 - Faster Cheaper SaferDockercon 2015 - Faster Cheaper Safer
Dockercon 2015 - Faster Cheaper Safer
 
Gluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeGluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A Challenge
 
Dockercon State of the Art in Microservices
Dockercon State of the Art in MicroservicesDockercon State of the Art in Microservices
Dockercon State of the Art in Microservices
 
Cloud Native Cost Optimization
Cloud Native Cost OptimizationCloud Native Cost Optimization
Cloud Native Cost Optimization
 
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
 
Disrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
Disrupting the Storage Industry talk at SNIA Data Storage Innovation ConferenceDisrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
Disrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
 
Hack Kid Con - Learn to be a Data Scientist for $1
Hack Kid Con - Learn to be a Data Scientist for $1Hack Kid Con - Learn to be a Data Scientist for $1
Hack Kid Con - Learn to be a Data Scientist for $1
 

Recently uploaded

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 

Recently uploaded (20)

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 

Microservices: What's Missing - O'Reilly Software Architecture New York

  • 1. Microservices: What’s Missing? Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures April 2016
  • 2. What does @adrianco do? @adrianco Technology Due Diligence on Deals Presentations at Conferences Presentations at Companies Technical Advice for Portfolio Companies Program Committee for Conferences Networking with Interesting PeopleTinkering with Technologies Maintain Relationship with Cloud Vendors Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics
  • 3. http://www.infoq.com/presentations/Twitter-Timeline-Scalability http://www.infoq.com/presentations/twitter-soa http://www.infoq.com/presentations/Zipkin http://www.infoq.com/presentations/scale-gilt Go-Kit https://www.youtube.com/watch?v=aL6sd4d4hxk http://www.infoq.com/presentations/circuit-breaking-distributed-systems https://speakerdeck.com/mattheath/scaling-micro-services-in-go-highload-plus-plus-2014 State of the Art in Web Scale Microservice Architectures AWS Re:Invent : Asgard to Zuul https://www.youtube.com/watch?v=p7ysHhs5hl0 Resiliency at Massive Scale https://www.youtube.com/watch?v=ZfYJHtVL1_w Microservice Architecture https://www.youtube.com/watch?v=CriDUYtfrjs New projects for 2015 and Docker Packaging https://www.youtube.com/watch?v=hi7BDAtjfKY Spinnaker deployment pipeline https://www.youtube.com/watch?v=dwdVwE52KkU http://www.infoq.com/presentations/spring-cloud-2015
  • 4. Microservice Architectures ConfigurationTooling Discovery Routing Observability Development: Languages and Container Operational: Orchestration and Deployment Infrastructure Datastores Policy: Architectural and Security Compliance
  • 5. Microservices Edda Archaius Configuration Spinnaker SpringCloud Tooling Eureka Prana Discovery Denominator Zuul Ribbon Routing Hystrix Pytheus Atlas Observability Development using Java, Groovy, Scala, Clojure, Python with AMI and Docker Containers Orchestration with Autoscalers on AWS, Titus exploring Mesos & ECS for Docker Ephemeral datastores using Dynomite, Memcached, Astyanax, Staash, Priam, Cassandra Policy via the Simian Army - Chaos Monkey, Chaos Gorilla, Conformity Monkey, Security Monkey
  • 6. @adrianco Discussion Points Failure injection testing Versioning, routing Protocols and interfaces Timeouts and retries Denormalized data models Monitoring, tracing Simplicity through symmetry
  • 7. @adrianco Failure Injection Testing Netflix Chaos Monkey and Simian Army http://techblog.netflix.com/2011/07/netflix-simian-army.html http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html http://techblog.netflix.com/2016/01/automated-failure-testing.html
  • 8. ! Chaos Monkey - enforcing stateless business logic ! Chaos Gorilla - enforcing zone isolation/replication ! Chaos Kong - enforcing region isolation/replication ! Security Monkey - watching for insecure configuration settings ! Latency Monkey & FIT - inject errors to enforce robust dependencies ! See over 100 NetflixOSS projects at netflix.github.com ! Get “Technical Indigestion” reading techblog.netflix.com Trust with Verification
  • 9. @adrianco Benefits of version aware routing Immediately and safely introduce a new version Canary test in production Use feature flags n Route clients to a version so they can’t get disrupted Change client or dependencies but not both at once Eventually remove old versions Incremental or infrequent “break the build” garbage collection
  • 10. @adrianco Versioning, Routing Version numbering: Interface.Feature.Bugfix V1.2.3 to V1.2.4 - Canary test then remove old version V1.2.x to V1.3.x - Canary test then remove or keep both Route V1.3.x clients to new version to get new feature Remove V1.2.x only after V1.3.x is found to work for V1.2.x clients V1.x.x to V2.x.x - Route clients to specific versions Remove old server version when all old clients are gone
  • 11. @adrianco Protocols Measure serialization, transmission, deserialization costs “Sending a megabyte of XML between microservices will make you sad…” Use Thrift, Protobuf/gRPC, Avro, SBE internally Use JSON for external/public interfaces https://github.com/real-logic/simple-binary-encoding
  • 12. @adrianco Interfaces When you build a service, build a “driver” client for it Reference implementation error handling and serialization Release automation stress test using client Validate that service interface is usable! Minimize additional dependencies Swagger - OpenAPI Specification Datawire Quark adds behaviors to API spec
  • 23. @adrianco Interface Version Pinning Change one thing at a time! Pin the version of everything else Incremental build/test/deploy pipeline Deploy existing app code with new platform Deploy existing app code with new dependencies Deploy new app code with pinned platform/dependencies
  • 24. @adrianco Timeouts and Retries Connection timeout vs. request timeout confusion Usually setup incorrectly, global defaults Systems collapse with “retry storms” Timeouts too long, too many retries Services doing work that can never be used
  • 25. @adrianco Connections and Requests TCP makes a connection, HTTP makes a request HTTP hopefully reuses connections for several requests Both have different timeout and retry needs! TCP timeout is purely a property of one network latency hop HTTP timeout depends on the service and its dependencies connection path request path
  • 26. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  • 27. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  • 28. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  • 29. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service
  • 30. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  • 31. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  • 32. @adrianco Timeout and Retry Fixes Cascading timeout budget Static settings that decrease from the edge or dynamic budget passed with request How often do retries actually succeed? Don’t ask the same instance the same thing Only retry on a different connection
  • 34. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, one retry Failed Service 3s 1s 1s Fast fail response after 2s Upstream timeout must always be longer than total downstream timeout * retries delay No unproductive work while fast failing
  • 35. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service
  • 36. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service 3s 1s For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service Successful response delayed 1s
  • 37. @adrianco Manage Inconsistency ACM Paper: "The Network is Reliable" Distributed systems are inconsistent by nature Clients are inconsistent with servers Most caches are inconsistent Versions are inconsistent Get over it and Deal with it
  • 38. @adrianco Denormalized Data Models Any non-trivial organization has many databases Cross references exist, inconsistencies exist Microservices work best with individual simple stores Scale, operate, mutate, fail them independently NoSQL allows flexible schema/object versions
  • 39. @adrianco Denormalized Data Models Build custom cross-datasource check/repair processes Ensure all cross references are up to date Immutability Changes Everything http://highscalability.com/blog/2015/1/26/paper-immutability-changes-everything-by-pat-helland.html Memories, Guesses and Apologies https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/
  • 41. Cloud Native Microservices ! High rate of change Code pushes can cause floods of new instances and metrics Short baseline for alert threshold analysis – everything looks unusual ! Ephemeral Configurations Short lifetimes make it hard to aggregate historical views Hand tweaked monitoring tools take too much work to keep running ! Microservices with complex calling patterns End-to-end request flow measurements are very important Request flow visualizations get overwhelmed
  • 42. Low Latency SaaS Based Monitors https://www.datadoghq.com/ http://www.instana.com/ www.bigpanda.io www.vividcortex.com signalfx.com wavefront.com sysdig.com See www.battery.com for a list of portfolio investments
  • 44. A Possible Hierarchy Continents Regions Zones Services Versions Containers Instances How Many? 3 to 5 2-4 per Continent 1-5 per Region 100’s per Zone Many per Service 1000’s per Version 10,000’s It’s much more challenging than just a large number of machines
  • 45. Flow
  • 46. Some tools can show the request flow across a few services
  • 47. Interesting architectures have a lot of microservices! Flow visualization is a big challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  • 48. Simulated Microservices Model and visualize microservices Simulate interesting architectures Generate large scale configurations Eventually stress test real tools Code: github.com/adrianco/spigo Simulate Protocol Interactions in Go Visualize with D3 See for yourself: http://simianviz.surge.sh Follow @simianviz for updates ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Three Availability Zones Denominator DNS Endpoint
  • 49. Spigo Nanoservice Structure func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, hist) switch msg.Imposition { case gotocol.Hello: // get named by parent ... case gotocol.NameDrop: // someone new to talk to ... case gotocol.Put: // upstream request handler ... outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(), msg.Ctx.NewParent(), msg.Intention} flow.AnnotateSend(outmsg, name) outmsg.GoSend(replicas) } case <-eurekaTicker.C: // poll the service registry ... } } } Skeleton code for replicating a Put message Instrument incoming requests Instrument outgoing requests update trace context
  • 50. Flow Trace Records riak2 us-east-1 zoneC riak9 us-west-2 zoneA Put s896 Replicate riak3 us-east-1 zoneA riak8 us-west-2 zoneC riak4 us-east-1 zoneB riak10 us-west-2 zoneB us-east-1.zoneC.riak2 t98p895s896 Put us-east-1.zoneA.riak3 t98p896s908 Replicate us-east-1.zoneB.riak4 t98p896s909 Replicate us-west-2.zoneA.riak9 t98p896s910 Replicate us-west-2.zoneB.riak10 t98p910s912 Replicate us-west-2.zoneC.riak8 t98p910s913 Replicate staash us-east-1 zoneC s910 s908s913 s909s912 Replicate Put
  • 51. Open Zipkin A common format for trace annotations A Java tool for visualizing traces Standardization effort to fold in other formats Driven by Adrian Cole (currently at Pivotal) Extended to load Spigo generated trace files
  • 52. Trace for one Spigo Flow
  • 53. Definition of an architecture { "arch": "lamp", "description":"Simple LAMP stack", "version": "arch-0.0", "victim": "webserver", "services": [ { "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] }, { "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] }, { "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] }, { "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] }, { "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] } ] } Header includes chaos monkey victim New tier name Tier package 0 = non Regional Node count List of tier dependencies See for yourself: http://simianviz.surge.sh/lamp
  • 54. Running Spigo $ ./spigo -a lamp -j -d 2 2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json 2016/01/26 23:04:05 lamp.edda: starting 2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack 2016/01/26 23:04:05 architecture: scaling to 100% 2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting 2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []} 2016/01/26 23:04:05 Starting: {memcache store 1 1 []} 2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]} 2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]} 2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]} 2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms 2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith 2016/01/26 23:04:07 asgard: Shutdown 2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing 2016/01/26 23:04:07 spigo: complete 2016/01/26 23:04:07 lamp.edda: closing -a architecture lamp -j graph json/lamp.json -d run for 2 seconds
  • 55. Riak IoT Architecture { "arch": "riak", "description":"Riak IoT ingestion example for the RICON 2015 presentation", "version": "arch-0.0", "victim": "", "services": [ { "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]}, { "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]}, { "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]}, { "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]}, { "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]}, { "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]}, { "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]}, { "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]}, { "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]}, { "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]}, { "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]}, { "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]}, { "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]}, { "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]}, { "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]} ] } New tier name Tier package Node count List of tier dependencies 0 = non Regional
  • 56. Single Region Riak IoT See for yourself: http://simianviz.surge.sh/riak
  • 57. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint See for yourself: http://simianviz.surge.sh/riak
  • 58. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Load Balancer Load Balancer See for yourself: http://simianviz.surge.sh/riak
  • 59. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  • 60. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  • 61. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  • 62. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Riak TS Analytics Service Ingester Service See for yourself: http://simianviz.surge.sh/riak
  • 63. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics See for yourself: http://simianviz.surge.sh/riak
  • 64. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics What’s the response time of the stream endpoint? See for yourself: http://simianviz.surge.sh/riak
  • 66. What’s the response time distribution of a very simple storage backed web service? memcached mysql disk volume web service load generator memcached
  • 68. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 40% mysql 70%
  • 69. Hit rates: memcached 60% mysql 70%
  • 70. Hit rates: memcached 20% mysql 90%
  • 72. Changes made to codahale/hdrhistogram Changes made to go-kit/kit/metrics Implementation in adrianco/spigo/collect
  • 73. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Server Time
  • 74. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Response CR-CS Service SS-SR Network SR-CS Network CR-SS Net Round Trip (SR-CS) + (CR-SS) (CR-CS) - (SS-SR) Server Time
  • 75. Spigo Histogram Results Collected with: % spigo -d 60 -j -a storage -c name: storage.*.*..load00...load.denominator_serv quantiles: [{50 47103} {99 139263}] From To Count Prob Bar 20480 21503 2 0.0007 : 21504 22527 2 0.0007 | 23552 24575 1 0.0003 : 24576 25599 5 0.0017 | 25600 26623 5 0.0017 | 26624 27647 1 0.0003 | 27648 28671 3 0.0010 | 28672 29695 5 0.0017 | 29696 30719 127 0.0421 |#### 30720 31743 126 0.0418 |#### 31744 32767 74 0.0246 |## 32768 34815 281 0.0932 |######### 34816 36863 201 0.0667 |###### 36864 38911 156 0.0518 |##### 38912 40959 185 0.0614 |###### 40960 43007 147 0.0488 |#### 43008 45055 161 0.0534 |##### 45056 47103 125 0.0415 |#### 47104 49151 135 0.0448 |#### 49152 51199 99 0.0328 |### 51200 53247 82 0.0272 |## 53248 55295 77 0.0255 |## 55296 57343 66 0.0219 |## 57344 59391 54 0.0179 |# 59392 61439 37 0.0123 |# 61440 63487 45 0.0149 |# 63488 65535 33 0.0109 |# 65536 69631 63 0.0209 |## 69632 73727 98 0.0325 |### 73728 77823 92 0.0305 |### 77824 81919 112 0.0372 |### 81920 86015 88 0.0292 |## 86016 90111 55 0.0182 |# 90112 94207 38 0.0126 |# 94208 98303 51 0.0169 |# 98304 102399 32 0.0106 |# 102400 106495 35 0.0116 |# 106496 110591 17 0.0056 | 110592 114687 19 0.0063 | 114688 118783 18 0.0060 | 118784 122879 6 0.0020 | 122880 126975 8 0.0027 | Normalized probability Response time distribution measured in nanoseconds using High Dynamic Range Histogram :# Zero counts skipped |# Contiguous buckets Median and 99th percentile values service time for load generator Cache hit Cache miss
  • 76. Go-Kit Histogram Example const ( maxHistObservable = 1000000 sampleCount = 500 ) func NewHist(name string) metrics.Histogram { var h metrics.Histogram if name != "" && archaius.Conf.Collect { h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...) if sampleMap == nil { sampleMap = make(map[metrics.Histogram][]int64) } sampleMap[h] = make([]int64, 0, sampleCount) return h } return nil } func Measure(h metrics.Histogram, d time.Duration) { if h != nil && archaius.Conf.Collect { if d > maxHistObservable { h.Observe(int64(maxHistObservable)) } else { h.Observe(int64(d)) } s := sampleMap[h] if s != nil && len(s) < sampleCount { sampleMap[h] = append(s, int64(d)) } } } Nanoseconds! Median and 99%ile Slice for first 500 values as samples for export to Guesstimate
  • 77. Golang Guesstimate Interface https://github.com/adrianco/goguesstimate { "space": { "name": "gotest", "description": "Testing", "is_private": "true", "graph": { "metrics": [ {"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}}, {"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column": 3}}, {"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}}, {"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}} ], "guesstimates": [ {"metric": "AB", "input": null, "guesstimateType": "DATA", "data": [119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706, 5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9 991]}, {"metric": "AC", "input": "40", "guesstimateType": "POINT"}, {"metric": "AD", "input": "[1000,4000]", "guesstimateType": "LOGNORMAL"}, {"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"} ] } } }
  • 78. See http://www.getguesstimate.com % cd json_metrics; sh guesstimate.sh storage
  • 80. @adrianco “We see the world as increasingly more complex and chaotic because we use inadequate concepts to explain it. When we understand something, we no longer see it as chaotic or complex.” Jamshid Gharajedaghi - 2011 Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
  • 82. Q&A Adrian Cockcroft @adrianco http://slideshare.com/adriancockcroft Technology Fellow - Battery Ventures See www.battery.com for a list of portfolio investments
  • 83. Security Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested. Palo Alto Networks Enterprise IT Operations & Management Big DataCompute Networking Storage